Thursday, November 20, 2014

Git Filter-Branch Saves The Day Again

Disclaimer

You can really mess things up with this, so make sure you back up your files/repositories before you wield this axe.

Removing a large file from a Git repository

Recently I pushed a commit to my remote repository, and the push took an unusually long time. Looking at the commit, I discovered that I had committed and pushed a Vagrant box into the repository. Doh! Now the repository was ~500MB. Here is what I did to clean up the repository and remove the large file.
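If it is not obvious which file is to blame, a rough sketch like the one below can list the biggest blobs anywhere in the history. The function name largest_blobs is my own invention, not something Git provides, and I am assuming a POSIX shell with awk, sort, and head available.

```shell
# List the N largest blobs reachable from any ref, biggest first.
# Each output line is "<size in bytes> <path as recorded in history>".
# largest_blobs is a hypothetical helper name, not a Git command.
largest_blobs() {
    count="${1:-10}"
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob" { print substr($0, 6) }' |
        sort -k1,1nr |
        head -n "$count"
}
```

Running largest_blobs 5 inside the repository should put an accidentally committed box file at or near the top of the list.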

I can't remember where I found this command (I may have assembled it from several sources). It is supposed to remove the file, and every reference to it, from the repository's history.
git filter-branch --prune-empty -d /dev/shm/scratch --index-filter "git rm --cached -f --ignore-unmatch ubuntu-precise32-intpon.box" --tag-name-filter cat -- --all
This seemed to clean up the Git history, but the file's data was still stored in the repository. So, on we go...
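One likely reason the data sticks around: git filter-branch saves the pre-rewrite refs under refs/original/, and as long as those backups (or reflog entries) point at the old commits, the old objects stay reachable. I did not record whether that was the cause in my case, so treat the following as a sketch. It deletes the backup refs using the for-each-ref/update-ref approach; the function name is made up for illustration.

```shell
# Delete the backup refs that git filter-branch leaves under refs/original/.
# Run from inside the rewritten repository.
# drop_filter_branch_backups is a hypothetical helper name.
drop_filter_branch_backups() {
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git update-ref -d "$ref"
        done
}
```

Only do this once you are happy with the rewritten history, because those backup refs are the easy way back to the original commits.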

I ran across an Atlassian page which detailed several new steps to remove a large file. I skipped the first 3 steps because the above command seemed to do the same thing.

The next step was to expire every reflog entry immediately, regardless of its age.
git reflog expire --expire=now --all

Then repack the repository by running the garbage collector and pruning old objects.
git gc --prune=now
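Before pushing, it may be worth confirming that the repository actually shrank. A minimal sketch, assuming you only care about the packed size that git count-objects reports (the helper name repo_pack_size is hypothetical):

```shell
# Print the size of the repository's pack files in KiB,
# as reported on the "size-pack:" line of git count-objects -v.
# repo_pack_size is a hypothetical helper name.
repo_pack_size() {
    git count-objects -v | awk '$1 == "size-pack:" { print $2 }'
}
```

Calling repo_pack_size before and after the cleanup gives two numbers to compare; git count-objects -vH prints the same information in human-readable units.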

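Finally, force-push the rewritten history back to the remote repository. Note that this command only updates master; any other rewritten branches or tags have to be force-pushed as well.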
git push origin master --force

When I checked my repository afterwards, it was back to ~45MB. I can pretend it never happened and life is good again. As long as I don't tell anyone about it.

Changing the author information

If you are committing to a public repository, you may not want your private email address exposed to the world. GitHub's change author info page has an excellent script that can fix that issue if you accidentally commit with the wrong email address. In case the page changes or disappears, here is the script:
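#!/bin/sh

# Rewrite every commit on all branches and tags, replacing the old
# author/committer identity with the correct one.
git filter-branch --env-filter '

OLD_EMAIL="your-old-email@example.com"
CORRECT_NAME="Your Correct Name"
CORRECT_EMAIL="your-correct-email@example.com"

# Fix the committer identity when it matches the old address.
if [ "$GIT_COMMITTER_EMAIL" = "$OLD_EMAIL" ]
then
    export GIT_COMMITTER_NAME="$CORRECT_NAME"
    export GIT_COMMITTER_EMAIL="$CORRECT_EMAIL"
fi
# Fix the author identity when it matches the old address.
if [ "$GIT_AUTHOR_EMAIL" = "$OLD_EMAIL" ]
then
    export GIT_AUTHOR_NAME="$CORRECT_NAME"
    export GIT_AUTHOR_EMAIL="$CORRECT_EMAIL"
fi
' --tag-name-filter cat -- --branches --tags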
Once it completes, you need to push the changes to the remote repository.
git push --force --tags origin 'refs/heads/*'
Your email is now changed.
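To double-check that nothing slipped through, something like this sketch can count how often an address still appears as an author or committer across all refs. The function name count_email_occurrences is hypothetical, and it counts author and committer fields separately, so a single commit can contribute two hits.

```shell
# Count how many author/committer fields across all refs still use
# the given email address. Prints 0 when the address is gone.
# count_email_occurrences is a hypothetical helper name.
count_email_occurrences() {
    email="$1"
    git log --all --format='%ae%n%ce' | grep -cFx -- "$email" || true
}
```

For example, count_email_occurrences "your-old-email@example.com" should print 0 once the rewrite has covered every branch and tag.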
