
AlBlue’s Blog

Macs, Modularity and More

Git Tip of the Week: Ignoring build output

Gtotw 2011 Git

This is the third in a new series of Git tips; I'm posting one tip per week. If you don't subscribe to my blog, the tips are also available as a separate feed which you can subscribe to.

Ignoring build output

For compiled applications, it's quite common for there to be a transient directory which should not be put under version control.

There are good reasons for this. Firstly, the output should be reproducible by rebuilding. Secondly, Git (like other VCSs) doesn't compress non-textual data as well as it does text (such as program source), since even a small change in the source can result in a large perturbation of the binary. Furthermore, unlike with a CVCS, you can't normally have a partial checkout of a DVCS, so anyone who clones the repository will bring down not only the current build output but all of its previous history as well.

To hide the build directory from version control, you can use the .gitignore file. This can be placed at any level in the repository, and its patterns can reach any level below it using wildcards. For example, if you have a multi-module Maven build, you can do:

# This is a comment
# Ignore target at the top level
/target
# Ignore child project's targets as well
/*/target
# And any grandchildren ...
/*/*/target
# Or you can do:
target
# which matches 'target' in any sub folder

You can also just copy the same .gitignore file into every module, or even reference it from each module with a symbolic link to a single file. (Under the covers, files with exactly the same contents hash to the same value, so the space saving is the same anyway, but symlinking a single file may be more convenient, since a change made anywhere is visible in all folders. Be aware that newer versions of Git do not follow symbolic links when reading .gitignore from the working tree, so check that this works with your version.)

The build folder depends on what system you're using: under Maven it's target, under Xcode it's typically build, whilst on other Java projects it might be bin.

You can also use the ignore file to match any other file, such as temporary files (*~) or the OSX turdlets (.DS_Store and ._*).
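If you want to check that a set of patterns does what you expect, git check-ignore reports whether a given path is ignored. The following is a minimal sketch in a throwaway repository; the file and directory names (module, src/Main.java and so on) are purely illustrative, and the .gitignore contents are an assumed combination of the patterns mentioned above.

```shell
#!/bin/sh
# Sketch: try out ignore patterns in a scratch repository.
# All paths below are made-up examples.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# An assumed .gitignore combining the patterns discussed above
cat > .gitignore <<'EOF'
target
*~
.DS_Store
._*
EOF

mkdir -p target module/target src
touch target/app.jar module/target/lib.jar src/Main.java src/Main.java~ .DS_Store

# git check-ignore exits with 0 when the path is ignored, 1 when it is not
is_ignored() {
  if git check-ignore -q "$1"; then echo ignored; else echo not-ignored; fi
}

r_top=$(is_ignored target/app.jar)
r_child=$(is_ignored module/target/lib.jar)
r_backup=$(is_ignored src/Main.java~)
r_dsstore=$(is_ignored .DS_Store)
r_source=$(is_ignored src/Main.java)

echo "target/app.jar:        $r_top"
echo "module/target/lib.jar: $r_child"
echo "src/Main.java~:        $r_backup"
echo ".DS_Store:             $r_dsstore"
echo "src/Main.java:         $r_source"
```

Running git check-ignore -v with a path instead will also print which file and line supplied the matching pattern, which helps when several ignore files are in play.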

Recovering from a mistake

It's quite possible that by the time you realise you need to add the ignore entry, you have already committed some of the build files to the repository.

If you've just committed, the easiest thing to do is to remove the files, and amend the previous commit.

$ git rm -r --cached build
$ git commit --amend

This will re-write the tip of the history to remove the build, and you can pretend that nothing ever happened. But what if the mistake is in an older commit, or one that has come from a different developer?
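For the simple case just described, the amend approach can be tried safely in a scratch repository first. This is only a sketch: the file names, commit messages and user identity are invented for the example, and --cached is used so that the build output stays on disk while being dropped from the commit.

```shell
#!/bin/sh
# Sketch: remove an accidentally committed build directory from the latest commit.
# Repository contents and identity are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.invalid"

mkdir src build
echo 'int main(void) { return 0; }' > src/main.c
echo 'pretend object code' > build/app.o
git add src build
git commit -q -m "Add source (and build output by mistake)"

# Ignore the directory from now on, then drop it from the index only
echo build > .gitignore
git rm -r -q --cached build
git add .gitignore
git commit -q --amend -m "Add source"

tracked=$(git ls-files)
commits=$(git rev-list --count HEAD)
echo "Tracked files after the amend:"
echo "$tracked"
echo "Number of commits: $commits"
```

The history still contains a single commit, but it now has a different hash from the one originally created, which is why this is only comfortable before the commit has been shared.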

Caution: the following will change history. This includes changing hashes which will affect those pushing/pulling in the future. It should only be done with an understanding that this may make things difficult subsequently; there is a trade off between fixing the repository and leaving it as is. However, if the committed built code contains sensitive information (such as passwords or other security tokens) then it may be necessary to do anyway. Remember, history is written by the winners!

You can fix up a tree with git filter-branch. This executes a command on each commit in a branch and, if the command succeeds, re-commits the result, iterating through until the end. Where there's a general folder, like target or build, it can be easy to fix; there may also be more difficult cases which have to be handled on a file content basis. In this case, I'm assuming that we've just accidentally committed the build directory.

We need a version range to operate on: the good point where we started, and the point where we are now. I'm going to assume they are badc0de and HEAD:

$ git filter-branch --tree-filter "rm -rf build" badc0de..HEAD

This ensures that all commits after badc0de, up to and including HEAD (which is the current version of the branch), have rm -rf build executed and are then re-committed. For commits which haven't changed, you'll end up with the same hash (and so this won't be a problem), but for commits which have changed content you'll end up with a new hash value. Subsequent commits will have a new hash value, whether their content changes or not, since a commit's hash depends on both its contents and its parent pointer.
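The effect on the hashes can be seen in a scratch repository. The sketch below is an illustration under assumed names (a.txt, b.txt, c.txt and the commit messages are made up); it records the hashes before the rewrite and compares them afterwards. Recent Git versions print a warning before running filter-branch, which the FILTER_BRANCH_SQUELCH_WARNING variable silences.

```shell
#!/bin/sh
# Sketch: rewrite history with filter-branch and compare hashes.
# File names, messages and identity are hypothetical.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.invalid"

# A good commit, standing in for 'badc0de'
echo one > a.txt
git add a.txt
git commit -q -m "Good commit"
good=$(git rev-parse HEAD)

# The commit that accidentally includes build output
mkdir build
echo junk > build/out.bin
echo two > b.txt
git add .
git commit -q -m "Accidentally add build"

# Later work that never touches build directly
echo three > c.txt
git add c.txt
git commit -q -m "Later work"
tip_before=$(git rev-parse HEAD)

# Remove build from every commit after the good one
git filter-branch --tree-filter "rm -rf build" "$good"..HEAD >/dev/null 2>&1

good_after=$(git rev-parse HEAD~2)
tip_after=$(git rev-parse HEAD)
with_build=$(git rev-list --count HEAD -- build)

echo "good commit before: $good"
echo "good commit after:  $good_after"
echo "tip before:         $tip_before"
echo "tip after:          $tip_after"
echo "commits touching build: $with_build"
```

The commit outside the range keeps its hash, while everything from the bad commit onwards is replaced. filter-branch also keeps a backup of the old branch under refs/original, so the old objects are not gone until that reference is removed.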

Other approaches

The .gitignore file is shared with other users via push/pull, and makes sense for general shared settings. However, there is also a per-repository file, .git/info/exclude, which applies to all files across the tree but is local to your clone and never shared. For example, I could have alblue in my .git/info/exclude file, which would permit me to try local test scripts in a location I know would not be committed (or identified to the rest of the world).
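As a rough sketch of the local exclude file in action, assuming a scratch repository with a hypothetical test script and README:

```shell
#!/bin/sh
# Sketch: a private ignore entry in .git/info/exclude.
# The script name and README are invented for the example.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# The info directory normally exists already; create it just in case
mkdir -p .git/info
echo alblue >> .git/info/exclude

mkdir alblue
echo 'echo "local experiment"' > alblue/try.sh
echo notes > README

# Only README should be reported as untracked
status=$(git status --porcelain)
echo "$status"
```

Because the entry lives inside .git, it is never committed, so other users of the repository neither see the pattern nor inherit it.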

The choice of where to put the .gitignore is somewhat arbitrary; you can have one at the root of the repository or one per sub-project (or at deeper locations in the tree). If your build is structured in a way that allows one, or a few, build locations to be defined at the top level then by all means just have a single .gitignore at the top level. However, it is often easier to understand why folders/files aren't being added by looking in the .gitignore in the current directory; so unless you have a reason to do otherwise, try to avoid patterns that reach through multiple directory levels in the ignore files.

Come back next week for another instalment in the Git Tip of the Week series.