Paul Webster recently wrote Where the git did that go? in relation to the incredible disappearing commits. Some said that a version control system shouldn’t be able to do this; but it’s actually all part of Git’s functionality. Let’s look at what happened.
A Git repository stores commits as a transitive closure (a real closure, not a lambda): starting from the reachable items, such as branches and tags, it keeps every commit they lead to, along with every tree and blob those commits use. It’s not possible to remove a commit that is still reachable, nor the trees and blobs that make it up.
So, how is it possible to lose data with Git? Well, if you are using a standard Git repository, you can create branches with git branch, and delete them with git branch -d. When you delete a branch, you remove the pointer to the last commit – but you don’t actually lose the commits.
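To make that concrete, here is a small sketch that deletes a branch in a throwaway repository and then checks that the commit is still in the object store. The repository location, the identity and the branch name 'topic' are placeholders chosen for illustration; git branch -D is used rather than -d because the branch is unmerged.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

git commit -q --allow-empty -m "initial commit"
base_branch=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

# Create a branch, commit on it, and remember the commit's hash.
git checkout -q -b topic
git commit -q --allow-empty -m "work on topic"
topic_commit=$(git rev-parse HEAD)

# Switch away and delete the branch (-D because it was never merged).
git checkout -q "$base_branch"
git branch -q -D topic

# The branch name is gone, but the commit object is still there.
object_type=$(git cat-file -t "$topic_commit")
echo "$topic_commit is still a $object_type"
```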
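In addition, the git reflog, which we covered previously, stores a list of the previous branch pointers. The deleted branch’s own reflog goes away with it, but the reflog for HEAD still records every commit you had checked out. In other words, even if you delete a branch, the reflog has got your back.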
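As a rough illustration of the reflog coming to the rescue, the sketch below deletes a branch and then recreates it from HEAD’s reflog. It assumes a throwaway repository; the branch names 'topic' and 'topic-recovered' are made up, and HEAD@{1} happens to be the right entry only because of the exact sequence of commands used here. In a real repository you would read the git reflog output to find the entry you want.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

git commit -q --allow-empty -m "initial commit"
base_branch=$(git symbolic-ref --short HEAD)
git checkout -q -b topic
git commit -q --allow-empty -m "work on topic"
git checkout -q "$base_branch"
git branch -q -D topic

# HEAD's reflog lists every position HEAD has had, newest first.
git reflog

# HEAD@{0} is the checkout back to the base branch, so HEAD@{1} is the
# commit that was made on 'topic' just before it.
lost_commit=$(git rev-parse 'HEAD@{1}')

# Point a new branch at the commit to make it reachable again.
git branch topic-recovered "$lost_commit"
recovered_subject=$(git log -1 --format=%s topic-recovered)
echo "recovered: $recovered_subject"
```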
Generally speaking, however, only repositories with working directories have reflogs; bare repositories tend not to. There is a config option, git config core.logAllRefUpdates, which can be used to force reflogs on for any repository – or disable them completely if they’re not needed.
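For instance, turning reflogs on for a bare repository might look like the sketch below. The repository is a freshly created temporary one, used purely for illustration; on a real server you would run the git config line against the existing repository instead.

```shell
# Create a throwaway bare repository to experiment with.
bare=$(mktemp -d)
git init -q --bare "$bare"

# Show the repository-level value before changing it (usually unset for bare repositories).
before=$(git -C "$bare" config --local --get core.logAllRefUpdates || echo "unset")

# Ask Git to record reflogs for ref updates in this repository.
git -C "$bare" config core.logAllRefUpdates true
after=$(git -C "$bare" config --local --get core.logAllRefUpdates)

echo "core.logAllRefUpdates: $before -> $after"
```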
Even without a reflog, the commits aren’t removed immediately. If you run a git gc, which repacks the repository into a more efficient structure, it will unpack non-referenced commits into loose objects. (You have to ensure that there aren’t any branches or tags or reflogs pointing at them to see this behaviour; if there’s an existing pointer then it will not evict the object from the packfile.)
git fsck will check that all objects are present as expected. You can also see what is no longer referenced; running git fsck --unreachable will show you which commits are no longer reachable due to deleted branches or removed tags. Running git fsck --unreachable daily and mailing reports will give a good early warning of commits about to disappear if it’s a concern.
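A minimal sketch of such a check is shown below, run against a throwaway repository in which a branch has just been deleted. Note the assumption baked in: reflog entries count as references, so in a repository that keeps reflogs the --no-reflogs flag is needed for the deleted branch’s commit to show up. The repository, identity and branch name are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

git commit -q --allow-empty -m "initial commit"
base_branch=$(git symbolic-ref --short HEAD)
git checkout -q -b topic
git commit -q --allow-empty -m "work on topic"
topic_commit=$(git rev-parse HEAD)
git checkout -q "$base_branch"
git branch -q -D topic

# List objects that no branch or tag leads to, ignoring reflog entries.
unreachable=$(git fsck --unreachable --no-reflogs 2>/dev/null)
echo "$unreachable"
```

The output of a command like this is what a daily job could mail out.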
Objects which are no longer referenced can be evicted with git prune, though this is a low-level operation which is usually called from git gc. By default it will not remove commits less than two weeks old, nor of course the commits that are reachable from them; so provided the deleted branch (or tag) has recent commits, they will stay around in the Git repository for up to a fortnight afterwards.
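To see where the safety nets are, the following sketch removes them one at a time in a throwaway repository. It passes --expire=now to skip the grace period, which is something you would not normally do on a repository you care about; everything here (repository, identity, branch name, the object_status helper) is made up for the demonstration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

git commit -q --allow-empty -m "initial commit"
base_branch=$(git symbolic-ref --short HEAD)
git checkout -q -b topic
git commit -q --allow-empty -m "work on topic"
topic_commit=$(git rev-parse HEAD)
git checkout -q "$base_branch"
git branch -q -D topic

# Hypothetical helper: report whether an object is still in the repository.
object_status() {
    if git cat-file -e "$1" 2>/dev/null; then echo present; else echo pruned; fi
}

# Even with no grace period, prune keeps the commit while HEAD's reflog mentions it.
git prune --expire=now
before=$(object_status "$topic_commit")

# Drop every reflog entry, then prune again: nothing refers to the commit any more.
git reflog expire --expire=now --all
git prune --expire=now
after=$(object_status "$topic_commit")

echo "with reflog: $before, without reflog: $after"
```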
Avoiding future issues
Both branches and tags can be deleted; and when invoking a remote push operation, a refspec with nothing on the client side of the colon invokes a delete; for example, git push github :refs/heads/master will delete the ‘master’ branch off the remote repository known as github. If this is in a script, such as git push github $COMIT:refs/heads/master, and the variable is misspelled (and therefore evaluates to the empty string), this can inadvertently delete the branch. (The same is true for tags in
A remote repository can disable such operations with the setting receive.denyDeletes, which prevents any ref deletion, and can refuse non-fast-forward pushes with receive.denyNonFastForwards. With these set, delete requests are rejected and a push cannot replace a branch’s history unless the new commits strictly follow the existing ones. (Overwriting history is occasionally a useful operation, so it may be necessary to provide a means of lifting these restrictions in certain situations.)
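As a sketch of how this behaves, the example below sets both options on a temporary bare repository standing in for the server, pushes a branch to it, and then tries to delete that branch. The directory layout, the remote name 'server' and the branch name 'release' are all invented for the example.

```shell
work=$(mktemp -d)

# A bare repository playing the part of the server, with both protections on.
git init -q --bare "$work/server.git"
git -C "$work/server.git" config receive.denyDeletes true
git -C "$work/server.git" config receive.denyNonFastForwards true

# A client repository with one commit to push.
git init -q "$work/client"
cd "$work/client"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "initial commit"
git remote add server "$work/server.git"
git push -q server HEAD:refs/heads/release

# An empty source side asks the server to delete the branch; it should refuse.
if git push -q server :refs/heads/release 2>/dev/null; then
    delete_result=accepted
else
    delete_result=rejected
fi
echo "delete request was $delete_result"
```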
In addition, enabling core.logAllRefUpdates will ensure that the repository still keeps the history of the branches, at least for
Whilst Git can be used as it comes, there are powerful options which can tweak or constrain its behaviour. In the face of scripts which have full access to the remote repository, it is advisable to have a more controlled set of options rather than the default you-can-do-anything approach. With this knowledge in mind, you should be able to set your options appropriately for your environment.
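Scripts can also defend themselves against the empty-variable push described earlier. The sketch below is one possible approach, not a standard recipe: a hypothetical helper, build_refspec, refuses to produce a refspec when the commit is empty, so the script never reaches git push with a delete by accident. Running the script with set -u, so that unset variables are an error, is another option.

```shell
# Hypothetical helper: print "<commit>:refs/heads/<branch>", or fail if the
# commit is empty rather than emit a refspec that would delete the branch.
build_refspec() {
    if [ -z "$1" ]; then
        echo "refusing to build a refspec from an empty commit" >&2
        return 1
    fi
    printf '%s:refs/heads/%s\n' "$1" "$2"
}

# A normal call produces a refspec (abc1234 is a made-up commit id).
good=$(build_refspec abc1234 master)
echo "would run: git push github $good"

# An empty commit, as a misspelled variable would produce, is refused.
if bad=$(build_refspec "" master 2>/dev/null); then
    bad_status=built
else
    bad_status=refused
fi
echo "empty commit: $bad_status"
```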
Come back next week for another instalment in the Git Tip of the Week series.