This week we’ll be taking a bit of a deeper dive into the way that Git stores its objects. We’ll look at how they’re identified, how they’re related, and see why Git handles moves better than other version control systems.
By now, you’re familiar with the concept of a commit hash (or just commit): a 40-character hexadecimal sequence that uniquely identifies a commit, such as d16085b3b913e5bc5e351c0a7461051e9973629a. But where does this come from?
A Git repository is actually just a collection of objects, each identified by its own hash. Whenever you add a file, Git generates a hash from its contents, and this hash uniquely points to that version of the file. For example, if you create an empty file, it will have the hash e69de29bb2d1d6434b8b29ae775ad8c2e48c5391. You can confirm this by adding an empty file to a repository and using git ls-tree to see the contents:
(master) $ touch empty
(master) $ git add empty
(master) $ git commit -a -m "Empty"
[master (root-commit) 4145429] empty
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 empty
(master) $ git ls-tree master .
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 empty
What git ls-tree is saying is that the master branch contains a file called empty whose mode is 100644 (a regular file with owner read/write, group and other read), and whose hash is e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.
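To make the fields of that listing concrete, here is a minimal Python sketch that splits one git ls-tree line into its parts and renders the mode the way ls -l would. The function name parse_ls_tree_line is hypothetical, and the sketch assumes the default output format documented for git ls-tree (mode, type and hash separated by spaces, then a tab before the file name).

```python
import stat


def parse_ls_tree_line(line):
    """Split one default-format `git ls-tree` line into its fields."""
    # Assumed layout: "<mode> <type> <hash>\t<name>"
    meta, name = line.rstrip("\n").split("\t", 1)
    mode, obj_type, obj_hash = meta.split(" ")
    return {
        "mode": mode,
        "type": obj_type,
        "hash": obj_hash,
        "name": name,
        # Interpret the octal mode as file type plus permission bits.
        "permissions": stat.filemode(int(mode, 8)),
    }


entry = parse_ls_tree_line(
    "100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391\tempty"
)
print(entry["permissions"], entry["type"], entry["name"])  # -rw-r--r-- blob empty
```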
Similarly, if you look in the repository’s object store, you’ll find that a file .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 has been created. The object store is split into up to 256 top-level directories (00 to ff), each named after the first two hexadecimal digits of a hash. The file name is the remaining 38 digits, so the directory name followed by the file name gives the full hash.
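The mapping from a hash to its location in the object store can be sketched in a few lines of Python. The helper name loose_object_path and its git_dir parameter are hypothetical, and the sketch only covers loose objects laid out as described above.

```python
from pathlib import PurePosixPath


def loose_object_path(obj_hash, git_dir=".git"):
    """Return the path where a loose object with this hash would be stored."""
    if len(obj_hash) != 40:
        raise ValueError("expected a 40-character SHA-1 hex string")
    # The first two hex digits name the directory; the remaining 38 name the file.
    return PurePosixPath(git_dir) / "objects" / obj_hash[:2] / obj_hash[2:]


print(loose_object_path("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"))
# .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```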
So, how does Git compute this value? It uses the SHA-1 hash function, but the SHA-1 of an empty input isn’t this value. In fact, Git prefixes the contents with "blob ", followed by the length in bytes (as a decimal integer), followed by a NUL character, followed by the contents themselves. So for our case, we have:
$ echo -en "blob 0\0" | shasum
$ echo -en "blob 0\0" | openssl dgst -sha1
$ printf "blob 0\0" | shasum
$ printf "blob 0\0" | openssl dgst -sha1
All of these print the same value, e69de29bb2d1d6434b8b29ae775ad8c2e48c5391. Note that \0 is the escape code for the NUL character; the -e option tells echo to interpret escapes, and -n suppresses the trailing newline. If you get fef5d… then the \0 is being interpreted as two characters, the \ and the 0. (And if you get b825… then a newline is being added at the end.)
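The same calculation can be written as a minimal Python sketch using the standard hashlib module. The function name git_blob_hash is hypothetical; the sketch assumes a blob object and the header layout described above.

```python
import hashlib


def git_blob_hash(data: bytes) -> str:
    """Hash file contents the way the text describes for a blob object."""
    # Header: "blob ", the length in bytes as decimal digits, then a NUL byte.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
# The plain SHA-1 of an empty input is a different value:
print(hashlib.sha1(b"").hexdigest())
```

Working on bytes rather than text sidesteps the shell escaping pitfalls mentioned above, since the NUL byte and any trailing newline are written out explicitly.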
Instead of calculating this format ourselves, we can use
git hash-object to calculate a hash – or, with
-w, insert an object in our local repository:
(master) $ echo 'Hello, World!' | git hash-object -w --stdin
8ab686eafeb1f44702738c8b0f24f2567c36da6d
(master) $ ls .git/objects/8a
b686eafeb1f44702738c8b0f24f2567c36da6d
(master) $ echo -e 'blob 14\0Hello, World!' | shasum
8ab686eafeb1f44702738c8b0f24f2567c36da6d
This has computed the hash of our object (blob 14\0Hello, World!\n) and written the object into the objects directory under a name derived from that hash. The stored contents are compressed with the DEFLATE algorithm. At the moment, the object isn’t used or referred to anywhere in our tree. Although we don’t see it in the working directory, we can see it in the repository itself:
(master) $ git show 8ab686eafeb1f44702738c8b0f24f2567c36da6d
Hello, World!
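To illustrate what the compressed file in the objects directory holds, here is a minimal Python sketch that builds and then unpacks the bytes of a blob using the standard zlib module. The names encode_loose_object and decode_loose_object are hypothetical. The sketch assumes a loose blob object; Git may use a different compression level, so the compressed bytes need not match the file on disk byte for byte, although the decompressed contents should be the same.

```python
import zlib


def encode_loose_object(data: bytes) -> bytes:
    """Build compressed bytes for a blob: header plus contents, deflated."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return zlib.compress(header + data)


def decode_loose_object(raw: bytes):
    """Decompress an object and split it into (type, size, contents)."""
    body = zlib.decompress(raw)
    # The header ends at the first NUL byte; everything after is the contents.
    header, _, contents = body.partition(b"\0")
    obj_type, size = header.split(b" ")
    return obj_type.decode("ascii"), int(size), contents


stored = encode_loose_object(b"Hello, World!\n")
print(decode_loose_object(stored))  # ('blob', 14, b'Hello, World!\n')
```

To inspect a real object, the bytes read from the file under .git/objects/8a could be passed to decode_loose_object in the same way.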
Next time, we’ll look at how Git organises objects into directories, and ultimately, commits.
Come back next week for another instalment in the Git Tip of the Week series.