
AlBlue’s Blog

Macs, Modularity and More

Git Tip of the Week: Patches by Email

2011, git, gtotw

This week’s Git Tip of the Week is about how git handles patches by email. You can subscribe to the feed if you want to receive new instalments automatically.


One of the main benefits of a distributed version control system is that code changes can be pushed from one repository to another clone, and all dependent changes are pushed as well. However, that only works when you have write access to the remote repository, which in many cases you do not. One way of getting changes across in that case is to provide a patch: a set of changes which can be applied to the repository at the other end.

Git started life as a distributed version control system for the Linux project, which actively uses mailing lists both as a discussion mechanism and also as a distribution mechanism for patches (changes) for an existing codebase. (New features are just a special case of patching nothing to add the new code.)

To speed the processing of patches by mail, git developed tight integration with both (command-line) mail clients and the generic Unix mbox format. Patches can be generated in the form of mail messages, and the remote end can process them with a specific command to reconstitute the changes in the git repository.

Whilst the majority of projects don’t use patches by mail as a change distribution mechanism, it is useful on occasion where either a patch needs to be generated and attached to a bug tracking system, or where changes need to be sent to a remote developer who doesn’t have direct access (such as through a firewall).

The convention adopted by the git developers is to format one patch per e-mail message. The subject of the message is the first line of the git commit message, preceded by a prefix which can be overridden on the command line but which defaults to [PATCH] for a single patch and [PATCH x/y] for a series, as a means of threading them together. (Amongst other reasons, this is why the initial line of a Git commit message is suggested to be relatively short, so that it fits with a mail client’s view of the subject and suggested prefix.)

Generating and sending patches

How do we generate these patches? The git format-patch command generates one patch file per commit in the range required, formatted ready to go as mail messages in mbox format. The --to option specifies the mail address the patches should be sent to, but the sending is done separately.


(master) $ git format-patch --to cdt-dev@eclipse.org HEAD~..HEAD
0001-bug-333001-Description-Scanner-Info-doesn-t-release.patch
From 9c9c692df50e5a9eb91b41cc86f57212afd78ef9 Mon Sep 17 00:00:00 2001
From: Andrew Gvozdev …
Date: Sat, 16 Jul 2011 15:16:21 -0400
Subject: [PATCH] bug 333001: Description Scanner Info doesn't release
 ICProjectDescription
To: cdt-dev@eclipse.org

---
 .../cdt/internal/core/model/CModelManager.java     |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)
⋮
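To see the one-file-per-commit behaviour and the numbered subject prefix on a series, here is a minimal sketch against a throwaway repository. The author identity, file names, commit messages and the reviewers@example.com address are all made up for illustration.

```shell
set -e
# Throwaway repository; every name below is illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"

echo one >> file.txt
git commit -q -a -m "Add first line"
echo two >> file.txt
git commit -q -a -m "Add second line"

# One patch file per commit in the range, written into ./patches
git format-patch -q --to=reviewers@example.com -o patches HEAD~2..HEAD

# Show the generated files and their Subject headers
ls patches
grep -h '^Subject:' patches/*.patch
```

With two commits in the range, the subjects carry a [PATCH 1/2] and [PATCH 2/2] prefix, and each file name is derived from the first line of its commit message.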

If the commit message has more detail than a single line, the detail will be included below the mail’s subject headers. It’s possible to add additional commentary after the --- line, before the patch is shown; git am leaves any text there out of the commit message. Conversely, when git am is run with --scissors, everything before a line made up of dashes around >8 or 8< (AKA ‘scissors lines’, such as -- >8 --) is ignored by the patch application at the other end.
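As a rough illustration of reviewer-only commentary, the sketch below adds a note directly after the first --- line of a generated patch and then applies it with git am in a second repository. The repository names, identities and the wording of the note are assumptions made for the example.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q sender
cd sender
git config user.name "Example Author"
git config user.email "author@example.com"
echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"

# 'receiver' stands in for the repository at the other end
cd "$work"
git clone -q sender receiver

cd "$work/sender"
echo change >> file.txt
git commit -q -a -m "Add a change"
git format-patch -q -o "$work/out" HEAD~..HEAD

# Insert a note after the first '---' line (awk is used for portability)
awk '{ print } /^---$/ && !seen { print "Note for reviewers: not part of the commit."; seen = 1 }' \
    "$work/out/0001-Add-a-change.patch" > "$work/annotated.patch"

cd "$work/receiver"
git config user.name "Example Committer"
git config user.email "committer@example.com"
git am -q "$work/annotated.patch"

# The note does not appear in the recreated commit message
message=$(git log -1 --format=%B)
echo "$message"
```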

These patch files can then be transmitted via mail using the git send-email command. This connects to the given SMTP server (either the one configured in your global ~/.gitconfig or the repository’s .git/config, or the one specified on the command line) and then sends each patch file as a separate e-mail:


(master) $ git send-email --smtp-server=smtp.gmail.com *.patch
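Rather than passing the server on every invocation, the settings can live in the repository’s configuration. This is a sketch only: the host, port and recipient address are placeholders, and nothing is actually sent.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Placeholder values; substitute your own mail provider's details
git config sendemail.smtpServer smtp.example.com
git config sendemail.smtpServerPort 587
git config sendemail.smtpEncryption tls
git config sendemail.to reviewers@example.com

# List what git send-email would pick up from this repository
git config --get-regexp '^sendemail\.'

# With the above in place, sending reduces to:
#   git send-email *.patch
# Adding --dry-run shows what would be sent without contacting the server.
```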

As well as using format-patch in a separate stage, it’s possible to use send-email to generate the patches and then send them immediately. (You can also configure send-email to prompt to open an editor so that you can customise the messages before they are sent.)

Applying patches

Once the patches have been created, how do you apply them into a local clone? If you have a patch file, you can apply it with git apply:


(master) $ git apply 0001-bug-333001-Description-Scanner-Info-doesn-t-release.patch

Note, however, that this approach does not recreate the state of the world as it was on the sender’s repository. The patch is applied, but it only changes the content of the working tree; it does not create a commit, so the author, date and message recorded in the patch are not carried over. You can specify git apply --index or git apply --cached to get the changes put into the staging area, but this still does not recreate the commit.
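The difference between a plain git apply and git apply --index can be seen in a throwaway repository. This is a sketch with made-up file names and identities; the point is that HEAD does not move in either case.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git config user.name "Example Author"
git config user.email "author@example.com"
echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"
echo change >> file.txt
git commit -q -a -m "Add a change"

# Export the change, then rewind so the repository no longer has it
git format-patch -q -o "$work/out" HEAD~..HEAD
git reset -q --hard HEAD~
before=$(git rev-parse HEAD)

# Plain apply: the working tree changes, nothing is staged
git apply "$work/out/0001-Add-a-change.patch"
unstaged=$(git status --short)

# Undo, then apply to both the working tree and the index
git checkout -q -- file.txt
git apply --index "$work/out/0001-Add-a-change.patch"
staged=$(git status --short)

after=$(git rev-parse HEAD)
echo "plain apply: [$unstaged]"
echo "with --index: [$staged]"
```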

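To recreate each commit, complete with its original author, date and message, use git am, which stands for apply mailbox. This runs through a mailbox (which may have one or more patches in it) and creates a commit for each one of those patches. (The committer is recorded as whoever runs git am, so the new commit’s hash will generally differ from the sender’s.)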

Fortunately, the output generated by git format-patch is already in mbox format; that is the purpose of the otherwise dummy From 9c9c692df50e5a9eb91b41cc86f57212afd78ef9 Mon Sep 17 00:00:00 2001 line at the top of the patch file. As a result, each patch file can be treated as an mbox containing a single message, and the patches that get sent can be applied in a batch.

In fact, since mbox elements can be concatenated together, this permits patch files to be concatenated together to form a larger patch file, which can be sent as a single unit via another transfer mechanism and then applied on the remote side.
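Putting the last few paragraphs together, here is a sketch that concatenates two patch files into a single mbox and applies it with git am on the receiving side. The repository names, identities and the series.mbox file name are invented for the example.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q sender
cd sender
git config user.name "Example Author"
git config user.email "author@example.com"
echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"

# The receiver starts from the initial commit only
cd "$work"
git clone -q sender receiver

cd "$work/sender"
echo one >> file.txt
git commit -q -a -m "Add first line"
echo two >> file.txt
git commit -q -a -m "Add second line"

# One file per commit, then concatenated into a single mbox
git format-patch -q -o "$work/out" HEAD~2..HEAD
cat "$work/out"/*.patch > "$work/series.mbox"

cd "$work/receiver"
git config user.name "Example Committer"
git config user.email "committer@example.com"
git am -q "$work/series.mbox"

# Author comes from the patch; committer is whoever ran git am
git log --format='%an | %cn | %s'
```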

Bundles

Patches provide a way of reconstituting a repository over a mechanism that is not directly connected, but the purpose of patches is as much to let humans investigate the set of changes as to get the change there. If, however, the desire is to move commits from one machine to another without direct connectivity, a better alternative is to use git bundle.


(master) $ git bundle create changes.bundle HEAD~..HEAD
Counting objects: 23, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (8/8), done.
Writing objects: 100% (12/12), 935 bytes, done.
Total 12 (delta 5), reused 6 (delta 0)

The bundle uses the same pack format that Git uses over the network when cloning or fetching. As a result, the only references it contains are those listed when the bundle was created.

Typically, a tag will be used to mark where the last known point was for the remote source; then, the difference between HEAD and that tag is used to build up the bundle for the remote end. Alternatively, branches can be used to simulate the branch on the remote end.
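The tag-as-marker approach might look something like the following sketch. The tag name last-sent is an arbitrary choice for this example, as are the file names and identity.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git config user.name "Example Author"
git config user.email "author@example.com"
echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"

# 'last-sent' marks the point the remote end is already known to have
git tag last-sent

echo one >> file.txt
git commit -q -a -m "Add first line"
echo two >> file.txt
git commit -q -a -m "Add second line"

# Bundle only what has happened since the marker
git bundle create "$work/changes.bundle" last-sent..HEAD
git bundle list-heads "$work/changes.bundle"

# Once the bundle has been handed over, move the marker forward
git tag -f last-sent HEAD
```

The next bundle can then be built from last-sent..HEAD again and will contain only the commits made after this one was handed over.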

Once the bundle file has been generated, it can be sent over any transport to the remote host for reconstitution. This might involve burning it to a CD, copying it to a USB stick or sending it via some other network protocol.

On the client side, the client can run git bundle verify to determine if all required parent commits are present in the local repository. This must be run from the client git repository that you want to fetch into.

The client views the bundle as a remote that it can pull from, much like a path to a directory can be used to pull from a local file-based repository. You can add it as a remote (e.g. git remote add changes /tmp/changes.bundle) or you can fetch from the path to the bundle itself:


(master) $ git bundle verify /tmp/changes.bundle
The bundle contains 1 ref
b707c559636bf8e6dffb3145bd44b03de18868b3 HEAD
The bundle requires these 1 ref
3580c1087c2860fbe6ca4c1a7a6d6e1eb1669aa3 Bug 333599 - [C++0x] Initializer lists & return without type
/tmp/changes.bundle is okay
(master) $ git fetch /tmp/changes.bundle
From /tmp/changes.bundle
 * branch            HEAD       -> FETCH_HEAD

Once the references have been fetched into the repository (the result can be referred to as FETCH_HEAD) you can then inspect the changes, merge them into the local branches or reset your master branch to FETCH_HEAD.
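The whole round trip can be sketched end to end with two throwaway repositories standing in for the two machines. The names sender and receiver, the identity and the file names are assumptions for the example; the bundle is simply read from a temporary directory rather than a CD or USB stick.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q sender
cd sender
git config user.name "Example Author"
git config user.email "author@example.com"
echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"

# The receiver has only the initial commit
cd "$work"
git clone -q sender receiver

cd "$work/sender"
echo change >> file.txt
git commit -q -a -m "Add a change"
git bundle create "$work/changes.bundle" HEAD~..HEAD

cd "$work/receiver"
# Check that the bundle's prerequisite commits exist locally
git bundle verify "$work/changes.bundle"

# Fetch from the bundle as if it were a remote, then inspect and merge
git fetch -q "$work/changes.bundle" HEAD
git log --oneline HEAD..FETCH_HEAD
git merge -q --ff-only FETCH_HEAD
```

Unlike the git am route, the commit arrives unchanged, so its hash on the receiving side matches the sender’s.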

Summary

It’s not always possible to have write access to the repository you want to send changes to. In those cases you can send changes out of band, either via mail (if you want human reviews) or as a bundle (if you just want to send the commits).


Come back next week for another instalment in the Git Tip of the Week series.