However, all good things must come to an end, and writing a weekly post, whilst keeping it fresh, is a non-trivial task. In addition, finding something new (or even vaguely interesting) to write about is increasingly difficult once you’ve covered the standard cases and a number of esoteric ones.
So, for my final post in the series, rather than writing about something new, I thought I’d link back to the posts I’ve written before, in order. If you want to share the series with others, you can refer back to this index page to find out what I wrote and when. The search list is all well and good, but it only shows the 20 most recent posts. I’ve also added some of my other Git-related articles that may be of interest, even though they weren’t part of the Git Tip of the Week series.
I’d like to thank you for your time and interest in reading this series, and wish you a happy Christmas and a prosperous New Year.
Distributed Version Control Systems have really taken off in the last few years, though they’ve been around for over a decade. Probably the biggest growth spurt happened because of the controversy that launched Git back in April 2005, providing a rock-solid distributed version control system modelled on a filesystem. Within only a few months, Git was hosting the source of the 2.6.12 Linux kernel release.
However, as popular as Git may have been then, it wasn’t until the birth of GitHub that Git really took off. Founded in February 2008, GitHub brought Git to a much wider audience and provided free hosting for public Git repositories (as well as commercial plans for private repositories). It has been argued that GitHub is one of the reasons why Git has taken off faster than others, like Hg and Bzr.
What GitHub brought was a focus on a new model; instead of creating patches (as covered last time), GitHub encouraged universal forking of the repository. So, if you want to add a change to an existing repository, you can fork it (and create your own clone), make the changes, and then send a pull request.
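As a rough sketch of what that looks like from the contributor’s side (the user, repository and branch names here are purely illustrative):

$ git clone git@github.com:alice/project.git    # clone your fork of the upstream repository
$ cd project
$ git checkout -b fix-typo                      # make the change on a topic branch
$ git commit -a -m "Fix typo in README"
$ git push origin fix-typo                      # push the branch back to your fork
# ...then open a pull request against the upstream repository via the GitHub web interface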
There’s nothing Git-specific about pull requests compared with other DVCS tools; pushing and pulling are two key primitives in any DVCS workflow. What was novel about GitHub’s approach was the way that pull requests could be sent as an out-of-band message to the upstream repository owner, suggesting the change.
Not only that, but the upstream owner would then get a notification and be able to view the request in situ, and with the diffs as appropriate (outside of a mail client, via the web interface). Subsequent advances, such as the ability to fork-to-fix-typos, meant that anyone could suggest changes via the web without even needing to compile the code locally.
As a result, Git repositories can end up with different workflows depending on the type of project and hosting environment you are using. They are: pushing, pulling, patching and proposing.
Each project has potentially a different style of operation, and there isn’t a “right” way to use a Git repository. GitHub, for example, strongly favours the Pulling model when consuming changes from others (though of course, the repository owner can do pushing directly). The Linux Kernel, both for historic reasons and also for transparency and open discussions, chooses the patching (by e-mail) model.
The final one – proposing – is a combination of the pull, push and patch models. It’s similar to GitHub’s pull mechanism, in that the project’s owners can see a list of all incoming changes and decide which ones to use; but the push-based upload means that the original repository doesn’t have to be forked on the remote server. And finally, tools like Gerrit (which I’ve mentioned before) can be used to generate patches, host in-situ discussions, and even act as a Git repository that can be consumed via the standard git fetch protocols.
GitHub’s pull-based approach has certainly had a wide impact on the number of users willing to try that method. They have a note on collaborative development models on the subject:
The Fork + Pull Model lets anyone fork an existing repository and push changes to their personal fork. This model reduces the amount of friction for new contributors and is popular with open source projects because it allows people to work independently without upfront coordination.
The Shared Repository Model is more prevalent with small teams and organizations collaborating on private projects. Everyone is granted push access to a single shared repository and topic branches are used to isolate changes.
Certainly, for minor changes (like a typo in documentation) the fork-and-pull model, when combined with a web-based interface, can make things dramatically easier for contributors. Instead of having to create accounts on bug tracking systems (or tools like Gerrit), the repository can be forked, fixed, and a pull request fired off to the repository maintainers. With the merge button in GitHub, a sufficiently simple fix can often be merged without the maintainer having to check the code out at all. Reducing the barrier to accepting changes helps keep an active open-source project alive and open to all.
The only problem with the Fork + Pull model is attributing changes to their contributors. For example, some open-source foundations want to ensure that any changes are granted under an existing open source licence (Apache or Eclipse, for example). Other projects tend not to be as strict and will happily accept contributions from anyone, on the assumption that contributors have agreed to the licence. One additional service that patches-to-bugzilla or gerrit push provide is the acceptance of a contributor agreement, which normally states that the individual is entitled to grant the code under the specific licence; creating an account often implies (explicitly or implicitly) agreement to follow that foundation’s licensing rules.
So, there’s no “right” way to do Git; different teams, foundations and projects will have their own preference for a particular strategy, and that preference may evolve over time. Instead, it’s useful to know the different flows that are available so that the right choice can be made for each project.
Come back next week for another instalment in the Git Tip of the Week series.
One of the main benefits of a distributed version control system is that code changes can be pushed from one repository to another clone, and all dependent changes are pushed as well. However, that only works when you have write access to the remote repository, which in many cases you do not. One way of getting changes across is to provide a patch – a set of changes which can be applied to the repository at the other end.
Git started life as a distributed version control system for the Linux project, which actively uses mailing lists both as a discussion mechanism and as a distribution mechanism for patches (changes) to an existing codebase. (New features are just a special case: patching nothing to add the new code.)
To speed the processing of patches by mail, Git developed tight integration both with (command-line) mail clients and with the generic Unix mbox format. Patches can be generated in the form of mail messages, and the remote end can process them with a specific command to reconstitute the changes in the Git repository.
Whilst the majority of projects don't use patches by mail as a change distribution mechanism, it is useful on occasions when a patch needs to be generated and attached to a bug tracking system, or when changes need to be sent to a remote developer who doesn't have direct access (because of a firewall, for example).
The convention adopted by the Git developers is to format one patch per e-mail message. The subject of the message then has the first line of the Git commit message, prefixed with a marker that can be overridden on the command line but which defaults to [PATCH x/y] as a means of threading the patches together. (Amongst other reasons, this is why the initial line of a Git commit message is suggested to be relatively short, so that it fits within a mail client's view of the subject along with the prefix.)
Generating and sending patches
How do we generate these patches? The git format-patch command will generate a patch file per commit in the range required, formatted ready to go as mail messages in mbox format. The --to option can be used to specify which mail address the patches should be sent to – but the sending is done separately.
(master) $ git format-patch --to cdt-dev@eclipse.org HEAD~..HEAD
0001-bug-333001-Description-Scanner-Info-doesn-t-release.patch
From 9c9c692df50e5a9eb91b41cc86f57212afd78ef9 Mon Sep 17 00:00:00 2001
From: Andrew Gvozdev …
Date: Sat, 16 Jul 2011 15:16:21 -0400
Subject: [PATCH] bug 333001: Description Scanner Info doesn't release ICProjectDescription
To: cdt-dev@eclipse.org
---
 .../cdt/internal/core/model/CModelManager.java |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)
⋮
If the commit message has more detail than a single line, the detail will be included below the mail's subject headers. It's also possible to add commentary that won't become part of the commit: anything placed after the --- separator (before the diff itself), or above a 'scissors line' made up of >8 or 8< characters, is ignored when the patch is applied at the other end.
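As an illustration, using the (hypothetical) patch above: the commit message body becomes part of the new commit, while commentary placed after the --- separator is only visible to the mail's readers and is dropped when the patch is applied:

Subject: [PATCH] bug 333001: Description Scanner Info doesn't release ICProjectDescription

This body text becomes the commit message at the other end.
---
This note is only for reviewers on the mailing list; git am discards it.

 .../cdt/internal/core/model/CModelManager.java |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)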
These patch files can then be transmitted via mail using the git send-email command. This connects to the given SMTP server (either the one from your global ~/.gitconfig, the repository's .git/config, or the one specified on the command line) and then sends each patch file as a separate e-mail:
(master) $ git send-email --smtp-server=smtp.gmail.com *.patch
As well as using format-patch as a separate stage, it's possible to use send-email to generate the patches and send them immediately. (You can also configure send-email to open an editor so that you can customise the messages before they are sent.)
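For example, assuming the same mailing list and SMTP server as above, a single command can generate the patches and send them in one step, with --annotate opening an editor for each message before it goes out:

(master) $ git send-email --annotate --to cdt-dev@eclipse.org --smtp-server=smtp.gmail.com HEAD~..HEAD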
Applying patches
Once the patches have been created, how do you apply them to a local clone? If you have a patch file, you can apply it with git apply:
(master) $ git apply 0001-bug-333001-Description-Scanner-Info-doesn-t-release.patch
Note, however, that this approach does not recreate the state of the world as it was in the sender's repository. The patch is applied, but it only makes local changes to the repository's content; it does not recreate the commit (and, more specifically, the hash of that commit). You can specify git apply --index or git apply --cached to get the changes put into the staging area, but this still does not recreate the same commit as before.
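A minimal sketch, using the patch file generated earlier: the change is staged with --index and then committed manually, producing a new commit whose hash differs from the sender's original:

(master) $ git apply --index 0001-bug-333001-Description-Scanner-Info-doesn-t-release.patch
(master) $ git commit -m "bug 333001: Description Scanner Info doesn't release ICProjectDescription"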
To recreate the commit exactly as it was requires the use of git am, which stands for 'apply mailbox'. This runs through a mailbox (which may contain one or more patches) and recreates a commit for each of those patches.
Fortunately, the output generated by git format-patch is already in mbox format; that's the purpose of the otherwise dummy From 9c9c692df50e5a9eb91b41cc86f57212afd78ef9 Mon Sep 17 00:00:00 2001 line at the top of the patch file. As a result, each patch file can be treated as a one-message mbox, and a set of them can be applied as a batch at the receiving end.
In fact, since mbox elements can be concatenated, patch files can also be concatenated to form a larger patch file, which can be sent as a single unit via another transfer mechanism and then applied on the remote side.
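A sketch of that flow, assuming the patch files generated earlier are sitting in the current directory:

(master) $ cat *.patch > changes.mbox
(master) $ git am changes.mbox
Applying: bug 333001: Description Scanner Info doesn't release ICProjectDescription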
Bundles
Patches provide a way of reconstituting a repository when there is no direct connection, but the purpose of patches is as much to let humans review the set of changes as to get the changes there. If, however, the desire is simply to move commits from one machine to another without direct connectivity, a better alternative is to use git bundle.
(master) $ git bundle create changes.bundle HEAD~..HEAD
Counting objects: 23, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (8/8), done.
Writing objects: 100% (12/12), 935 bytes, done.
Total 12 (delta 5), reused 6 (delta 0)
The bundle uses the same format as the network transmission that Git uses when cloning. As a result, the only references contained are those listed when the bundle is created.
Typically, a tag will be used to mark where the last known point was for the remote source; then, the difference between HEAD and that tag is used to build up the bundle for the remote end. Alternatively, branches can be used to simulate the branch on the remote end.
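For instance, if a (hypothetical) lastsent tag records the last commit the remote side has already seen, the bundle only needs to contain the commits since then, and the tag can be moved forward once the bundle has been sent:

(master) $ git bundle create changes.bundle lastsent..HEAD
(master) $ git tag -f lastsent HEAD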
Once the bundle file has been generated, it can be sent to the remote host over any transport for reconstitution. This might involve burning it to a CD, copying it onto a USB stick, or using some other network protocol.
On the client side, git bundle verify can be run to determine whether all required parent commits are present in the local repository. This must be run from within the Git repository that you want to fetch into.
The client views the bundle as a remote that it can pull from, much like a path to a directory can be used to pull from a local file-based repository. You can add it as a remote (e.g. git remote add changes /tmp/changes.bundle) or you can fetch from the path to the bundle itself:
(master) $ git bundle verify /tmp/changes.bundle
The bundle contains 1 ref
b707c559636bf8e6dffb3145bd44b03de18868b3 HEAD
The bundle requires these 1 ref
3580c1087c2860fbe6ca4c1a7a6d6e1eb1669aa3 Bug 333599 - [C++0x] Initializer lists & return without type
/tmp/changes.bundle is okay
(master) $ git fetch /tmp/changes.bundle
From /tmp/changes.bundle
 * branch            HEAD       -> FETCH_HEAD
Once the references have been fetched into the repository (where they can be referred to as FETCH_HEAD), you can then inspect the changes, merge them into local branches, or reset your master branch to FETCH_HEAD.
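For example, to review the incoming commits and then merge them into the current branch (a sketch – you could equally reset the branch to FETCH_HEAD instead):

(master) $ git log --oneline master..FETCH_HEAD
(master) $ git merge FETCH_HEAD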
Summary
It's not always possible to have write access to the repository you want to send changes to. In those cases you can send changes out of band, either via mail (if you want human reviews) or as a bundle (if you just want to send the commits).
Come back next week for another instalment in the Git Tip of the Week series.