AlBlue’s Blog

Macs, Modularity and More

Git Tip of the Week: Forking and Pulling vs Pushing

2011, git, gtotw

This week’s Git Tip of the Week is about the GitHub generation. You can subscribe to the feed if you want to receive new instalments automatically.

Distributed Version Control Systems have really taken off in the last few years, though they’ve been around for over a decade. Probably the biggest growth spurt happened because of the controversy that launched Git back in April 2005, when the free BitKeeper licence used by the Linux kernel developers was withdrawn. The result was a rock-solid distributed version control system, modelled on a filesystem. Within only a few months, Git was hosting the 2.6.12 Linux kernel source.

However, as popular as Git may have been then, it wasn’t until the birth of GitHub that Git really took off. Founded in February 2008, GitHub brought Git to a much wider audience and provided a free hosting site for public Git repositories (as well as commercial plans for private repositories). It has been argued that GitHub is one of the reasons why Git has taken off faster than other DVCS tools, like Hg and Bzr.

What GitHub brought was a focus on a new model; instead of creating patches (as covered last time), GitHub encouraged universal forking of the repository. So, if you want to add a change to an existing repository, you can fork it (and create your own clone), make the changes, and then send a pull request.

There’s nothing significant about pull requests in the Git workflow as compared with other DVCS tools. Pushing and pulling are two key primitives in a DVCS workflow, after all. But what was novel about GitHub’s approach was the way that pull requests could be sent, as an out-of-band message to the upstream repository owner suggesting the idea.

Not only that, but the upstream owner would then get a notification and be able to view the request in situ, and with the diffs as appropriate (outside of a mail client, via the web interface). Subsequent advances, such as the ability to fork-to-fix-typos, meant that anyone could suggest changes via the web without even needing to compile the code locally.

Pushing, Pulling, Patching or Proposing

As a result, Git repositories can end up with different workflows depending on the type of project and hosting environment you are using. They are:

  • Pushing: You have access to directly write into the repository, so you just push your changes
  • Pulling: Someone has changes locally and asks you to pull the change from their repository
  • Patching: You send the diffs/patches by a transport mechanism (bugzilla, email) for consideration
  • Proposing: You use a tool like Gerrit to propose changes, which can then subsequently be merged
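The first three styles can be tried out entirely on one machine. What follows is a minimal sketch rather than a recipe: every repository is a local directory standing in for a hosted one, and the names (upstream.git, fork.git, maintainer, committer, contributor, patcher, the typo-fix branch) are all invented for illustration. Proposing appears only as a comment, since it needs a Gerrit server to be meaningful.

```shell
#!/bin/sh
# Sketch only: local directories stand in for hosted repositories, and all
# repository, branch and user names below are hypothetical.
set -eu

# Throwaway identity and workspace so the script does not depend on local config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"
work=$(mktemp -d)
cd "$work"

# Seed a project and publish it as the shared "upstream" repository.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main   # pin the branch name
echo "v1" > seed/README
git -C seed add README
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed upstream.git
git clone -q upstream.git maintainer

# Pushing: a committer with write access pushes straight to upstream.
git clone -q upstream.git committer
echo "pushed" >> committer/README
git -C committer commit -q -am "Direct push"
git -C committer push -q origin main

# Pulling: a contributor publishes a branch in their own fork, and the
# maintainer pulls from that fork when asked to.
git clone -q --bare upstream.git fork.git
git clone -q fork.git contributor
git -C contributor checkout -q -b typo-fix
echo "pulled" >> contributor/README
git -C contributor commit -q -am "Fix typo"
git -C contributor push -q origin typo-fix
git -C maintainer pull -q --no-rebase origin main          # catch up first
git -C maintainer pull -q --no-rebase ../fork.git typo-fix
git -C maintainer push -q origin main

# Patching: a contributor exports a commit as a patch file (which would
# travel by email or a bug tracker), and the maintainer applies it.
git clone -q upstream.git patcher
echo "patched" >> patcher/README
git -C patcher commit -q -am "Patch via mail"
mkdir outbox
git -C patcher format-patch -q -1 -o "$work/outbox"
git -C maintainer am -q "$work"/outbox/*.patch
git -C maintainer push -q origin main

# Proposing: with Gerrit, the contributor pushes to a review ref rather than
# a branch, for example: git push origin HEAD:refs/for/main
# (not run here, because there is no Gerrit server in this sketch).

# All three changes end up in upstream, whichever route they took.
git --git-dir=upstream.git log --format=%s main
```

The final log lists the initial commit plus one commit for each of the three routes. From the point of view of the upstream history, the transport used to deliver a change leaves no trace; the difference lies in who needs write access and where the discussion takes place.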

Each project potentially has a different style of operation, and there isn’t a “right” way to use a Git repository. GitHub, for example, strongly favours the Pulling model when consuming changes from others (though of course, the repository owner can push directly). The Linux Kernel, both for historic reasons and also for transparency and open discussions, chooses the patching (by email) model.

The final one, proposing, is a combination of the pull, push and patch models. It’s similar to GitHub’s pull mechanism, in that the project’s owners can see a list of all incoming changes and decide which ones to use; but the push-based upload means that the original repository doesn’t have to be forked on the remote server. And finally, tools like Gerrit (which I’ve mentioned before) can be used to generate patches, host in-situ discussions, and even act as a Git repository that can be consumed via the standard git fetch protocols.

GitHub’s pull-based approach has certainly had a wide impact on the number of users willing to try that method. They have a note on collaborative development models on the subject:

  1. The Fork + Pull Model lets anyone fork an existing repository and push changes to their personal fork. This model reduces the amount of friction for new contributors and is popular with open source projects because it allows people to work independently without upfront coordination.

  2. The Shared Repository Model is more prevalent with small teams and organizations collaborating on private projects. Everyone is granted push access to a single shared repository and topic branches are used to isolate changes.
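To make the second model concrete, here is a small local sketch of a shared repository with topic branches. It rests on assumptions: shared.git is a local bare repository standing in for the team’s hosted one, and alice, bob and the topic/login and topic/search branch names are invented for the example.

```shell
#!/bin/sh
# Sketch only: shared.git is a local stand-in for a hosted repository, and
# the user and branch names are hypothetical.
set -eu

export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"
work=$(mktemp -d)
cd "$work"

# Create the single shared repository that everyone can push to.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main   # pin the branch name
echo "v1" > seed/README
git -C seed add README
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed shared.git

# Both teammates clone the same repository; nobody forks anything.
git clone -q shared.git alice
git clone -q shared.git bob

# Each change is isolated on its own topic branch in the shared repository.
git -C alice checkout -q -b topic/login
echo "login" > alice/login.txt
git -C alice add login.txt
git -C alice commit -q -m "Add login"
git -C alice push -q origin topic/login

git -C bob checkout -q -b topic/search
echo "search" > bob/search.txt
git -C bob add search.txt
git -C bob commit -q -m "Add search"
git -C bob push -q origin topic/search

# When the topics are ready, anyone with push access merges them into main.
git -C bob fetch -q origin
git -C bob checkout -q main
git -C bob merge -q --no-ff --no-edit origin/topic/login
git -C bob merge -q --no-ff --no-edit origin/topic/search
git -C bob push -q origin main

# The shared repository now holds main plus both topic branches.
git --git-dir=shared.git for-each-ref --format='%(refname:short)' refs/heads
```

The contrast with Fork + Pull is where the unfinished work lives: here it sits on branches of the one shared repository, so it only suits a team whose members are all trusted with push access.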

Certainly, if there are minor changes (like a typo in documentation) the fork-and-pull model, when combined with a web-based interface, can make things dramatically easier for contributors. Instead of having to create accounts on bug tracking systems (or tools like Gerrit), the repository can be forked, fixed, and a pull request fired off to the repository maintainers. With the merge button in GitHub, a sufficiently simple fix can often be merged without the maintainer having to check the code out at all. Reducing the barrier to accepting changes helps keep an active open-source project alive and open to all.

The only problem with the Fork + Pull model is being able to attribute changes by user. For example, some open-source foundations want to ensure that any changes are granted under an existing open-source licence (Apache or Eclipse, for example). Other projects tend not to be as strict and will happily accept contributions from anyone, with the assumption that any contributors have agreed to the licence. One additional service that patches-to-bugzilla or a Gerrit push provide is the acceptance of a contributor agreement, which normally states that the individual is entitled to grant the code under the specific licence. Creating an account often implies (explicitly or implicitly) agreement to follow that foundation’s licensing rules.

So, there’s no “right” way to do Git; different teams, foundations and projects will have their own preference for working with a particular strategy, and that preference may evolve over time. Instead, it’s useful to understand the different flows available so that the right choice can be made for each project.

Come back next week for another instalment in the Git Tip of the Week series.