
AlBlue’s Blog

Macs, Modularity and More

Someday ...

2011 Jenkins Git Eclipse Gerrit

Someday, all software will be built this way.

I've been a fan of Git for a while now; I've written a few Git posts in the past, including the explanatory Git for Eclipse users post, which explains the key differences between DVCS and CVCS.

I've also been using Gerrit for a while via the EGit review page at Eclipse. Gerrit is a code review system, based on the features that a DVCS gives you. If going Git is the main course, then Gerrit is the dessert; with Jenkins being the cherry on top.

Code review systems generally fall into one of two categories, which have advantages and disadvantages:

  • Pre-commit reviews, in which a diff is attached to a review system
    • Advantage: avoids polluting the version control system with versions which may be inaccurate or may not make it
    • Disadvantage: committed code post review may not exactly match the proposed change
  • Post-commit reviews, in which the change is committed and then later blessed
    • Advantage: ensures that exactly that change is part of the version history
    • Disadvantage: potentially pollutes the version control system with changes which need amendment or may have to be rewritten or aborted

There are of course other approaches, such as the man-in-the-middle commit (as used by Eclipse, where patches are uploaded as attachments to a bug review system and then committed, possibly with changes, by a separate committer). However, this approach tends to work with open-source systems; and it doesn't solve the problem of committers (with write access) having their changes reviewed. In fact, attached patches tend to go stale far quicker than changes held in a review system.

Branches are the way forward

So how does DVCS help here? Well, the key problem with review-after-commit is not so much that the change exists in the version control system, but rather most implementations use review-after-commit-to-HEAD(trunk). As such, one bad code commit causes subsequent code commits to be invalidated, or at least, contain code which can't be easily undone (other than by committing a reverse patch).

The solution, therefore, is to use branches. If you commit onto a branch, you don't affect any developers on HEAD(trunk). The branch can be reviewed independently, suggestions or improvements made, checked against a build system, and then finally merged onto HEAD(trunk) post-review. Of course, if you have two concurrent changes, you need two branches. And if you have a team of ten developers, you need 10(*2) branches. Go down this road for any sensible amount of time and you quickly realise that you need one branch per change, so that no changes interfere with another.

Of course branches bring about merges (and merge conflicts), so using a tool which is implicitly based around branches and merging is a no-brainer. So using Git (or Hg), you can develop changes on a local branch, push that branch to a central warehouse, ask others to review it, and then merge exactly that change onto master(HEAD, trunk). Even better, since it's a DVCS, that merge commit will have the full history of the change (including the sign-off) so someone can say "Alex approved 01b3cd" and you know exactly what change that refers to.

There are a couple of variations on this theme in the Git world (Hg users tend not to like re-writing history). These involve 'squashing' the branch (i.e. removing all the intermediary steps and replacing them with a single unified diff of the branch) as well as 'rebasing' (moving the diff forward to master (HEAD, trunk) instead of creating a merge node, which joins together two otherwise unrelated Git trees). The different configurations here don't really affect the way that the review-after-commit-and-merge works; when you bless that code, you bless that code.
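To make the difference concrete, here is a minimal sketch that builds a throwaway repository and lands the same two-step branch twice: once with a merge node and once squashed. The branch names (change/feature, merged, squashed) and the identity are made up for illustration; rebasing is not shown.

```shell
#!/bin/sh
# Sketch only: compares a merge node with a squash in a throwaway repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Identity is set locally so the sketch runs without any global Git config.
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"
main=$(git symbolic-ref --short HEAD)   # 'master' or 'main', depending on Git defaults

# One branch per change, with two intermediate commits.
git checkout -q -b change/feature
echo one >> file.txt
git commit -q -am "Feature, step 1"
echo two >> file.txt
git commit -q -am "Feature, step 2"

# Variation 1: a merge node keeps every intermediate commit.
git checkout -q -b merged "$main"
git merge -q --no-ff -m "Merge change/feature" change/feature

# Variation 2: a squash replaces the intermediate steps with a single commit.
git checkout -q -b squashed "$main"
git merge -q --squash change/feature
git commit -q -m "Feature (squashed)"

# Count the commits each variation adds on top of the starting point.
merged_count=$(git rev-list --count "$main"..merged)
squashed_count=$(git rev-list --count "$main"..squashed)
echo "merged adds $merged_count commits, squashed adds $squashed_count"
```

Either way the resulting file contents are identical; only the shape of the history differs.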

Enter Gerrit

Gerrit is a review-based tool which operates on a Git repository. (There's nothing significant that would prevent a tool like Gerrit working on Hg; but like GitHub, innovations tend to happen faster with Git.) The way Gerrit works is by being a process-in-the-middle between your local Git repository and the 'blessed' central Git repository. Once you use it, it's common for Gerrit to become the de-facto owner of the Git repository that it fronts; though since DVCSs don't have an enforced centralised Git repository as such, this can be changed if desired. It is common (in organisations replacing legacy version control systems such as CVS and SVN) to have a centralised server to host source data, which may be on higher-resiliency and backed-up hardware; so the central Gerrit instance can be an advantage for those looking to make the switch.

You configure Gerrit as a remote repository, much like you would with any other. In fact, you can use it as the only remote repository, by cloning from it initially. The EGit project, for example, is available via ssh://username@egit.eclipse.org:29418/egit.git, although it's faster to clone/pull from the unauthenticated http://egit.eclipse.org/egit.git, which is the same underlying on-disk data.

To push changes to Gerrit, you configure a remote based on the authenticated access. You also don't push to refs/heads/master (which is the Git synonym for HEAD or trunk) as you might if it was a standalone Git repository; rather, you push to refs/for/master, or to refs/for/other if you want to submit a different branch. You can configure it with a wildcard, so any local branch can be pushed:

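# Add a remote called 'review' that points at the authenticated Gerrit SSH port
git config remote.review.url ssh://username@egit.eclipse.org:29418/egit.git
# Map every local branch to the matching refs/for/ target (quoted so the shell leaves the * alone)
git config remote.review.push 'refs/heads/*:refs/for/*'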

The refs/for/master acts like a POST request; there isn't a single branch with that name, but rather, each push to refs/for/master results in the creation of its own unique branch. In the case of EGit change I1c5ec794, the temporary branch allocated was refs/changes/46/2446/1. Other changes have their own branch; EGit change Ie639e366 corresponds to branch refs/changes/47/2447/2. (The 2 at the end in this case indicates the second version of the change; though this is a Gerrit specific notation. The first two digits are merely a directory discriminator based on the last two digits of the change number, so it contains 47/2347/*, 47/2247/* and so on.)
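The naming scheme is easy to reproduce. As a small sketch (change_ref is a helper name I've made up, not part of Git or Gerrit), the ref for any change and patch set can be derived like this:

```shell
#!/bin/sh
# change_ref CHANGE PATCHSET: print the ref under which a patch set is stored.
# Hypothetical helper for illustration; it only formats a string.
change_ref() {
    change=$1
    patchset=$2
    # The directory discriminator is the change number modulo 100,
    # zero-padded to two digits.
    shard=$(printf '%02d' $((change % 100)))
    printf 'refs/changes/%s/%s/%s\n' "$shard" "$change" "$patchset"
}

change_ref 2446 1   # prints refs/changes/46/2446/1
change_ref 2447 2   # prints refs/changes/47/2447/2
```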

Once the change is in the DVCS, it's possible to generate diffs or any other kind of processing with standard Git tools. Not only that, because it's on a publicly accessible remote DVCS server, you can even checkout that particular changeset. Gerrit even helpfully contains the command needed to do that in the web page:

git fetch http://egit.eclipse.org/r/p/egit refs/changes/47/2347/2 && git checkout FETCH_HEAD

This makes it possible to bring a proposed change down, run tests or any other kind of processing on it. In fact, to make it really easy, the above change implements a “fetch from Gerrit” action in Eclipse, which permits you to check out a change and create a local branch from it.
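Outside Eclipse, the same "fetch and create a local branch" step can be sketched with plain Git. The function name fetch_change and the local branch name review/2347 are my own inventions for illustration; only the fetch and checkout commands come from Git itself.

```shell
#!/bin/sh
# fetch_change REMOTE REF BRANCH: fetch a single change ref and check it out
# as a new local branch. BRANCH is whatever local name the caller prefers.
fetch_change() {
    remote=$1
    ref=$2
    branch=$3
    git fetch -q "$remote" "$ref" &&
        git checkout -q -b "$branch" FETCH_HEAD
}

# Example, run from inside an existing clone (needs network access to the
# server named above, so it is left commented out here):
#   fetch_change http://egit.eclipse.org/r/p/egit refs/changes/47/2347/2 review/2347
```

Having a named local branch rather than a detached FETCH_HEAD makes it easier to come back to the change later, or to push a follow-up.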

By this point, you may well be thinking Gerrit sounds like a handy review tool. As well as storing changes, you can review the diffs and comment on a file-by-file or even line-by-line basis, as well as on the review as a whole. But it also stores review flags, which can include the standard kind of +1 and -1 votes. By default, a change needs a +2 code-review vote before it can be submitted, though this can be configured.

There's also a +1 and -1 “Verified” flag, which was introduced to support Android development (which uses Git and Gerrit). The purpose of Verified is to ensure that the code compiles and passes its test suite, whereas the code-review flag covers general sanity.

Enter Jenkins

Jenkins is a continuous integration server with a short history but a tumultuous past. As a continuous integration server, it can check out and execute builds, run tests and mail results. If you're used to continuous integration servers, you may have seen the ability to check for changes via the SCM and kick off builds automatically.

Jenkins is designed to permit arbitrary triggers to kick off builds. These can include timed triggers (such as nightly builds), polling an SCM for changes, or many other means. Normally, a triggered build will just kick off a build based on a specific branch (e.g. master).

The Gerrit Trigger allows you to kick off a build when a new review in Gerrit is posted. Not only that, but it will also check out exactly the proposed change, compile it, and run all the tests – automatically. When it finishes successfully, it can post a +1 Verified (or -1, if it fails). All of this can happen before the reviewer has even had time to see your change, so that if the change causes a build failure, it can be rejected automatically.
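The trigger does the voting for you, but the logic is roughly the following. This is a sketch rather than the plugin's actual implementation: verified_vote and review_command are hypothetical helpers, ./run-tests.sh stands in for whatever your build runs, and I'm assuming the GERRIT_* environment variables that the trigger exports to the job. The command is printed rather than executed, so the sketch runs without a Gerrit server.

```shell
#!/bin/sh
# verified_vote STATUS: map a build's exit status to a Verified vote.
verified_vote() {
    if [ "$1" -eq 0 ]; then
        printf '%s\n' "+1"
    else
        printf '%s\n' "-1"
    fi
}

# review_command STATUS CHANGE PATCHSET: print (not run) the Gerrit SSH
# command that would record the vote. Host and user are the post's examples.
review_command() {
    vote=$(verified_vote "$1")
    printf 'ssh -p 29418 username@egit.eclipse.org gerrit review --verified %s %s,%s\n' \
        "$vote" "$2" "$3"
}

# In a real job the status would come from building the proposed change, e.g.:
#   git fetch origin "$GERRIT_REFSPEC" && git checkout FETCH_HEAD && ./run-tests.sh
#   review_command $? "$GERRIT_CHANGE_NUMBER" "$GERRIT_PATCHSET_NUMBER"
review_command 0 2347 2   # a passing build votes +1
review_command 1 2347 2   # a failing build votes -1
```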

In addition, there's a Git Plugin that can be used to tag the build and push changes up to the remote repository, so there's a persistent record of the change having successfully been built.

It's interesting that Kim Moir has posted on the Eclipse build process as well today. It looks like Hudson will be a key part of that, and the Jenkins and Hudson plugins are compatible (for the time being, at least). But if Eclipse is going to move over to Git full-time, then Gerrit will effectively become mandatory, either on Eclipse hardware or managed by individuals on behalf of specific teams.

Submission

So once a change has been pushed to Gerrit, automatically built and tested, flagged as verified, and reviewed by a couple of other developers, the change is good to go. Since Gerrit has all the information, it can apply the change to the master branch on the user's behalf.

Merging the branch can take a number of different forms; cherry-pick allows you to write the change on top of master, merge will create a merge node, and fast-forward requires the patch be 'up-to-date' before being committed. Whichever you choose, the contents of the master version control branch always go through a review and test process, and it's possible to guarantee that the changes merged are exactly the ones approved.
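The three forms map onto ordinary Git operations. Here is a sketch in a throwaway repository, with made-up branch names (change, via-cherry-pick, via-merge), in which the target branch has moved on since the change was written. Gerrit performs the equivalent server-side; this only imitates the outcome locally.

```shell
#!/bin/sh
# Sketch only: imitates the three submit forms with plain Git commands.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.invalid"
echo base > a.txt
git add a.txt
git commit -q -m "Base"
main=$(git symbolic-ref --short HEAD)

# A reviewed change, branched from Base.
git checkout -q -b change
echo change > b.txt
git add b.txt
git commit -q -m "Reviewed change"

# Meanwhile the main branch moves on, so the change is no longer up to date.
git checkout -q "$main"
echo other > c.txt
git add c.txt
git commit -q -m "Unrelated work"

# Fast-forward only: refused, because the change is not based on the current tip.
if git merge -q --ff-only change 2>/dev/null; then ff=accepted; else ff=refused; fi

# Cherry-pick: re-applies the change as a new commit on top of the current tip.
git checkout -q -b via-cherry-pick "$main"
git cherry-pick change >/dev/null

# Merge: records a merge node with two parents.
git checkout -q -b via-merge "$main"
git merge -q --no-ff -m "Merge reviewed change" change

echo "fast-forward: $ff"
```

Note that the cherry-picked commit has a different commit id from the one that was reviewed, even though its diff is the same.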

Conclusion

Once you go Git, you don't go back. Once you go Gerrit, unreviewed code becomes unthinkable. And once you go Jenkins, you don't even need to compile and test the code yourself. Someday, all software will be built this way.