Thursday, June 30, 2011

Google Plus

Now that Plus has launched, it's interesting to compare it with my earlier predictions of what circles could be, written in the light of what was (at the time) Twitter's only mistake – the dreaded dickbar. At the time, I wrote:

So, what could Google innovate on? Well, to be a success (over and above Buzz), it has to have:

  • An open API, probably backed with OpenID/OAuth/OAuth2 to permit additional clients being developed
  • A way of *not* having any and all messages delivered to your e-mail, especially when it's from yourself
  • Have a way of associating groups of people and defining groups by role, e.g. “public”, “friends”, “work colleagues”, “family” and have a way of switching visibility on a per-message basis
  • A way of uploading video/pictures along with text messages, probably from portable mobile devices with cameras

On top of all this, Google almost certainly needs to have a couple of native clients, not just a web client, to demonstrate that anyone can join in.

So far, it's too early to expect a public API, so the jury is still out on the first point. The second point has been a facepalm episode – every time someone comments on your post, you get an e-mail notifying you that something has happened. It's like Douglas Adams's button on Disaster Area's stunt ship, which when you press it lights up a light saying “Do not press this button again.” Fortunately, there are various settings that you can change, including email delivery, which you can adjust to your liking afterwards (read: none).

On the plus side, they do appear to have created the circles aspect I alluded to, with friends and family (although they call friends ‘acquaintances’, which might be a sensible rewording of the term). And it looks like it may be integrated with mobile devices to the extent where picture uploads will become possible – there's an Android client already, with an iOS client in the works.

Limited invites

One of the things that limited invites do is increase scarcity, which in turn pushes up the value of something. At the moment, Google+ is new, and people want to get on board.

But the power of a social network is for it to grow at its own speed. Indeed, the more people are in your social network, the more likely you are to use it (and conversely, the fewer people you know, the less likely you are to use it). Wave died a death because people thought “What is this for?” when there was no-one to share it with. And with waves of people (wanting to) join Google+, but no-one there, it means you end up with a much more restricted set of people to talk with. As Martin Gratzer said on Twitter: “Ok, this is how Google+ looks like. First impression, nice look & feel. Facebook clone with Buzz integration, video chat minus 800 friends.”

This may just be growing pains. According to Google, they opened then closed invites (source). But it's likely the early adopters who are interested in joining in the first place; similarly, those early adopters are often the most influential (technically) with their friends and often have connected networks in the first place. By creating artificial scarcity, the net effect may be to limit the number of fully connected networks that grow in the early stages, leaving the early adopters to walk away and thus impact the growth.

One thing's for sure. Both Buzz and Wave failed in the adoption stakes. By addressing the privacy issues of Facebook, coupled with the ease of use of Twitter and native clients, Google+ may be a success – even if you can't search for it by name. And, if you want to follow me, I'm alblue on Google+.

Thursday, June 23, 2011

Git Tip of the Week: Pulling and Rebasing

This week's Git Tip of the Week is about configuring what happens when you pull. You can subscribe to the feed if you want to receive new instalments automatically.


Pulling

Back in March, I wrote about pushing and pulling as an introduction to getting data from a remote Git server. Now that we've talked about rebasing (twice), we can talk about the different pull strategies.

Recall that git fetch merely makes the changes available in your local repository; it doesn't affect the branch(es) that you are on. On the other hand, git pull does affect the branches that you are on as it tries to include changes from upstream.

By default, Git will attempt to do a merge whenever you pull changes. Here's what it looks like when you do a merge:


$ git log --oneline master
e1c1744 Third
36c3a20 Second
c66b73b First
# Take a branch from a previous point in history
$ git checkout -b feature-merge 36c3a20
Switched to a new branch 'feature-merge'
# Set it up so I can pull from master
$ git branch --set-upstream feature-merge master
Branch feature-merge set up to track local branch master.
$ git log --oneline feature-merge
36c3a20 Second
c66b73b First
# Add another file so we diverge
$ touch feature-merge.txt
$ git add feature-merge.txt
$ git commit -m "Feature-Merge"
[feature-merge a16a3db] Feature-Merge
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 feature-merge.txt
# Now, let's see what happens when we pull
$ git pull
From .
 * branch            master     -> FETCH_HEAD
Merge made by recursive.
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 c
$ git log --oneline --graph
*   23b50a0 Merge branch 'master' into feature-merge
|\  
| * e1c1744 Third
* | a16a3db Feature-Merge
|/  
* 36c3a20 Second
* c66b73b First

Here, we've invoked the default operation which is to do a merge. When we're behind master with local changes, and we do a pull, it automatically sets up a merge node, as shown by the graph above. (Normally the branch would be a remote one; however, I'm showing a tracked local branch for convenience.)

However, this is configurable to do a rebase instead. This configuration is done on a branch-by-branch basis. Let's create a new branch to experiment with this feature:


$ git checkout -b feature-rebase 36c3a20
Switched to a new branch 'feature-rebase'
$ git branch --set-upstream feature-rebase master
Branch feature-rebase set up to track local branch master.
$ git log --oneline feature-rebase
36c3a20 Second
c66b73b First
# Add another file so we diverge
$ touch feature-rebase.txt
$ git add feature-rebase.txt
$ git commit -m "Feature-Rebase"
[feature-rebase 62855ee] Feature-Rebase
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 feature-rebase.txt
$ git log --oneline
62855ee Feature-Rebase
36c3a20 Second
c66b73b First
# Configure the branch for rebase operations
$ git config branch.feature-rebase.rebase true
# Now, let's see what happens when we pull
$ git pull
From .
 * branch            master     -> FETCH_HEAD
First, rewinding head to replay your work on top of it...
Applying: Feature-Rebase
$ git log --oneline --graph
* ff0ecc2 Feature-Rebase
* e1c1744 Third
* 36c3a20 Second
* c66b73b First

What's happened here is that instead of creating a merge node, the pull operation resulted in a rebase of the underlying branch against the current master. This is dangerous (in the sense that a rebase is dangerous and changes history) but provided that you're doing this on local branches (those not pushed to the repository yet) then you may find this acceptable.
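If you only want a rebase for a single pull, rather than configuring the branch, git pull also accepts a --rebase flag. The following is a minimal sketch that rebuilds a history like the one above in a throwaway repository; the branch name feature-once, the file names and the temporary directory are made up for illustration, and it assumes git is on your PATH:

```shell
set -eu
# Throwaway repository in a temporary directory
work=$(mktemp -d)
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/master   # use 'master' whatever the default branch name is
git config user.name "Example User"
git config user.email "user@example.com"

# Helper: create a file named after the commit and commit it
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit First
commit Second
git branch feature-once      # branch point is 'Second'
commit Third                 # master moves ahead
git checkout -q feature-once
commit Feature-Once          # the branches have now diverged

# One-off rebasing pull from the local master ('.' is the current repository)
git pull -q --rebase . master

merges=$(git rev-list --merges HEAD | wc -l | tr -d ' ')
echo "merge commits on feature-once: $merges"
git log --oneline
```

The history should come out linear, with Feature-Once replayed on top of Third and no merge node, which is the same result as the branch.feature-rebase.rebase setting above but without changing any configuration.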

This configuration has to be done on a branch-by-branch basis. If your preferred way of working is to always enable this option, then there is a configuration item which can help:


$ git config branch.autosetuprebase always

If this is configured (either in a repository or globally with the --global setting) then whenever you create a new branch that tracks another branch, Git will automatically add branch.name.rebase=true for you.
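To check the effect of the setting, you can create a tracking branch and then read its configuration back. This is a sketch in a scratch repository; the branch name topic is hypothetical, and it uses git branch --track because recent Git releases no longer accept the older --set-upstream spelling used in the transcripts above:

```shell
set -eu
# Scratch repository
work=$(mktemp -d)
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo one > one.txt
git add one.txt
git commit -q -m "First"

# Ask Git to set up rebasing for every new tracking branch
git config branch.autosetuprebase always

# Create a branch that tracks the local master
git branch -q --track topic master

# The per-branch keys were written automatically
rebase_flag=$(git config branch.topic.rebase)
merge_ref=$(git config branch.topic.merge)
echo "branch.topic.rebase = $rebase_flag"
echo "branch.topic.merge  = $merge_ref"
```

Branches that existed before the option was set are not touched, so those still need branch.name.rebase configuring by hand.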


Come back next week for another instalment in the Git Tip of the Week series.

Wednesday, June 22, 2011

Eclipse Indigo Released

Today marks the release of Eclipse Indigo, the combined set of 62 projects and 46 million lines of code. I've written up the details at InfoQ so I won't bother going into them again here.

What is worth taking away, though, is not just the fact that Eclipse is a successful Foundation (which it is), or the fact that Eclipse projects generally make great software (which they do), but the fact that co-ordination in open source projects is not only possible, but it's also predictable.

The Releases on Wikipedia lists Eclipse releases by the Eclipse Foundation, which was created back in 2004, but in fact the Eclipse project goes back to a public release in November 2001. Before then, Eclipse was the embryonic foundation of WebSphere Studio, an HTML editor for WebSphere's servlet engine (at the time, called just WebSphere) to complement Visual Age for Java's IDE.

Since VAJ wasn't sensibly able to handle non-Java files, WebSphere Studio grew out of a desire to have first plain HTML, then later mixed HTML, JSP and Servlet content. VAJ didn't have a large following at the time (though, like Visual Age for Smalltalk users, it did have its loyal userbase). At the time, products like Symantec Café and Borland's JBuilder had larger followings, in part due to VAJ's organisation at the logical (class/method) level instead of the physical (file) level, and in part due to other products being cheaper.

Eclipse today still has the “Java Browsing” perspective, modelled on VAJ's logical view of the world; it is telling that most Eclipse users automatically use the “Java” perspective, which provides a tree-view onto the physical representation and thus needs tools like Mylyn to reduce the clutter. Perhaps this predisposition comes from the fact that Windows and Linux file managers often only have a tree-based view of the world, whereas other operating systems like Nextstep and OSX have column views which can show more information in a more condensed way.

NetBeans was open-sourced in 2000, and IntelliJ was released in early 2001. Eclipse 1.0 was released in November 2001, making it the newcomer in the IDE scene. At the time, it was buggy (and had a tendency to not notice when files were changed outside of the IDE, requiring manual refreshes in order to open them) but none the less, its approach to modular extension was ahead of IntelliJ (which wouldn't be open-sourced until much later) and really kick-started the thoughts of modular software development. Eclipse 2.0 followed seven months later in June 2002.

Since then, the Eclipse project – and later, the simultaneous releases which congregate around it – has followed a yearly release pattern. Whilst 2.1 was released in March (in time for EclipseCon), all the other releases have been in June and have kept to a fixed schedule, including milestone builds and release candidates (often with an M5a or M6a for good measure). Eclipse releases have run like clockwork:

  • Eclipse 1.0 – 7 November 2001 (Win32/Linux32 Motif)
  • Eclipse 2.0 – 27 June 2002 (Linux32 Motif + GTK, and Solaris/QNX/AIX)
  • Eclipse 2.1 – 27 March 2003 (OSX first version)
  • Eclipse 3.0 – 25 June 2004 (first OSGi version)
  • Eclipse 3.1 – 27 June 2005
  • Eclipse 3.2 – 29 June 2006 (Callisto)
  • Eclipse 3.3 – 25 June 2007 (Europa)
  • Eclipse 3.4 – 17 June 2008 (Ganymede)
  • Eclipse 3.5 – 11 June 2009 (Galileo)
  • Eclipse 3.6 – 8 June 2010 (Helios)
  • Eclipse 3.7 – 22 June 2011 (Indigo)

Whilst we're not quite at the 10 year anniversary of Eclipse (which will be in November this year), the fact is that over the last ten years we have had one release per year, and that since Eclipse 3.0 – when the Eclipse runtime transitioned to OSGi – that release has been predictable in mid to late June, often known about a year in advance. We'll probably be having the same conversation on the 20th June 2012.

Incidentally, the switch to OSGi back in 2004 is worth calling out. A brilliant decision, often not understood or liked at the time, OSGi has come to be a bedrock of not just Eclipse but many other servers and systems in the years since. The wider use of the OSGi service model has been percolating ever since, leading to tools like E4 and a gradual drift away from the Eclipse registry. Not only that, it has made OSGi popular as a standalone technology, with developers moving on from just building Eclipse plugins to designing fully-fledged OSGi systems. At least one of the lasting legacies of the Eclipse platform has been the certification that OSGi is a trusted module system.

My involvement with Eclipse has spanned over the decade as well, with experiences in pre-release WebSphere software following through to testing for the OSX release in 2.1 and beyond. My oldest public bug report dates back to Eclipse 2.0 in November 2002, although that's just one of the 570 bugs I've filed over the past 7½ years, a rate of around 75/year – or to put it another way, three bugs every two weeks for the last 380 weeks.

Hopefully by reporting, and in some cases patching, I've been able to help make the Eclipse platform and ecosystem a better place. I certainly hope my blogging and articles on sites like InfoQ and EclipseZone have come in useful to others over the years. I know that there are those who think my bug reporting style (and ties) are brash; but none of it has been driven by “this bug is affecting me, fix it!” but rather “this could impact other people.”

In fact, for most of the last five years, I've not been working in Java or using Eclipse in a work environment; it's all been just a hobby. The only times I've been to EclipseCon were self funded; and the last two EclipseCons I've covered remotely.

The last ten years have had their ups and downs, both technologically and personally. But I'm sure that if I'm still around ten years hence, I'll be writing about my continued involvement in the Eclipse platform for the last two decades.

Tuesday, June 21, 2011

Git Tip of the Week: EGit

This week's Git Tip of the Week is about using Eclipse with EGit. You can subscribe to the feed if you want to receive new instalments automatically.


(E)Git at Eclipse - a history

This post is about using EGit in Eclipse; if you are uninterested in Eclipse as a platform, feel free to come back next week.

EGit, and its library layer JGit, have been in development for a long time and this week will be shipped as version 1.0 with the Eclipse Indigo simultaneous release. This is the first time anything other than CVS has been shipped by default out of the box for an Eclipse download.

As well as being good news for Git, it's also great news for Eclipse. The transition towards DVCS has been a long road and yet EGit is just the beginning.

Originally, Eclipse shipped with just CVS, but whilst additional plugins were able to be used to access Subversion, none of them were shipped with the default package due to licensing issues. It also didn't help that there were two competing Subversion projects, Subclipse and Subversive, both of which needed additional binaries in order to work. (It really didn't help Subclipse's case that it didn't ship a pre-packaged OSX client, just as OSX was taking off as the de-facto development and conference-touting laptop.) Even now, Subclipse doesn't ship with drivers for 64-bit Windows as this becomes a more common platform.

The nail in the coffin for Eclipse's Subversion usage ultimately was Git and GitHub. Whatever your personal preferences of (D)VCS are, there can be no doubt that GitHub has transformed the industry in adopting DVCS, and more specifically, Git. Not only that, but with Eclipse being swayed by Git, and Apache's read-only Git mirrors, it's clear that the majority of foundations are leaning towards Git support. (Google Code remains the outlier with Hg, in part due to its implementation in Python – but in future, Google Code will support Git as well.)

EGit is the set of Eclipse UI libraries that integrate with the Team providers, whilst it relies on a re-implementation of the core Git libraries in Java, JGit. As well as powering EGit, JGit also powers the runtime inside Gerrit, a popular review tool (which I've written about before), as well as a port to Android in the form of Agit.

One of JGit's advantages is that the Git on-disk format is both well documented and well understood. In fact, it's this on-disk format that has resulted in most of the additional libraries and tools being made available; instead of having to call out to a specific blessed library (like SVN and Hg do), a Git client is capable of creating its own tree from content in an existing Git repository. Pretty much every Git tool reads, processes and generates trees of objects and references to those.

Using EGit

Most of the Eclipse Indigo release packages already have EGit in place; but if not, it's a simple operation to go to the EGit entry on Eclipse Marketplace to download it into your client. Whilst Helios shipped with a 0.12 version in service release 2, Indigo ships with version 1.0 – although that can also be installed in a Helios runtime if you want to add the update site to your runtime.

There's a lot of good documentation on the EGit wiki that includes my Git for Eclipse Users, which gives a good background in Git for those who aren't aware. But there are also lots of screenshots to show you how to get things done as well.

Unlike CVS/SVN repositories, a Git repository will exist on your machine and your project will be hosted out of that. It's not recommended to create Git repositories under the workspace directory – Eclipse doesn't tend to like that. When you create a new Git repository (from the Git Repositories view, or from a newly shared project), it will default to putting it in ~/git. You can then create projects underneath that location, or share a project and choose that Git repository.

The other aspect to note is that all projects in a Git repository share the same branch. If you have two projects, both on master, then if you switch branch on one project (say, to release37) then both projects' branches will be changed. If you don't want that, you can create a clone of the Git repository locally, and remap the project to the local clone. However, this is likely to cause confusion if you do it routinely.

Once you've shared your project (or imported it from a previously created git repository) then using it is much like any other team provider in Eclipse – you can commit, merge, branch, compare etc. as normal.

The only significant difference is that commit operations are local (i.e. affect your own local repository) rather than remote. So others won't see your changes unless you push them up. Similarly, if you want to get changes from others, you need to pull them down.

If you're using a Gerrit workflow, then it's worth enabling the (undocumented) gerrit.createchangeid flag, which enables the automatic creation of the Change-Id field. This is set if you clone from a Gerrit repository in the first place, but doesn't have an option to set it up afterwards.

If you're not using Gerrit, then having a clone with a pull policy set to 'rebase' is the most common one you'll find for emulation of the traditional workflows. You can configure this when cloning a project initially, but if not, you can set git config branch.autosetuprebase always, followed by configuring it for the branch(es) you have checked out with git config branch.name.rebase true.

The migration begins ...

Many projects have already moved to Git at Eclipse – the full list is at http://git.eclipse.org/c/, and major players like CDT and EclipseRT are in the process of joining other projects, like Virgo and ECF, who have already made the transition.

There are still some rough edges, both in EGit and also in the Git contributions policy, which are likely to be overcome by the end of the year. But having support out of the box for Git within all Eclipse runtimes, and the fact that projects are stepping up to move over to Git, means that the tooling will be under close scrutiny and improve over the coming months and years.


Come back next week for another instalment in the Git Tip of the Week series.

Tuesday, June 14, 2011

Git Tip of the Week: Rebasing Revisited

This week's Git Tip of the Week is about using rebase to move between branches. You can subscribe to the feed if you want to receive new instalments automatically.


Interactive Rebasing

In my last rebasing post, I discussed the concept of rebasing, the ability to create new history by replaying past commits in a different order. In fact, most of that post turned out to be discussing interactive rebasing, which allows you to change the order of commits, squash two (or more) commits into one and even remove commits from the history.

However, rebasing also plays another key part in the way developers frequently interact with git, by moving a branch in its entirety forward to a new point.

Rebasing branches

Let's say you've been working on a feature branched off a known point, and you now want to commit it to the repository. Let's assume the history looks like:

A → B → C → 1 → 2 → 3

where C was the point of master at the time of starting the feature branch. If the remote branch has moved on since then (say, to “F”), we have the option of creating a merge node “G”, or moving our feature branch forwards, based on where the remote is now:

A → B → C → D → E → F → G // Merge node
         \→ 1 → 2 → 3 →/

A → B → C → D → E → F ⇒ 1 → 2 → 3

Some developers or large development teams have a preference to create merge nodes, not only as a way of avoiding problems (the merge node can be tested) but also as a way of documenting where it came from in the first place. Some teams even create merge nodes when they're not needed (such as the git pull --no-ff option).

Other developers like trying to keep the number of merge nodes to a minimum, to try (as far as is possible) to have a linear history in the repository.

Either way, although there's not a right answer, Git allows you to do both depending on what the right answer is for you, at that particular point.
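To see what the “merge node even when it's not needed” choice looks like in practice, here is a small sketch using git merge --no-ff (git pull passes the same flag through to the merge). The repository, branch and commit names are invented for the example:

```shell
set -eu
# Scratch repository
work=$(mktemp -d)
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

# Helper: create a file named after the commit and commit it
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
git checkout -q -b feature
commit One
git checkout -q master

# master has not moved, so a plain merge would simply fast-forward;
# --no-ff records a merge commit anyway
git merge -q --no-ff -m "Merge feature" feature

# rev-list --parents prints the commit followed by each of its parents
words=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')
parent_count=$((words - 1))
echo "parents of HEAD: $parent_count"
git log --oneline --graph
```

Without --no-ff the same merge would leave master pointing directly at One, with nothing in the history to show that a branch ever existed.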

Rebase example

If we wanted to achieve the second history, we could write an interactive rebase script which picked first changes D, E and F, followed by 1, 2, and 3. However, doing this manually would be error prone, particularly if the branch has moved on more than a few commits.

That's where git rebase comes into play.

If the above branches were named master (for A..F) and feature (for 1..3), we can transplant feature forwards with:


$ git rebase master feature
First, rewinding head to replay your work on top of it...
Applying: 1
Applying: 2
Applying: 3

This takes the set of changes in feature that aren't in master, applies them to where master is now, and then calls that the new feature branch.

The second argument can be optimised away if we're already on the feature branch:


$ git branch
* feature
  master
$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: 1
Applying: 2
Applying: 3
$ git log --oneline
72366d5 3
8da923e 2
a4fe060 1
65521f9 F
62f8b88 E
e210d31 D
94c037d C
668b955 B
a34bd14 A

One nice property of git rebase is that it won't duplicate deltas that are the same. So if we need a particular change for a bugfix, it won't re-apply that:


$ git checkout master
Switched to branch 'master'
$ git cherry-pick 8da923e
[master cf6d845] 2
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 2
$ git log --oneline
cf6d845 2
65521f9 F
62f8b88 E
...
$ git rebase master feature
First, rewinding head to replay your work on top of it...
Applying: 1
Applying: 3
$ git log --oneline
ba0ce72 3
6979b71 1
cf6d845 2
65521f9 F
62f8b88 E
...

In this case, we cherry-picked 2 from the branch, and then applied the remaining feature changes afterwards. This is a quick way of doing an interactive rebase if you know you just want to pull a single change but don't want to fire up an editor.
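If you want to know in advance which commits a rebase would skip, git cherry compares the patches on a branch against its upstream and marks those that are already there with a minus sign. The sketch below recreates a cut-down version of the example in a scratch repository (the names are placeholders, and the real hashes will differ on every run):

```shell
set -eu
# Scratch repository
work=$(mktemp -d)
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

# Helper: create a file named after the commit and commit it
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
git checkout -q -b feature
commit 1
commit 2
commit 3
git checkout -q master
git cherry-pick feature~1 > /dev/null    # copy commit '2' onto master

# '+' = not yet in master, '-' = an equivalent patch is already in master
git cherry -v master feature
already=$(git cherry master feature | grep -c '^-')
pending=$(git cherry master feature | grep -c '^+')
echo "already upstream: $already, still to apply: $pending"

# The rebase then replays only the pending commits
git rebase -q master feature
replayed=$(git rev-list --count master..feature)
echo "commits replayed onto master: $replayed"
```

The comparison is done on the content of each change rather than on its hash, which is why the cherry-picked copy is recognised even though it has a different commit id.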

Rebasing onto

So where does --onto come into play? This is used for some serious tree surgery; it basically allows you to take a set of commits, and transplant them onto a different node.

In our example, let's say we had created two feature branches, feature1 and feature2, which were logically independent, but we'd ended up developing them so that feature2 branched off feature1. We'd have something that looked like:


Ma → Mb → Mc ← Master
           \ → F1a → F1b ← Feature1
                       \ → F2a → F2b ← Feature2

Now let's say that we want to push Feature2 to Master, but we don't want to push Feature1 to Master yet (perhaps because it's not ready). We can perform the transplant with a git rebase --onto as follows:


$ git log --oneline feature2
986d8ac F2b
625bcde F2a
fb24802 F1b
d8f7e48 F1a
3cd0af7 Mc
addd99a Mb
7ad7ead Ma
$ git rebase --onto master feature1 feature2
First, rewinding head to replay your work on top of it...
Applying: F2a
Applying: F2b
$ git log --oneline feature2
b3e017a F2b
ac65d2e F2a
3cd0af7 Mc
addd99a Mb
7ad7ead Ma
$ git log --oneline feature1
fb24802 F1b
d8f7e48 F1a
3cd0af7 Mc
addd99a Mb
7ad7ead Ma
$ git log --oneline master
3cd0af7 Mc
addd99a Mb
7ad7ead Ma

We've now transplanted the bit between feature1 and feature2 onto the current master branch. In effect, we're doing a git cherry-pick of all the changes between feature1 and feature2 onto the current point of master, then resetting the feature2 branch (that we're currently on) to point to the new location. Our git repository now looks like:


Ma → Mb → Mc ← Master
           \ → F1a → F1b ← Feature1
            \ → F2a → F2b ← Feature2
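If you'd like to try the transplant yourself, the following sketch builds the three branches in a scratch repository and runs the same git rebase --onto command. The commit helper and the one-file-per-commit layout are only there to keep the example free of conflicts; they aren't part of the technique:

```shell
set -eu
# Scratch repository
work=$(mktemp -d)
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

# Helper: create a file named after the commit and commit it
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit Ma
commit Mb
commit Mc
git checkout -q -b feature1
commit F1a
commit F1b
git checkout -q -b feature2      # branches off feature1, not master
commit F2a
commit F2b

# Take the commits in feature1..feature2 and replay them onto master
git rebase -q --onto master feature1 feature2

f1=$(git log --format=%s feature1 | tr '\n' ' ')
f2=$(git log --format=%s feature2 | tr '\n' ' ')
echo "feature1: $f1"
echo "feature2: $f2"
```

The three arguments read as “new base, old base, branch to move”: everything reachable from feature2 but not from feature1 is replayed on top of master, and feature1 itself is left alone.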

Summary

Rebasing is an incredibly convenient way of updating your local repository to the current version of the branch on a remote server, and is used frequently. It can also be used to perform multiple cherry-pick operations and reorder (local) history in order to create new versions of that history.

As with any power tool, care must be taken not to reorder changes which have previously been made available (e.g. via GitHub) as creating new history causes a divergence in the timeline which causes eddies in the space time continuum. Or at least annoys people.


Come back next week for another instalment in the Git Tip of the Week series.

Friday, June 10, 2011

Running Gerrit with Jenkins/Hudson

I've been writing about Gerrit and Jenkins/Hudson for some time now, and I've posted a couple of screencasts for Java developers and for iOS developers as well. I've also been promising to write about how to set this up, so you can replicate the goodness of the screencast in your own work location and help spread the word.

That day has finally come, and the article is available at InfoQ under http://www.infoq.com/articles/Gerrit-jenkins-hudson, which should take you through all the steps you need in order to replicate the experiment.

I've deliberately used the term Jenkins/Hudson throughout, because the technique works fine, regardless of which system you're using. I've also tested it on the latest of both Jenkins and Hudson, as well as the recently-released Gerrit.

A couple of changes are worth pointing out:

  • Gerrit is migrating data from an SQL database to using Git notes for storing review information. The 2.1.x and 2.2.x series differ in this regard, and so do their permissioning screens.
  • Gerrit 2.2.1 (and 2.1.7.2) have renamed the “All Projects” project from “--+All+Projects+--” to “All-Projects”. The upgrade script will do this rename for you, but if you have any systems which refer to this by name (or URLs in an article... :-) then you have to change them to reflect that.
  • There was an issue with Hudson 2.0.0 with the latest Gerrit-Trigger, which is fixed in 2.0.1
  • Gerrit's JSON API changed between 2.1.6 → 2.1.7, and has now been reverted in 2.1.7.2 (and similarly, 2.2.0 → 2.2.1). This may cause some minor knock-on consequences with the Gerrit Trigger and/or Gerrit Mylyn review.

I hope that the article is useful. Any feedback/comments, feel free to add them at the InfoQ article, comments on here, or tweeting me about it.

Wednesday, June 08, 2011

World IPv6 day

Today is World IPv6 day, where internet organisations large and small (including my own Bandlem Limited as well as this blog) enable IPv6 connectivity to encourage others to do so.

By enabling IPv6, anyone with a dual-stack will try to connect over IPv6 first and fall back to IPv4 afterwards. So even if your IPv6 connectivity is broken, you should still be able to access the current sites. If you only have an IPv4 address then you will not notice any difference.

I've written over on InfoQ what World IPv6 day is all about, but chances are there are a handful of IPv6 enabled sites that you will visit today, including the popular search engines Google and Bing. It also includes internet social giant Facebook, although at this time Twitter does not appear to be participating in the IPv6 experiment.

You can find out more about your connectivity via http://ipv6-test.com/ if you're interested in knowing what your computer supports.

Tuesday, June 07, 2011

Git Tip of the Week: Cherry Picking

This week's Git Tip of the Week is about pulling changes from one branch to another, called cherry picking. You can subscribe to the feed if you want to receive new instalments automatically.


Picking changes

Each commit in a repository corresponds to a full tree of files. Usually, these files have been created over a number of commits. But sometimes, it's necessary to take the delta between two commits, and apply it to a different branch.

One common case where this occurs is when an issue has been identified, and subsequently fixed, but needs to be backported to a previous release branch.

In this case, you don't want to take the current state of the tree (which might have unfinished or untested changes); you just want to take the delta associated with that change.

In other version control systems, you would just create a diff based on the most recent change, and then patch the change on to your release branch. Instead, with Git, we can use the cherry-pick command to do the work for us:


$ git checkout master
$ echo Working >> file.txt
$ git commit -m "Working" file.txt
$ echo BugFix >> bugfix.txt
# bugfix.txt is a new file, so it has to be added before it can be committed
$ git add bugfix.txt
$ git commit -m "BugFix" bugfix.txt
$ echo More Working >> file.txt
$ git commit -m "More working" file.txt
# We want to apply 'bugfix' to release
$ git checkout release10
$ git cherry-pick master~1
[release10 41037ab] BugFix
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 bugfix.txt

This has allowed us to take a single change – described here with master~1 – and copy the delta into the release branch.

Sets of changes

We can also cherry-pick sets of commits (ranges of revisions) if we want to. Had we had several changes, we could have used master~3..master~1, which picks every commit after master~3 up to and including master~1. Unlike just generating a diff and then patching the current tree, this will copy the commits (and their relationship) over to the new branch.
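Because the start of a range is exclusive, it's easy to be off by one when picking several commits. As a rough illustration, with invented commit names in a scratch repository, the range master~3..master~1 below copies exactly two commits:

```shell
set -eu
# Scratch repository
work=$(mktemp -d)
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

# Helper: create a file named after the commit and commit it
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit Base
git branch release10             # release branch starts at 'Base'
commit One                       # master~3
commit Two                       # master~2
commit Three                     # master~1
commit Four                      # master

git checkout -q release10
# Picks Two and Three: the range starts *after* master~3
git cherry-pick master~3..master~1 > /dev/null

picked=$(git log --format=%s release10 | tr '\n' ' ')
echo "release10 now contains: $picked"
```

To include One as well, the range would need to start one commit earlier, at master~4 (which is Base here).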

We've actually seen pick in use already, when we covered rebasing last week. When you create a series of commands for rebasing, you're actually giving it instructions to pick or edit existing changes:


edit 7bf9271 Typo
pick 756281e First
fixup 07e9061 First
pick 13aba60 Second

The “pick” here means the same as “git cherry-pick” for the single change.

In fact, “edit” is really a short-hand for “git cherry-pick -e”, and “fixup” and “squash” are short-hands for git cherry-pick -n.

Recording source

Finally, it's worth noting that when you copy a change using this mechanism, the commit hash will change (notably because it will have a different parent hierarchy).

Sometimes that doesn't matter, but if you want to record where the original change came from, you can run git cherry-pick -x. This appends a line to the commit message indicating where the original change came from:


# From example above
$ git checkout release10
$ git cherry-pick -x master~1
[release10 41037ab] BugFix (cherry picked from commit 938a4c0bbb3985524192aa8a926ea6757263e94b)
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 bugfix.txt

However, note that this usually only makes sense if the change that's being cherry-picked is from a public branch (so that the referenced change is visible). Another way of representing this change is to create a merge node between the release branch and the ongoing development branch as a way of showing that the merge has occurred.
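To see what -x records, here is a small self-contained sketch along the lines of the example above. The repository and identity are placeholders; it simply prints the message of the copied commit so the extra line is visible:

```shell
# Sketch: confirm that cherry-pick -x records the source commit.
# Repository contents and identity are placeholders.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Initial Commit"
git branch release10

echo Working > file.txt
git add file.txt
git commit -q -m "Working"
echo BugFix > bugfix.txt
git add bugfix.txt
git commit -q -m "BugFix"
echo More Working >> file.txt
git commit -q -m "More working" file.txt

# Remember the hash of the commit we are about to copy
original=$(git rev-parse master~1)

git checkout -q release10
git cherry-pick -x master~1 > /dev/null

# The full message now ends with "(cherry picked from commit <hash>)"
git log -1 --format=%B
```

The copied commit gets a new hash of its own, but its message carries the hash stored in $original.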

Creating new history

Whenever you are cherry-picking, especially if you are reordering commits, you're creating new history. However, you've never really thrown away old history; it's all available from the reflogs. You're not destroying history, you're creating alternate histories. All cherry-picking gives you is the ability to apply patches from other branches in a safe and error-less manner.
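As a minimal sketch of the reflog safety net, the following rewrites a commit (using git commit --amend as the simplest possible rewrite, rather than a cherry-pick) and then shows that the original is still reachable. The branch name rescued and the repository contents are made up for the example:

```shell
# Sketch: a rewritten commit is still recoverable through the reflog.
# Repository contents, identity and the "rescued" branch are placeholders.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo one > file.txt
git add file.txt
git commit -q -m "First"
echo two >> file.txt
git commit -q -m "Second" file.txt
old=$(git rev-parse HEAD)

# Create an alternate history: same content, different message
git commit -q --amend -m "Second (reworded)"
new=$(git rev-parse HEAD)

# The old commit has left the branch but not the repository
git cat-file -t "$old"
git reflog

# Give the old history a name again so it can be inspected or restored
git branch rescued "$old"
git log --format=%s rescued
```

Here HEAD@{1} in the reflog points at the commit as it was before the rewrite, which is what makes experiments like this low-risk.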


Come back next week for another instalment in the Git Tip of the Week series.

Wednesday, June 01, 2011

Git Tip of the Week: Rebasing

This week's Git Tip of the Week is about rewriting history. You can subscribe to the feed if you want to receive new instalments automatically.


Rewriting history

One of the philosophical differences between Git and Mercurial is whether history should be allowed to be re-written or not. When a commit is made, the commit hash represents a point on that history – and subsequent commits then rely on that hash for integrity and representation of parental links.

Rewriting history can thus be dangerous; if you change a commit in the past, it invalidates the hash of every commit that follows it. Instead, each later change needs to be re-committed against the new commit to get a new hash value.
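A quick way to convince yourself of this is to reword a parent commit and re-commit its child on top. The sketch below does that in a scratch repository (the file names, messages and identity are placeholders) and compares the child's hash before and after:

```shell
# Sketch: changing a parent commit changes the hash of its child.
# Repository contents and identity are placeholders.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo a > a.txt
git add a.txt
git commit -q -m "A"
echo b > b.txt
git add b.txt
git commit -q -m "B"
old_child=$(git rev-parse HEAD)

# The commit object names its parent, so the parent's hash
# is part of what gets hashed for the child
git cat-file -p "$old_child"

# Reword the parent on a detached HEAD, then re-commit the child on top
git checkout -q --detach HEAD~1
git commit -q --amend -m "A (reworded)"
git cherry-pick "$old_child" > /dev/null
new_child=$(git rev-parse HEAD)

echo "old child: $old_child"
echo "new child: $new_child"
```

The two children contain exactly the same files, yet their hashes differ because the parent they point at is different.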

However, dangerous is relative. It's not always the case that changing the history is bad – if that history is local, and hasn't been seen by anyone else, then the only person it's affecting is you. As long as you know what you're doing (and who you will affect) then changing the local history is no different than undoing your local changes in a text editor and re-saving.

(Side note: Mercurial has the concept of 'patch queues' which are the equivalent of local mutable history – but you end up with two separate repository concepts instead of a single concept of history as in Git.)

So, the question is not so much whether rewriting history is dangerous as whether you understand the effects if these changes are exposed to others (e.g. via pushing to GitHub). Sometimes, it's necessary to publicly break the commit hashes – for example, someone accidentally committed a large binary, or copyrighted code which shouldn't be present (or even a password which shouldn't have been committed) – but in these cases, making the change often involves a public notification to warn others.

Rebasing

So, what is rebasing? Well, rebasing is Git's concept of changing (recent) local history. In essence, it is an undo/replay option that you can use to make changes as if they were done in the past.

A Git rebase unwinds history to a particular point (typically specified as HEAD~n, where n is a small number of commits back), and then replays the same changes on top of that point. If the changes are unmodified, then the resulting commits will be the same as before.
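That last point can be checked with a rebase that changes nothing. In this sketch the GIT_SEQUENCE_EDITOR environment variable is set to true, which stands in for an editor that saves the list untouched; the repository contents and identity are placeholders:

```shell
# Sketch: an interactive rebase with an unmodified list leaves hashes alone.
# Repository contents and identity are placeholders.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
for n in 1 2 3; do
  echo "$n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "Commit $n"
done
before=$(git rev-parse HEAD)

# "true" exits successfully without touching the list of picks
GIT_SEQUENCE_EDITOR=true git rebase -i HEAD~2 > /dev/null 2>&1
after=$(git rev-parse HEAD)

echo "before: $before"
echo "after:  $after"
```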

However, it's more normal to want to adjust the commit(s) in some way, for example:

  • Reword – change the commit message to something else (e.g. to add a bug reference)
  • Edit – to make changes to the commit itself (e.g. fix a typo in the code)
  • Pick – to include that commit in the history
  • Squash – to condense that commit with the previous and make them one (and concatenate log entry)
  • Fixup – to condense that commit with the previous and make them one (and discard log entry)

As well as these options, it is also possible to re-order the commits simply by re-ordering the list of changes.

Example

Let's build up a repository with some changes we'd like to make:


$ git init example
Initialized empty Git repository in example/.git
$ cd example
$ git commit --allow-empty -m "Initial Commit"
$ echo Helo World > README.txt
$ git add README.txt
$ git commit -m "Typo" README.txt
$ echo Second > Second.txt
$ git add Second.txt
$ git commit -m "Second" Second.txt
$ echo Frst > First.txt
$ git add First.txt
$ git commit -m "First" First.txt
$ echo First > First.txt
$ git add First.txt
$ git commit -m "First" First.txt
$ git log --oneline
07e9061 First
756281e First
13aba60 Second
7bf9271 Typo
82f9a21 Initial Commit

What we'd like to do is fix the typo made in the Typo commit, join the two First commits into one, and reorder the Second so that it comes after First. To do this, we kick off an interactive rebase, which will give us an editor:


$ git rebase -i 82f9a21
@ pick 7bf9271 Typo
@ pick 13aba60 Second
@ pick 756281e First
@ pick 07e9061 First

What this shows is a sequence of cherry-picks that will replay the history with the specific changes listed. We can re-order and re-label them to replay history in a different manner:


@ edit 7bf9271 Typo
@ pick 756281e First
@ fixup 07e9061 First
@ pick 13aba60 Second

Git will then replay the Typo commit and stop, dropping us back into the shell so that we can make changes:


$ echo Hello World > README.txt
$ git add README.txt
$ git commit --amend -m "Readme" README.txt
$ git rebase --continue

Here, we've made our change and told the rebase to carry on with the remaining steps. We could insert more commits if we wanted to, but we've just committed the current state as is. You might also see:


error: could not apply 07e9061... First
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add <paths>' and run 'git rebase --continue'
Could not apply 07e9061... First

This happens because we're changing the same line in the same file, and Git is asking us to confirm the result. We can simply add that file and continue, or (given that 07e9061 is a complete replacement for 756281e) just leave the earlier commit out in the first place:


$ git rebase --abort
$ git rebase -i 82f9a21
...
@ edit 7bf9271 Typo
@ pick 07e9061 First
@ pick 13aba60 Second

By removing the commit from the list, it is as if that commit never happened. This should run through and allow you to commit all the changes without having any conflicts.
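For anyone who wants to replay the edit/fixup/reorder plan from earlier without typing into an editor, here is one possible scripted version. It is only a sketch: commit_file is a hypothetical helper, the plan is fed in through the GIT_SEQUENCE_EDITOR environment variable, and it assumes the intent is to fix the Typo commit itself, so it uses git commit --amend at the edit stop. Your hashes will differ from the ones shown above.

```shell
# Sketch: replay the edit / fixup / reorder plan non-interactively.
# commit_file is a hypothetical helper; identity and paths are placeholders.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
export GIT_EDITOR=true   # never block waiting for a message editor

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Initial Commit"
base=$(git rev-parse HEAD)

# Write content $2 to file $1, commit it as $3, print the short hash
commit_file() {
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$3"
  git rev-parse --short HEAD
}
typo=$(commit_file README.txt "Helo World" Typo)
second=$(commit_file Second.txt Second Second)
first1=$(commit_file First.txt Frst First)
first2=$(commit_file First.txt First First)

# The same plan as in the text, written to a file instead of an editor
todo=$(mktemp)
cat > "$todo" <<EOF
edit $typo Typo
pick $first1 First
fixup $first2 First
pick $second Second
EOF

# Git calls the sequence editor with the path of its list as the last
# argument, so "cp <plan>" overwrites that list with ours
GIT_SEQUENCE_EDITOR="cp '$todo'" git rebase -i "$base" > /dev/null 2>&1

# The rebase has now stopped at the Typo commit: fix it and carry on
echo Hello World > README.txt
git commit -q --amend -m "Readme" README.txt
git rebase --continue > /dev/null 2>&1

git log --format=%s
```

In this scratch repository the picks apply cleanly, since each commit touches its own file apart from the two First commits, which are folded together by the fixup.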

Summary

Rebasing allows you to re-write history in an automated manner, instead of having to unwind and manually replay the changes yourself. It's often used with git rebase -i HEAD~5 (or some other small number) to fix up changes in your local history before merging or pushing to a central repository.

Rebasing also allows the transplantation of entire sections of a tree, which we'll talk about another time.

Finally, remember that Git never loses commit data. If you're working against a branch, you've got the branch's reflogs to fall back on; what Git allows you to do is effortlessly rebuild new commit trees (whilst keeping the old commit trees around in your local cache) until you're happy with the result.


Come back next week for another instalment in the Git Tip of the Week series.