Tuesday, November 29, 2011

Git Tip of the Week: Git Submodules


This week's Git Tip of the Week is about git submodules. You can subscribe to the feed if you want to receive new instalments automatically.


In the previous post, I wrote about a tongue-in-cheek extension called git bigjobbies, the purpose of which was to store large binaries in a repository without them being part of the main branch (and thus, not appearing in the history).

The reason one might want to do that is to avoid a checkout from the server taking a significant period of time, or taking up additional space once the clone has been made. Although Git provides a good delta encoding for existing files, this tends to work only if the binary data is relatively similar between versions. Typically, large binary assets (such as audio files, movies or even images if they've been saved in a compressed format) share little of the same binary data under the covers.

Git also has a configuration variable, core.bigFileThreshold, which can be used to set the limit at which files are stored as-is without performing any delta comparisons. Files above 512MB (by default) are stored without any delta compression against previous versions (though they are still deflated at storage time).
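If the default is too generous for a repository full of media files, the threshold can be lowered per repository. This is a minimal sketch, assuming a throwaway repository and an arbitrary value of 100m (neither comes from a real project):

```shell
# Sketch: lower the size above which Git stops attempting delta compression.
# The temporary repository and the 100m value are illustrative assumptions.
set -e
repo=$(mktemp -d)
git init -q "$repo"

# Store the setting in this repository's own configuration.
git -C "$repo" config core.bigFileThreshold 100m

# Read it back to confirm what Git will use.
threshold=$(git -C "$repo" config core.bigFileThreshold)
echo "core.bigFileThreshold is now $threshold"
```

The setting only affects how objects are packed from then on; it doesn't shrink history that has already been stored.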

The obvious solution to this problem is to store source code (and other compressible assets) in one Git repository, and then store large media assets (sound effects, in-movie videos etc.) in another Git repository. The history of one will therefore not affect the history of the other.

Submodules

If you're storing these as separate git repositories, how do you ensure that they are kept in sync with each other? Well, you could use tags and rely on convention to ensure that you can acquire the same version of the assets. However, tags can change (although they're not supposed to) and conventions can be circumvented.

Another way to do it is to store a pointer to the assets. (This is similar to the .bigjobbies file suggested before.) Since they are referenced by hash, as long as you can acquire the hash then you will be able to restore the asset.

Git submodules bring these two concepts together, by treating a submodule as a logically checked out directory from another repository, but referring to it by a pointer rather than a full checkout. The submodule (sub repository) can evolve at its own pace, with its own checkouts, and the parent can refer to it by a fixed hash.

Working with submodules

To add a submodule to an existing project, run git submodule add to define a local directory corresponding to the remote Git project's contents. For example, if you wanted to add the BigJobbies project from earlier as a submodule, you could do:


$ git init parent
Initialized empty Git repository in parent/.git/
$ cd parent
(master) $ git submodule add http://github.com/alblue/BigJobbies/
Cloning into BigJobbies...
done.
(master) $ ls -AF
.git/		.gitmodules	BigJobbies/
(master) $ cat .gitmodules 
[submodule "BigJobbies"]
	path = BigJobbies
	url = http://github.com/alblue/BigJobbies/
(master) $ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached ..." to unstage)
#
#	new file:   .gitmodules
#	new file:   BigJobbies

Note that this has set up a .gitmodules file and created a BigJobbies directory, corresponding to the BigJobbies cloned data. However, in the git status, it shows up as a file. What's up with that?

If we add the contents, commit, and then look at the tree, we'll get our answer:


(master) $ git commit -m "Added BigJobbies submodule"
[master (root-commit) f34f140] Added BigJobbies submodule
 2 files changed, 4 insertions(+), 0 deletions(-)
 create mode 100644 .gitmodules
 create mode 160000 BigJobbies
(master) $ git ls-tree HEAD
100644 blob 8041b87daf8e7ed034c669c6c5af9d63367dcd78	.gitmodules
160000 commit e9ed329101157ce9be5dc1c2639096bd82d3fa05	BigJobbies
(master) $ (cd BigJobbies; git rev-parse HEAD)
e9ed329101157ce9be5dc1c2639096bd82d3fa05

Instead of a simple mode 100644, which is used for storing a file with rw-r--r-- permissions, 160000 is used instead. This points to a commit, unlike the tree-or-blob that we've seen before. The commit points to the current version of HEAD in the checked out submodule, which as can be seen here is e9ed329101157ce9be5dc1c2639096bd82d3fa05.
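To see that the submodule entry really is just a mode and a commit hash, you can write one into the index by hand. This is only a sketch in a throwaway repository; it reuses the hash shown above, and nothing is cloned, so the commit itself is not present locally:

```shell
# Sketch: record a submodule-style entry (mode 160000) without cloning anything.
# The temporary repository is an assumption; the hash is the one shown above.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

sub=e9ed329101157ce9be5dc1c2639096bd82d3fa05

# Add an index entry with the gitlink mode pointing at that commit hash.
git update-index --add --cacheinfo 160000 "$sub" BigJobbies

# Write the index out as a tree and list it; the entry type is 'commit'.
tree=$(git write-tree)
entry=$(git ls-tree "$tree")
echo "$entry"
```

Git does not check that the commit exists in the parent's object database, which is the point: the parent only stores the pointer.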

The parent repository is now pretty slim; it contains the .gitmodules file and nothing else. However, it is also versioned in lock-step with the BigJobbies repository. Anyone who wants to clone this repository will find they can resolve the repository, albeit with a separate step:


(master) $ cd ..
$ git clone parent clone
Cloning into clone...
done.
$ cd clone
(master) $ ls BigJobbies/
(master) $ git submodule sync
Synchronizing submodule url for 'BigJobbies'
(master) $ ls BigJobbies/
(master) $ git submodule update
Cloning into BigJobbies...
done.
Submodule path 'BigJobbies': checked out 'e9ed329101157ce9be5dc1c2639096bd82d3fa05'
(master) $ ls BigJobbies/
LICENSE.txt	Movies		README.md	git-bigjobbies

In other words, we can clone the parent without acquiring any of its children. However, to populate the child submodules, we need to run a git submodule update command, which brings in the new code. (You also need to run the update when the remote repository has changed contents which you want to acquire as well.)
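The whole round trip can be tried locally without touching GitHub. The sketch below assumes two throwaway repositories (assets and parent are made-up names) and uses git submodule update --init to register and populate the submodule in one step. The protocol.file.allow override is an assumption needed only on newer Git versions, which refuse local-path submodules by default:

```shell
# Sketch: pin a submodule in a parent, clone the parent, then populate the child.
# Repository names, file names and the identity below are illustrative only.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid
work=$(mktemp -d)

# A stand-in for the remote assets repository.
git init -q "$work/assets"
echo "pretend this is a large file" > "$work/assets/asset.txt"
git -C "$work/assets" add asset.txt
git -C "$work/assets" commit -q -m "Add asset"

# The parent records the submodule as a pointer to that commit.
git init -q "$work/parent"
git -C "$work/parent" -c protocol.file.allow=always \
    submodule add "$work/assets" assets >/dev/null 2>&1
git -C "$work/parent" commit -q -m "Add assets submodule"

# A fresh clone has an empty assets directory until the update is run.
git clone -q "$work/parent" "$work/clone"
git -C "$work/clone" -c protocol.file.allow=always \
    submodule update --init >/dev/null 2>&1

pinned=$(git -C "$work/parent" rev-parse HEAD:assets)
checked_out=$(git -C "$work/clone/assets" rev-parse HEAD)
echo "parent pins $pinned, clone has $checked_out"
```

The two hashes should match, since the clone checks out exactly the commit the parent recorded rather than whatever the submodule's branch currently points to.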

Parent-child relationships

Sometimes you want to be able to couple two repositories together, such as a game development project with its media assets, or a set of binary releases with a source project. It's tempting to think of these relationships as the binaries being part of the source project (or a submodule), or the media assets as part of the game source (or a submodule).

However, it's often better to reverse the dependency links between these sorts of repositories. In other words, instead of having a source repository with a child submodule of the binary assets, have a binary assets repository with a submodule of the source.

Flipping the relationship in this way allows you to treat the source repository as a standalone unit, which doesn't need references to the large binaries, but permits a full checkout of the parent repository (which does have the binaries).

For projects where the source has no need for the binaries (like in the precompiled packages for open-source projects) this distinction can save references to upstream binary repositories which may get accidentally checked out (especially if other submodules are used).

It's also possible to put the source and the binaries in two completely independent repositories, then knit them together with a higher level git repository (with two submodules). The parent can then be used as a top-level 'release' repository, whilst still allowing the binaries and source code to be acquired independently.

Finally, one advantage of having the binary (larger) repository being the parent, is that it will still work if you clone it with git clone --depth 1. When you use the --depth 1 flag, you're essentially saying that you don't want any of the history, just the latest commit on that branch. The latest commit will have a pointer to the source code's branch (which will have the full history) and so this permits you to check out a single (latest) version of the binary with access to the full source code's history.
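The effect of --depth 1 is easy to check on a small scale. This is a sketch with a made-up repository holding three empty commits; note the file:// URL, since a plain local path clone ignores the depth:

```shell
# Sketch: compare the history in a full repository with a --depth 1 clone.
# The repository names, commit messages and identity are illustrative only.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid
work=$(mktemp -d)

git init -q "$work/big"
for n in 1 2 3; do
  git -C "$work/big" commit -q --allow-empty -m "Release $n"
done

# Shallow clone: only the latest commit on the branch is transferred.
git clone -q --depth 1 "file://$work/big" "$work/shallow"

full=$(git -C "$work/big" rev-list --count HEAD)
shallow=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full history: $full commits, shallow clone: $shallow commit"
```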


Come back next week for another instalment in the Git Tip of the Week series.

Tuesday, November 22, 2011

Git Tip of the Week: Git BigJobbies


This week's Git Tip of the Week is about git bigjobbies. You can subscribe to the feed if you want to receive new instalments automatically.


This tip covers two aspects; firstly, a means to show how Git can easily be extended, and secondly, a means to show that Mercurial's large file support can be implemented relatively easily on top of Git's object store. It should be noted that git bigjobbies is not intended for production use, but as a learning experiment.

Recap of object stores

The git database stores a set of objects by hash. These hashed objects may be one of blobs, trees or commits. Ultimately, a branch (or tag) in Git is just a pointer to a commit, which points to prior commits and a tree; trees point to a recursive graph of trees and blobs.

As a result, you can stick anything you want into a Git repository, provided it's inserted into the hashed object database. In addition, when you clone/fetch/pull from a Git repository, you don't necessarily get everything that the repository contains; you instead get all the reachable commits (and thus transitively, reachable trees and blobs) for the ones you don't have yet. (In the case of a clone, the set of things you have is the empty set which makes the calculation trivial.)

However, you don't get the objects that aren't reachable when you clone. So, failed experiments that didn't work, suggested changes that were not accepted in a Gerrit workflow (or reworked to provide a different implementation), or just branches or offshoots that you're not interested in, are not downloaded when you clone a repository. (Commits which are directly ancestral are of course brought down; only the divergent parts are not downloaded.)

Unreachable objects are ultimately pruned by the garbage collector. Working from a known list of roots (e.g. tags, branches), git gc can work out which objects are no longer reachable from any reference, and ultimately prune them from the record.

We can use the object database to our advantage, to store out-of-band object data in a repository which is not reachable from the branch, but is still referenced in refs and thus resolvable from the centralised decentralised version control system. Enter:

Git Bigjobbies

Git Bigjobbies is an extension I created to demonstrate out-of-band objects being stored in a Git repository. Note that this is neither supported nor recommended. With that out of the way, what does it do and how does it work?


(master) $ touch empty
(master) $ git bigjobbies add empty
(master) $ git status
# On branch master
# Untracked files:
#   (use "git add <file>…" to include in what will be committed)
#
#	.bigjobbies
#	.gitignore
nothing added to commit but untracked files present (use "git add" to track)
(master) $ cat .bigjobbies
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 empty
(master) $ cat .gitignore
empty

The extension writes the object into the database with git hash-object -w, then appends its hash and file name to a file called .bigjobbies. Although the object isn't in the tree or referenced by a commit, it still exists in the database. As a result, we can use the hash recorded in the .bigjobbies file to restore the contents into the filesystem. Provided this file is committed into the branch, we can resolve the file using the hash alone.

But how do we prevent the object being garbage collected when it's not reachable from any commit? Through the general refs/ directory. If an object is referenced from a refs/ file, it will be seen as in use and therefore not garbage collected.

To write a ref, we just need to echo the hash out to a file in the refs directory. It doesn't matter what it's called – so for simplicity we just write out the hash value as the name. To separate it from ordinary git tags and branches, we use refs/bigjobbies/e69de..391 as the name.
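The store-and-protect steps can be tried by hand. This is a sketch rather than the extension's actual code: it uses git update-ref instead of echoing into the refs directory, in a throwaway repository, and then runs an aggressive garbage collection to show that the object survives:

```shell
# Sketch: store a blob outside any commit and keep it alive with a ref.
# This mirrors the idea described above; it is not the git-bigjobbies script.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Write an empty file into the object database and capture its hash.
: > empty
hash=$(git hash-object -w empty)

# Point a ref at the blob so the garbage collector treats it as in use.
git update-ref "refs/bigjobbies/$hash" "$hash"

# Prune everything unreachable right now; the blob is still there afterwards.
git gc -q --prune=now
type=$(git cat-file -t "$hash")
echo "$hash is still present as a $type"
```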

Now, when we resolve the objects, we get the contents from the hash in the local store (if it exists); and if not, we resolve via the origin refs/bigjobbies/e69de..391 remote reference. As with the Mercurial largefiles extension, it doesn't download the contents of the files unless they're needed; but on the downside, it does need the files to be downloaded ahead of time in order to work off-line. Let's look at how it would work in a clone:


$ cd /tmp
$ git clone /tmp/example other
Cloning into other...
done.
$ cd other
(master) $ ls -a1
.
..
.bigjobbies
.git
.gitignore
(master) $ cat .git/refs/bigjobbies/e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
cat: .git/refs/bigjobbies/e69de29bb2d1d6434b8b29ae775ad8c2e48c5391: No such file or directory
(master) $ git bigjobbies resolve
(master) $ ls -a1
.
..
.bigjobbies
.git
.gitignore
empty
(master) $ cat .git/refs/bigjobbies/e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

The resolve command has dynamically brought in the reference from the remote server and resolved the contents of the file in the local repository. Furthermore, any large files in interim commits will not be resolved, unless they too are mentioned in the .bigjobbies file.

Summary

The point of this was to demonstrate how easy it is for a git extension to be made. All you need do is put an executable whose name has the git- prefix (here, git-bigjobbies) on your PATH, and you can run it with git bigjobbies.
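The mechanism is easy to try with something smaller than BigJobbies. In this sketch, git-hello is a made-up extension name and the temporary directory stands in for wherever you keep your scripts:

```shell
# Sketch: any executable named git-<name> on the PATH can be run as 'git <name>'.
# The git-hello name and the temporary directory are illustrative assumptions.
set -e
bin=$(mktemp -d)

# Create a tiny script following the git-<name> naming convention.
cat > "$bin/git-hello" <<'EOF'
#!/bin/sh
echo "hello from a git extension: $*"
EOF
chmod +x "$bin/git-hello"

# Git searches the PATH for git-hello and passes the remaining arguments on.
output=$(PATH="$bin:$PATH" git hello world)
echo "$output"
```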

In addition, it's a good exercise in understanding how the Git repository works. References are just pointers to hashes, and objects can be stored and referenced by those same hashes. From this, the entire Git tool suite is written; a combination of C and other scripting languages (for example, git-svn is largely written in Perl, and GitHub operates mostly out of Ruby).

You can clone the BigJobbies.git repository from the GitHub repository at http://github.com/alblue/BigJobbies/. The repository that you clone already has some BigJobbies in it; if you do a git bigjobbies resolve at any of the points which have a .bigjobbies file, you will find them downloaded. (Note that an implementation bug relies on the remote being called origin, in case you do git clone -o other.)


Come back next week for another instalment in the Git Tip of the Week series.

Wednesday, November 16, 2011

Declarative Services vs Objects


I'm coming to the conclusion there is an opportunity for improvement of the Declarative Services specification. Right now, DS needs to be able to instantiate a class and then mutate it as services come and go, instead of creating and disposing of classes on demand.

This poses somewhat of a problem if you want to publish objects in the OSGi service registry of types which you don't control. According to the Declarative Services specification, you need to have an object which has a default constructor. The object is then fully configured with a number of bind and unbind methods, which means the object goes through a series of states where it is not fully valid.

There's also the category of classes which aren't under your control (e.g. an existing API) or are acquired elsewhere (e.g. a third party library). These often can't be extended or otherwise mutated.

Example

Let's say we want to have a list of URLs stored as services in the registry. We'll be publishing URL objects as the service (and using java.net.URL as the interface type as well; it is a contrived example, after all). Now, the URL doesn't have a default constructor; every URL must have a url. So we can use it for registering a component which needs a service (e.g. BookmarkService) quite happily. But we can't register a URL with DS.

This presents us with a challenge. We might have a component which can register URL services, but given a 1..n BookmarkService and no URLs published, Declarative Services can't help us. In essence, there's no way to have a delayed URL service in DS.

What you can do is provide a URLFactoryService with DS. This can be activated on demand, and a registerURLs() method called which in turn registers a bunch of URL objects in the registry; and once that's done, DS can kick off the BookmarkService.

Unfortunately, there's no way of being able to tell DS that the URLFactoryService is something which is capable of generating URL objects, so DS never knows it needs to start the URLFactoryService to acquire the URLs.

Solution

The solutions are either to not use DS to populate the initial set of URLs (and require an earlyStart or startlevel hack to bring the initial bundle on-line) or to modify DS in a way which allows the implementation class to be a factory for the interface type but without being a subtype of it. The existing factory specification currently requires that the factory class is still an instance of the interface class; all that changes is the cardinality.

Here's what the solution might look like:


<scr:component xmlns:scr="http://www.osgi.org/xmlns/scr/v1.1.0" name="URLdemo">
   <implementation class="URLFactoryService"/>
   <service>
      <provide interface="java.net.URL"/>
   </service>
</scr:component>

In this case, the URLFactoryService doesn't implement/extend java.net.URL. But this tells DS that when this component is activated, it gets a URL out of it, which is what we want DS to know.

We could use a new API, like ServiceOnDemand, which is a generified type that can provide us with a service when we need it:


public interface ServiceOnDemand<T> {
  public T getService();
}

If the URLFactoryService implemented the ServiceOnDemand<URL> interface, then a call to getService() would return the URL object, which could subsequently be registered on our behalf in the service registry. We can still use the bind and unbind calls as before; they just affect the ServiceOnDemand dataset. After each change of the properties (and provided the component was valid), getService() could be treated as a factory for these object types, being called only when the component is in a fully configured state.

For tear down, it becomes just as easy. Instead of removing the URLFactoryService from the registry, we can just remove its previously generated URL objects. The new DS would have to co-ordinate to know which services were registered and associated with which factory; but this shouldn't be significantly different to the way the factory works at the moment.

We could also use this in conjunction with the DS factory class. The key difference here is that the factory class must implement the interface provided, which isn't necessarily always possible.

Summary

DS, and in particular its lazy activation of components, is a great way of providing systems which evolve into being. Unfortunately, the current system is constrained to allowing only objects over which the user has full control to be registered. Although the above used URL as an example, the same could also be said for network-discovered services such as those found over LDAP or DNS service records, or even database connections (which typically can't be subclassed since they are binary-only drivers). Finally, this could be the missing link between config admin and declarative services, which have never really played well together. Being able to use DS to fire up a factory, which consumes its configuration and is then able to generate multiple services, would allow us all to avoid the pain and effort of trying to control start ordering in order to provide services dynamically.

Tuesday, November 15, 2011

What else can Google screw up?


As if screwing up Reader wasn't enough, they appear to have moved the screw-up-squad over to Blogger.

For years (i.e. since Blogger started), blog posts have had a title and a post. Fairly easy to distinguish; one shows up as the title of the page, and one shows up as the body. Feeds recognise this, the page renders this – it's pretty much a given.

However, thanks to the Google+Doom operation, the Blogger codebase now appears to confuse a blog post with a giant + post. So instead of using the title given to derive the URL (e.g. http://alblue.bandlem.com/2011/11/git-tip-of-week-git-notes.html), it now uses the first few sentences of the posts. Given that all of my Git Tip of the Week posts start with "This week's Git Tip of the Week is about" it now turns out that any post I create (this month) is called http://alblue.bandlem.com/2011/11/this-weeks-git-tip-of-week-is-about-git.html. Fricking useless.

Fortunately, you can work around this – as this post does – by making a dummy post with "What else can Google screw up?" as the first paragraph. Post it to Blogger, and it generates what the old URL looked like; you can then edit the post and remove the dummy sentence from the post.

Git Tip of the Week: GC and Pruning


This week's Git Tip of the Week is about git gc. You can subscribe to the feed if you want to receive new instalments automatically.


Paul Webster recently wrote Where the git did that go? in relation to the incredible disappearing commits. Some said that a version control system shouldn't be able to do this; but it's actually all part of Git's functionality. Let's look at what happened.

A git repository stores commits in a transitive closure (a real closure, not a lambda) from the reachable items through to every commit (and every tree and blob). It's not possible to remove a commit – and therefore the trees and blobs that make up that repository.

So, how is it possible to lose data with Git? Well, if you are using a standard Git repository, you can create branches with git branch; and delete them with git branch -d. When you delete a branch, you remove the pointer to the last commit – but you don't actually lose the commits.

In addition, the git reflog, which we covered previously, stores a list of the previous branch pointers. In other words, even if you delete a branch, the reflog has got your back.

Generally speaking however, only repositories with working directories have reflogs; bare repositories tend not to. There is a config option, git config core.logAllRefUpdates, which can be used to force it on all repositories – or disable it completely if it's not needed.

Even without a reflog, the commits aren't removed immediately. If you run a git gc, which repacks the repository into a more efficient structure, it will export non-referenced commits as loose objects. (You have to ensure that there aren't any branches or tags or reflogs to see this behaviour; if there's an existing pointer then it will not evict the object from the packfile.)

Running a git fsck will check that all objects are present as expected. You can also see what is no longer referenced; running git fsck --unreachable will show you which commits are no longer reachable due to deleted branches or removed tags. Running git fsck --unreachable daily and mailing reports will give a good early warning of commits about to disappear if it's a concern.
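To see what such a report looks like, you can manufacture an unreachable commit on purpose. This sketch assumes a throwaway repository; the commit is created with git commit-tree so that it never appears in the HEAD reflog, and the topic branch name is made up:

```shell
# Sketch: create a commit, drop the only branch pointing at it, and find it again.
# The repository, branch name and identity are illustrative assumptions.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git commit -q --allow-empty -m "Base"

# Build a child commit directly, then point a branch at it.
lost=$(git commit-tree -p HEAD -m "Experiment" "HEAD^{tree}")
git branch topic "$lost"

# Deleting the branch removes the pointer (and its reflog), not the commit.
git branch -q -D topic

# The commit is still in the object database and shows up as unreachable.
report=$(git fsck --unreachable 2>/dev/null)
echo "$report"
```

While the commit is still listed, it can be rescued with something like git branch rescued followed by the reported hash.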

Objects which are no longer referenced can be evicted with git prune, though this is a low-level operation which is usually called from git gc. By default it will not remove commits less than 2 weeks old, nor of course the commits that are reachable from them; so provided the deleted branch (or tag) has recent commits, they will stay around in the git repository for up to a fortnight afterwards.

Avoiding future issues

Both branches and tags can be deleted; and when invoking a remote push operation a missing branch (or tag) on the client side can invoke a delete; for example: git push github :refs/heads/master will delete the 'master' branch off the remote repository known as github. If this is in a script, such as git push github $COMIT:refs/heads/master and the variable is misspelled (therefore evaluates to the empty string) this can inadvertently delete the branch. (The same is true for tags in :refs/tags/.)
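On the client side, a script can refuse to build a refspec from an empty value in the first place. This is a sketch of one way to do it with the shell's ${var:?} expansion; build_refspec is a hypothetical helper and the abbreviated hash is only a placeholder:

```shell
# Sketch: fail loudly instead of turning an empty variable into a delete refspec.
# build_refspec is a hypothetical helper; 056ca11 is just a placeholder commit.
set -e

build_refspec() {
  # ${1:?message} aborts when the argument is unset or empty.
  commit=${1:?commit must not be empty}
  printf '%s:refs/heads/master\n' "$commit"
}

# A real value produces a normal refspec.
good=$(build_refspec 056ca11)
echo "would push $good"

# An empty value (e.g. from a misspelled variable) is rejected.
if bad=$(build_refspec "" 2>/dev/null); then
  status=accepted
else
  status=refused
fi
echo "empty commit was $status"
```

Running scripts with set -u gives similar protection against the misspelled variable itself, since expanding an unset name then becomes an error.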

A remote repository can disable such operations by setting receive.denyDeletes to prevent any ref deletion, and can refuse non-fast-forward pushes with receive.denyNonFastforwards. With these set, deletes are refused and a push cannot overwrite a branch unless the new commit strictly follows the old one in history. (Overwriting is occasionally a useful operation; it may be necessary to provide a means to lift the restriction in certain situations.)
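The server-side protection can be checked locally. In this sketch a bare repository plays the part of the remote, and the topic branch and identity are made up for the purpose:

```shell
# Sketch: a bare 'remote' with receive.denyDeletes refuses a delete push.
# Repository paths, the branch name and the identity are illustrative only.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid
work=$(mktemp -d)

# The remote side: forbid ref deletion and history rewriting.
git init -q --bare "$work/remote.git"
git -C "$work/remote.git" config receive.denyDeletes true
git -C "$work/remote.git" config receive.denyNonFastForwards true

# The client side: publish a branch, then try to delete it.
git init -q "$work/local"
cd "$work/local"
git commit -q --allow-empty -m "Base"
git push -q "$work/remote.git" HEAD:refs/heads/topic

if git push -q "$work/remote.git" :refs/heads/topic 2>/dev/null; then
  status=deleted
else
  status=refused
fi
echo "delete push was $status"
```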

In addition, setting core.logAllRefUpdates will ensure that the repository still keeps the history of its branches, at least for the periods given by gc.reflogexpire and gc.reflogexpireunreachable.
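For a bare repository, that might be set up as follows. This is a sketch in a throwaway repository; the expiry values shown are simply Git's documented defaults written out explicitly:

```shell
# Sketch: turn on reflogs in a bare repository and state the expiry periods.
# The temporary repository is an assumption; 90 and 30 days are Git's defaults.
set -e
repo=$(mktemp -d)
git init -q --bare "$repo"

# Record every ref update, even though there is no working directory.
git -C "$repo" config core.logAllRefUpdates true

# Keep reachable reflog entries for 90 days and unreachable ones for 30.
git -C "$repo" config gc.reflogExpire "90 days"
git -C "$repo" config gc.reflogExpireUnreachable "30 days"

logging=$(git -C "$repo" config --bool core.logAllRefUpdates)
echo "reflogs enabled: $logging"
```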

Summary

Whilst git can be used as-is, there are powerful options which can tweak or constrain its behaviour. In the face of scripts which have full access to the remote repository, it is advisable to have a more controlled set of options rather than the default you-can-do-anything approach. With this knowledge in mind, you should be able to set your options appropriately for your environment.


Come back next week for another instalment in the Git Tip of the Week series.

Wednesday, November 09, 2011

Flash in the pan


According to ZDNet, Adobe is to stop development on Flash Mobile players.

Our future work with Flash on mobile devices will be focused on enabling Flash developers to package native apps with Adobe AIR for all the major app stores. We will no longer adapt Flash Player for mobile devices to new browser, OS version or device configurations. Some of our source code licensees may opt to continue working on and releasing their own implementations. We will continue to support the current Android and PlayBook configurations with critical bug fixes and security updates.

This doesn't mean that Adobe is out of the Flash business entirely, but it does mean that adverts and other interactive content intended for viewing on mobile devices will gradually be transitioned over to HTML5. It also signifies an increase in their HTML5 generation tools, which is ultimately Adobe's income stream from Flash.

Whether Google keeps supporting Flash for Android or not remains to be seen; and if Android stops supporting it, then it's quite likely that it will be dead in the water. I suspect that it will continue to live on in the next Android and Windows release, but it's unlikely to make it into the subsequent one.

JavaScript is the new bytecode

Instead, HTML5 is being seen as the new application runtime. Long predicted by Jobs, the advantage of an open runtime is that multiple companies can compete on providing the most efficient experience, whilst disassociating it from the content creation experience. Various toolkits already exist (JQueryMobile, JQueryTouch) but higher level drag-and-drop applications are still missing in action. Some systems – like GWT – compile down to JavaScript, and this may be an avenue taken by others (e.g. Dart, CoffeeScript) to give a better runtime.

However, JavaScript isn't a pleasant language to write in. Not from a syntax perspective, but from the quirks that the language gives you; for example, you can't assume that !!a is the same as a, and if you miss out on a var then serious bugs can be introduced.

Adobe already had some HTML5 generation tools available, and I expect that these will take a greater focus in the future. Even though Flash will still be supported for the desktop, mobile devices will need HTML5 and conveniently this will also run on desktops as well.

Security

Outdated software is the key reason for security exploits. Some may argue that Flash is the biggest carrier of such exploits; but in reality, any system may have remotely exploitable bugs if it's not kept up-to-date. Flash doesn't help, in that it requires Administrator privileges on Windows and OSX in order to update itself, which means the 'check for updates' frequently doesn't work unless you happen to follow bad practices.

It's also worth remembering that browsers themselves (or the libraries that they depend on) form a huge attack surface. As browsers have got more complicated – especially introducing direct memory manipulation processes such as WebGL – the likelihood is that these runtimes will become the new security concerns in the future. But unlike Flash, these can't be disabled by removing a plug-in; it's there whether you like it or not.

One of the most recent examples is the latest WebKit hack on an iPhone device. By working with the JIT compiler for JavaScript performance improvements, Charlie Miller was able to download and execute rogue code through a specially developed (native) iOS application.

Summary

Mobile flash going away is a win for everyone. It promised much and delivered poor-to-average performance for the platforms it was available on. The fact that it remains a key way of distributing videos will slowly fade away with the adoption of the HTML5 video tag; in any case, many assets are already available in the standard H.264 codec, which is supported in all of the major browsers. (Firefox is no longer considered a major browser; thanks to its recent breaking upgrade policy of throwing new bits over the wall every month, it has become a joke in the browsing world.)

Whilst Adobe may be trimming down the size of its workforce and refocussing efforts, those efforts were probably always better placed in the content creation rather than content runtime tools. Unfortunately, this means an increased likelihood of seeing adverts (which cannot be blocked through plug-in filters alone) on both mobile and desktop browsers.

Tuesday, November 08, 2011

Git Tip of the Week: Git Notes


This week's Git Tip of the Week is about git notes. You can subscribe to the feed if you want to receive new instalments automatically.


At a recent talk for the London Java Community (recorded video is available via the link), I presented Git and Gerrit (based on the successful screencasts I have done previously). One of the things I demonstrated was the use of git notes, so I thought writing about them and explaining what they are made sense.

When files are committed into a Git repository, they are addressed by a hash of the contents. The same is true of trees and commits. One of the benefits of this structure is that the objects cannot be modified after they have been committed (since doing so would change that hash).

However, sometimes it is desirable to be able to add metadata to a commit after it has already been committed. There are three ways of doing this:

  1. Amend the commit message to add in the additional metadata, accepting this will change the branch.
  2. Create a merge node with a more detailed commit, and push that (so that the previous commit is retained and can be fast forwarded).
  3. Add additional metadata in the form of git notes.

Of these three options, only the last one will not change the current branch.

Git Notes

Git Notes are, in effect, a separate ‘branch’ of the repository (stored at .git/refs/notes). They don't show up in the git branch command (that lists .git/refs/heads by default). However, although you could check it out and manually update it, there is a command provided which helps you do that; git notes.


(master) $ git log --oneline
056ca11 More Stuff Again
9defb31 MoreStuff
0c7ff4f Additional
19b6cdf Initial
(master) $ git notes show
(master) $ git notes add -m "ToDo: Fix stuff"
(master) $ git notes show
ToDo: Fix stuff
(master) $ git log
commit 056ca11c01b47e2bfe1e51178b65c80bbdeef7b0
…

    More Stuff Again

Notes:
    ToDo: Fix stuff

When you look at the output of git log, it checks to see if there is an associated note, and if so, prints it out as if it were an appendix to the commit. Furthermore, the notes are mutable and can be updated over time:


(master) $ git notes add --force -m "ToDone: Fixed stuff"
Overwriting existing notes for object 056ca11c01b47e2bfe1e51178b65c80bbdeef7b0
(master) $ git notes show
ToDone: Fixed stuff

The advantage of the notes is that they can be updated without changing the commit message (and therefore the hash) of the item that they are referring to. Of course, this can be used for good as well as bad; but bear in mind the mutability if you need to depend on the notes' contents.

Gits all the way down …

Actually, a better title might have been “objects all the way down”, but I liked this one better.

Since Git is a content addressable database, the notes themselves are git objects. You can even view the history of the branch using git log and even check it out. But how are the notes stored?


(master) $ git log --oneline notes/commits
d6ac2b2 Notes added by 'git notes add'
5eb0ee5 Notes added by 'git notes add'
(master) $ git checkout notes/commits
Note: checking out 'notes/commits'.

You are in 'detached HEAD' state. You can look around, make experimental
…
HEAD is now at d6ac2b2... Notes added by 'git notes add'
((d6ac2b2...)) $ ls
056ca11c01b47e2bfe1e51178b65c80bbdeef7b0
((d6ac2b2...)) $ cat 056ca11c01b47e2bfe1e51178b65c80bbdeef7b0 
ToDone: Fixed stuff

The branch contains a list of notes, with file names referenced by the commit (or other object) ID that they correspond to. We can make a change here and update our notes:


((d6ac2b2...)) $ echo Note: Git notes are just objects >> 056ca11c01b47e2bfe1e51178b65c80bbdeef7b0
((d6ac2b2...)) $ git commit -a -m "Note added by me"
[detached HEAD 89e6afa] Note added by me
 1 files changed, 1 insertions(+), 0 deletions(-)
((89e6afa...)) $ git checkout master
Warning: you are leaving 1 commit behind, not connected to
any of your branches:

  89e6afa Note added by me
…
(master) $ git log HEAD^..HEAD
commit 056ca11c01b47e2bfe1e51178b65c80bbdeef7b0
…
    More Stuff Again

Notes:
    ToDone: Fixed stuff

So, we added a new commit and then switched back to master; but as the warning message told us, this has left the commit behind. We really need to update the refs/notes/commits reference if we want to see the new values:


(master) $ git update-ref refs/notes/commits 89e6afa
(master) $ git log HEAD^..HEAD
commit 056ca11c01b47e2bfe1e51178b65c80bbdeef7b0
…
    More Stuff Again

Notes:
    ToDone: Fixed stuff
    Note: Git notes are just objects

Here, git update-ref sets refs/notes/commits to the value 89e6afa… (after resolving it to a full 40 character hash and checking that the object exists).
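
You don't have to check the notes branch out to poke around inside it. As a rough sketch (again in a throwaway repository with a made-up identity), the plumbing commands can list the notes tree and print an individual note directly:

```shell
set -e
# Throwaway repository; the identity below is invented for this example.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q
git commit -q --allow-empty -m "More Stuff Again"
git notes add -m "ToDone: Fixed stuff"

commit=$(git rev-parse HEAD)
git ls-tree refs/notes/commits          # one blob entry, named after the annotated commit
note_blob=$(git notes list "$commit")   # object ID of the note attached to that commit
git cat-file -p "$note_blob"            # the note text itself
git rev-parse refs/notes/commits        # the notes commit that update-ref would repoint
```

With only a handful of notes the tree is flat, as here; I'd expect larger notes trees to be split into subdirectories, so treat the flat layout as an assumption of this small example.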

Conventions

Just a quick note on conventions; since the notes file is essentially on its own branch, the content doesn't get merged with merges between branches. If you wanted to merge git notes, then following a Key: Value convention, with each entry on its own line, is the way to achieve git note merging nirvana. The merging options for git notes allow for appending of notes (i.e. similar to cat noteV1 noteV2) or sorting and uniquifying the data (i.e. cat noteV1 noteV2 | sort | uniq).
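
As a sketch of the sort-and-uniquify option, the following merges a second notes ref into refs/notes/commits using git notes merge with the cat_sort_uniq strategy. The refs/notes/other ref, the file name and the reviewer names are all invented for the example.

```shell
set -e
# Throwaway repository; the identity below is invented for this example.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q
git commit -q --allow-empty -m "Initial"

# A note on the default ref, and a different note for the same commit
# on a second (hypothetical) notes ref.
git notes add -m "Reviewed-by: alice"
printf 'Tested-by: bob\nReviewed-by: alice\n' > other-note.txt
git notes --ref=other add -F other-note.txt

# Both refs annotate the same commit, so the merge has to combine the notes:
# concatenate them, sort the lines and drop the duplicates.
git notes merge -s cat_sort_uniq refs/notes/other
git notes show
```

Because each entry sits on its own line, the duplicated Reviewed-by line collapses into one; free-form prose wouldn't merge anywhere near as cleanly.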

However, the notes don't have to be textual, nor do they have to be something which is mergeable. They don't even need to be on the notes/commits ref; you can create notes based on any reference.

In fact, this is how Gerrit works (which I've written about before). Gerrit stores its review information in the Git repository under notes/review. Ordinarily, this doesn't show up (the git log only shows notes in the notes/commits refspace), but you can make it do so if you want:


(BARE:master) $ git show refs/notes/review
commit bb7cba258eaaf4851b20b66c7ef56775f0cb4367
…
    Update notes for submitted changes
    
    * Goodbye world

diff --git a/f7f38314247063271631cfddf560ea99214cd438 b/…
@@ -0,0 +1,7 @@
+Code-Review+2: Alex Blewitt
+Verified+1: Jenkins
+Submitted-by: Alex Blewitt
+Submitted-at: Thu, 20 Oct 2011 20:11:16 +0100
+Reviewed-on: http://localhost:9080/7
+Project: SkillsMatter
+Branch: refs/heads/master
(BARE:master) $ git log HEAD^..HEAD
commit f7f38314247063271631cfddf560ea99214cd438
…
    Goodbye world
    
    Change-Id: I692f8de08938f22da9d6e26005ba44c95a1479d7
(BARE:master) $ git log --show-notes=* HEAD^..HEAD 
commit f7f38314247063271631cfddf560ea99214cd438
…
    Goodbye world
    
    Change-Id: I692f8de08938f22da9d6e26005ba44c95a1479d7

Notes (review):
    Code-Review+2: Alex Blewitt
    Verified+1: Jenkins
    Submitted-by: Alex Blewitt
    Submitted-at: Thu, 20 Oct 2011 20:11:16 +0100
    Reviewed-on: http://localhost:9080/7
    Project: SkillsMatter
    Branch: refs/heads/master

In this case, I reviewed the commit (with a +2 from me, and a +1 from Jenkins) and it's stored in the Git repository, along with everything else. Normally, it's not received by the user when pulling or cloning; but it is a permanent record on the repository (and will be visible if you e.g. do a git clone --mirror). However, if you want to fetch the notes as well you can do so:


[remote "origin"]
	fetch = +refs/notes/*:refs/notes/*
	fetch = +refs/heads/*:refs/remotes/origin/*
	url = ssh://localhost:29418/SkillsMatter.git
	push = refs/heads/master:refs/for/master

The first fetch refspec (+refs/notes/*:refs/notes/*) allows me to pull any/all reviews from the repository and make them available in my local clone.
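
If you want to try this without a Gerrit server to hand, here's a minimal sketch using two local repositories. The upstream and reader directory names and the review text are made up; the only thing that matters is the extra fetch refspec.

```shell
set -e
# Throwaway repositories; the identity below is invented for this example.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
work=$(mktemp -d)
cd "$work"

# An 'upstream' repository with a commit and a review-style note on it.
git init -q upstream
cd upstream
git commit -q --allow-empty -m "Goodbye world"
git notes --ref=review add -m "Code-Review+2: Example Reviewer"
cd ..

# A plain clone only fetches refs/heads/*, so the note isn't there yet.
git clone -q upstream reader
cd reader
git notes --ref=review show HEAD 2>/dev/null || echo "no review notes after a plain clone"

# Add the notes refspec alongside the default one and fetch again.
git config --add remote.origin.fetch '+refs/notes/*:refs/notes/*'
git fetch -q origin
git log -1 --notes=review
```

I've used git log --notes=review here rather than --show-notes=*, simply to pick out the one notes ref.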

Exercise for the reader …

Since the Git notes can contain any blob, and they're not cloned by default (unless you specifically request them), you can create a distribution and check it into a repository. Instead of storing it in refs/notes/commits, store it in refs/notes/dist and have the binary generated from your compile system export it as a Git Note pointing to the tag. That way, if you want to check out the pre-built bundle for a given tag, you can use refs/notes/dist to point to the tag you want and extract the full binary.
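
One way to approach the exercise is sketched below; it's not the only way, and the file names and the stand-in ‘binary’ are invented. It writes the bundle into the object database with git hash-object, attaches that blob as a note under refs/notes/dist using git notes add -C, and then reads it back for the tag.

```shell
set -e
# Throwaway repository; the identity below is invented for this example.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q
git commit -q --allow-empty -m "Version 1.0"
git tag 1.0

# Stand-in for the output of a real build.
printf 'PK\003\004 pretend this is a built archive\n' > dist.bin

# Store the file as a blob, then attach that blob as the note for the tag.
blob=$(git hash-object -w dist.bin)
git notes --ref=dist add -C "$blob" 1.0

# Later: look up the note for the tag and extract the bundle again.
note=$(git notes --ref=dist list 1.0)
git cat-file blob "$note" > restored.bin
cmp dist.bin restored.bin && echo "restored the bundle for tag 1.0"
```

I've used -C with an existing blob rather than -m or -F because those are designed for text messages; whether a real multi-megabyte bundle is a sensible thing to hang off a note is a separate question.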

Of course, you don't really need to use git notes to store any blob in the repository in any case; there's no reason why you couldn't have a refs/dists tree, with one file per tag.

Git notes demonstrates the fact that Git is not just a source code control system, like Hg or Bzr. Instead, it's a content-addressable file-system, which just happens to be able to represent trees and files (blobs) in an easy way. As a result, Git will always be capable of being extended with functionality like Gerrit and git notes, because it is not limited in what it can store in a repository; yet the cloning of the repository can still be efficient, since the data you pull from a clone is only the reachable objects from a specific commit. As a result, review notes (and/or binary distributions) need never be part of a cloned repository, even if they are persisted and available in the same Git back-end.


Come back next week for another instalment in the Git Tip of the Week series.

Monday, November 07, 2011

Ten Years of Eclipse

References

I have written up the last Ten Years of Eclipse at InfoQ, including a bit about the pre-history of where Eclipse came from. I was lucky enough to work with both Visual Age for Java (which was written in Smalltalk), the Visual Age Micro edition (which was written in Java, and the forerunner of WebSphere Studio), the WebSphere Studio product itself and ultimately on through Eclipse 1.0's release to the outside world.

Other than my WebSphere roots, it wasn't until late in the development of the Eclipse 2.1 release that I got involved doing testing for the then-upcoming Mac OSX release of Eclipse. Since then, I've been regularly blogging about Eclipse, occasionally going to the EclipseCon conferences (not for a while, sadly) and even writing on EclipseZone and InfoQ about Eclipse and OSGi topics.

Still, it's not often to say you were part of a decade-long process, even if it is one at the periphery. Here's to the next ten years, and Happy Tenth Birthday, Eclipse!

Wednesday, November 02, 2011

SSHD Server in Java with Kerberos Authentication

References

Having tried (and failed) to get a working SSHD server that supported Kerberos authentication in the past, I was pleasantly surprised by the recently-released Apache Mina SSHD project.

Setting up a project is relatively straightforward; you need to get hold of Apache Mina SSHD 0.6 Jar along with its dependencies (note; since it uses SLF4J for logging, you'll also need an SLF4J implementation such as SLF4J-Log4J12).

You can then create a simple SSH server with the following:


SshServer server = SshServer.setUpDefaultServer();
server.setPort(1234);
server.setKeyPairProvider(new SimpleGeneratorHostKeyProvider("/path/to/the/key"));

This sets up a server, running on port 1234, which persists its server key in /path/to/the/key. (It's not mandatory to persist the key; without an argument the key is regenerated each time the server starts. If you don't persist it, though, your clients will complain after every restart that the host key has changed.)

Mind you, as it stands, it's not very useful. It doesn't allow any authentication and doesn't do anything when users are authenticated. First, let's see how we can configure the SSHD server to do something when a connection occurs:


server.setShellFactory(new ProcessShellFactory(new String[] { "ls" }));

The ShellFactory is used to instantiate a new process (copying the system input/output) for each new connection that is made. It's possible to write your own as well; in addition, you can take advantage of the SSH channels to do alternative operations.

Kerberos

For the authentication, we're going to look at Kerberos. It's a pain to get right, but the SSH server is capable of supporting it and so it's instructive to know how. This isn't a full guide to Kerberos – one assumes that you have the basics and a working Kerberos environment in the first place. If not, feel free to skip the rest of this section.

Kerberos authenticates users via principals, which are usually of the form me@EXAMPLE.COM for users, and service/canonical.host.name@EXAMPLE.COM for services. For SSH, the normal principal is host/canonical.host.name (the default realm gets added automatically).

Keytabs

To run a service authenticated by Kerberos, the server needs access to a per-host key. This is stored in a Kerberos keytab, which can be exported from the Kerberos infrastructure. The details of how to extract a keytab are part of standard Kerberos administration; in my case, I extracted a keytab with kadmin.local and ktadd -k canonical.host.name.keytab -norandkey host/canonical.host.name. Note that this keytab is equivalent to a hashed password, and should be protected as such. Also, extracting with kadmin may result in the key version number being bumped and the key re-randomised; this is to prevent general extraction of the keys. You might find that the server's key is already in /etc/krb5.keytab, although it will be readable only by root by default and shouldn't be opened up to the wider world; keytabs are intended to be readable only by the individual application(s) that require them.

The default SSH principal will be host/canonical.host.name, which is very often shared with the real SSH server running on the box. Unfortunately there is no easy way of changing this, which means that the Java SSHD server needs access to the same key.

Configuring SSHD for Kerberos authentication

Assuming you have the keytab problem solved, the next step is to hook it up to the server. You can do this with the following:


List<NamedFactory<UserAuth>> userAuthFactories = new ArrayList<NamedFactory<UserAuth>>(1);
userAuthFactories.add(new UserAuthGSS.Factory());
server.setUserAuthFactories(userAuthFactories);

GSSAuthenticator authenticator = new GSSAuthenticator();
authenticator.setKeytabFile(keytab);
// authenticator.setServicePrincipalName(principalName);
server.setGSSAuthenticator(authenticator);

The first part hooks up the server with the GSS authenticator. GSS is the Generic Security Service, which is a generic wrapper around Kerberos and other authentication mechanisms. Note that the Apache Mina implementation is specifically crafted towards the Sun implementation, so if you're running in a classloader constrained environment or on a non-Sun/Oracle JVM then there may be problems with this approach. (The CredentialManager can be overridden to be customised to solve this problem.)

The second part configures a GSSAuthenticator to allow a specific keytab to be used. Without this, authentication will fail as the server won't be able to get a kerberos credential to offer to users. (Both the server and client need to have credentials in the Kerberos world; it will often fail because the server has no Kerberos credential sourced from its own keytab.)

Finally, you can optionally specify what the principal is going to be. This is only really of use if you have a custom Java SSH client that can specify its principal. However, it can be useful sometimes if the server is calculating the default principal incorrectly; for example, many hosts have multiple interfaces and therefore the potential for multiple canonical names. The default is to calculate host/ + InetAddress.getLocalHost().getCanonicalHostName(); so if this is wrong, then you may choose to override it here.

Conclusion

With this code, and a working Kerberos environment, you can now start an SSH server in Java, and connect to it via SSH with Kerberos authentication.

I have put up an example project at GitHub which you can clone via http://github.com/alblue/Examples.git. You can generate a -jar-with-dependencies by running mvn, and specify a port, path to keytab and (optionally) the principal to use for authentication. Note that it includes a log4j.properties file with debug enabled; when you run it you will see a lot of output indicating the stage of the results.


$ java -jar KerberizedSshServer-1.0-jar-with-dependencies.jar 1234 /tmp/canonical.host.name.keytab host/canonical.host.name
2011-11-02 22:25:04,886 [main] INFO  org.apache.sshd.common.util.SecurityUtils - BouncyCastle not registered, using the default JCE provider
2011-11-02 22:25:13,935 [NioProcessor-2] INFO  org.apache.sshd.server.session.ServerSession - Session created...
2011-11-02 22:25:13,963 [NioProcessor-2] INFO  org.apache.sshd.server.session.ServerSession - Client version string: SSH-2.0-OpenSSH_5.6
2011-11-02 22:25:13,963 [NioProcessor-2] DEBUG org.apache.sshd.server.session.ServerSession - Received packet SSH_MSG_KEXINIT
⋮
2011-11-02 22:25:14,060 [NioProcessor-2] INFO  org.apache.sshd.server.session.ServerSession - Authenticating user 'me' with method 'gssapi-with-mic'
2011-11-02 22:25:14,062 [NioProcessor-2] DEBUG org.apache.sshd.server.auth.gss.UserAuthGSS - UserAuthGSS: found Kerberos 5
2011-11-02 22:25:14,083 [NioProcessor-2] INFO  org.apache.sshd.server.session.ServerSession - Authentication not finished
2011-11-02 22:25:14,091 [NioProcessor-2] DEBUG org.apache.sshd.server.session.ServerSession - Received packet SSH_MSG_USERAUTH_INFO_RESPONSE
2011-11-02 22:25:14,091 [NioProcessor-2] INFO  org.apache.sshd.server.session.ServerSession - Received SSH_MSG_USERAUTH_INFO_RESPONSE
2011-11-02 22:25:14,091 [NioProcessor-2] DEBUG org.apache.sshd.server.auth.gss.UserAuthGSS - In krb5.next: msg = SSH_MSG_USERAUTH_INFO_RESPONSE
2011-11-02 22:25:14,118 [NioProcessor-2] INFO  org.apache.sshd.server.auth.gss.UserAuthGSS - GSS identity is me@EXAMPLE.COM
2011-11-02 22:25:14,118 [NioProcessor-2] INFO  org.apache.sshd.server.session.ServerSession - Authentication still not finished
2011-11-02 22:25:14,120 [NioProcessor-2] DEBUG org.apache.sshd.server.session.ServerSession - Received packet SSH_MSG_USERAUTH_GSSAPI_MIC
2011-11-02 22:25:14,120 [NioProcessor-2] INFO  org.apache.sshd.server.session.ServerSession - Received SSH_MSG_USERAUTH_GSSAPI_MIC
2011-11-02 22:25:14,120 [NioProcessor-2] DEBUG org.apache.sshd.server.auth.gss.UserAuthGSS - In krb5.next: msg = SSH_MSG_USERAUTH_GSSAPI_MIC
2011-11-02 22:25:14,121 [NioProcessor-2] DEBUG org.apache.sshd.server.auth.gss.UserAuthGSS - MIC verified
⋮
2011-11-02 22:25:14,142 [NioProcessor-2] DEBUG org.apache.sshd.server.session.ServerSession - Received packet SSH_MSG_CHANNEL_REQUEST
2011-11-02 22:25:14,142 [NioProcessor-2] INFO  org.apache.sshd.server.channel.ChannelSession - Received SSH_MSG_CHANNEL_REQUEST on channel 0
2011-11-02 22:25:14,142 [NioProcessor-2] INFO  org.apache.sshd.server.channel.ChannelSession - Received channel request: shell

To connect from the same host, you can probably just run ssh -p 1234 canonical.host.name, and because the user will have access to the per-user credential cache (if you're running the server as the same userid) it will Just Work™.

However, if you're connecting from a remote host, then you may need to invoke it with ssh -K -p 1234 canonical.host.name instead. The -K enables GSSAPI authentication and forwards the ticket to the server (which is why it says GSS identity is me@EXAMPLE.COM in the debug trace). Your SSH client can supply per-server or global configurations to always use Kerberos tickets (GSSAPIAuthentication yes in .ssh/config). Without the forwarded Kerberos ticket, the server can't authenticate you against the Kerberos mechanism:


2011-11-02 22:30:28,632 [NioProcessor-2] INFO  org.apache.sshd.server.session.ServerSession - Authorized authentication methods: gssapi-with-mic
2011-11-02 22:30:28,634 [NioProcessor-2] DEBUG org.apache.sshd.server.session.ServerSession - Received packet SSH_MSG_USERAUTH_REQUEST
2011-11-02 22:30:28,634 [NioProcessor-2] INFO  org.apache.sshd.server.session.ServerSession - Received SSH_MSG_USERAUTH_REQUEST
2011-11-02 22:30:28,634 [NioProcessor-2] INFO  org.apache.sshd.server.session.ServerSession - Authenticating user 'me' with method 'none'
2011-11-02 22:30:28,634 [NioProcessor-2] INFO  org.apache.sshd.server.session.ServerSession - Unsupported authentication method 'none'
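
To avoid that failure without typing -K and -p each time, a per-server configuration along these lines should do; treat it as a sketch. The host name and port are the placeholders used throughout this post, and GSSAPIDelegateCredentials is my assumption for the forwarding half of what -K does. It's written to a temporary file here so that it can be tried out with ssh -F before it goes anywhere near your real .ssh/config.

```shell
set -e
# Write a per-host client configuration to a scratch file.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
Host canonical.host.name
    Port 1234
    GSSAPIAuthentication yes
    GSSAPIDelegateCredentials yes
EOF

# With a valid ticket (e.g. from kinit), this would then be enough to connect:
# ssh -F "$cfg" canonical.host.name
cat "$cfg"
```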

If you're connecting from a Windows host then you'll need to have a client which is capable of making SSH connections and forwarding the Kerberos tickets. Some versions of Putty have GSSAPI support but check whether a Unix-Unix connection works before trying to debug Windows related problems. Note that some clients (e.g. Eclipse EGit) do not support Kerberos-only authentication, but instead use password-based authentication.

Note that it is possible to configure SSHD to use passwords as well (see the UserAuthPassword.Factory()) but this is outside the scope of this post.

Tuesday, November 01, 2011

Google Screws Up Reader

References

Not content with the death of Buzz, Google seems intent on killing off one of the only two products that I use. Today, upon logging into Google Reader, I discovered that the UI has completely fallen apart. From the welcome goodbye note:

Like many of Google’s products, Reader has gained a clean new look, with more space and less clutter, making it even faster to navigate your feeds.

More space is the exact opposite of what's required, both in Reader and in Google+. Whereas in Twitter (and the old Reader) it's possible to see many items at once, now it's possible to see even fewer in Google Reader.

As an example, on my Mactop I have 12 tweets showing. On Google Reader, with a window exactly the same size, I have four items showing. And each of those is two lines long, probably fitting into the size of a tweet (give or take).

Yes, it's nice that they're finally standardising on a single Like button (and having all those Likes in one place makes sense, rather than spread around on an app-by-app basis) but let's face it, Google fired the wrong UI team. Google+ has become synonymous with spam notifications and whilst there are three or four people I follow who regularly write many paragraphs (and let's face it, some of them are Googlers themselves) all that happens is I essentially ignore any updates which aren't on the home screen. The fact of the matter is that Google+ is text-heavy and information-light, whereas Twitter is the exact reverse.

But the point of a blog reader is to be able to (a) remember which ones you have read, and which ones you haven't; (b) display that information so you can scan headlines quickly and determine if you want to see more. The abstract is like the twitter message and the body is like the twitlongerer or whatever service you use.

There will always be people who complain because it's different, and because it's changed. I fully expect those three or four people who like Buzz Google+ to like the new unified look of the two. But Reader has just gone several steps backwards in usability thanks to a user interface which appears to be modelled on the East Coast's current snowstorm whiteout, with just a few bits visible here and there.

Can anyone recommend a good Mac (or Windows) client for reading Google Reader and synchronising the read state across multiple devices? Or even a completely new reader UI from a different company? I'm not staying around for this UI disaster.

Git Tip of the Week: Git Flow

References

This week's Git Tip of the Week is about git flow. You can subscribe to the feed if you want to receive new instalments automatically.


Following on from last week's post on merging, and specifically discussing git merge --no-ff, we will be looking at a popular Git strategy known as git flow.

Originally posted as Git Flow: a successful branching model, it has become popular in a number of larger projects which use branching as a way of introducing features.

Feature branches: the key point in git flow is the creation and use of feature branches. Their purpose is to allow parallel (or independent) feature development which can subsequently be merged back into a consolidated branch. Of course, Git is particularly well suited to this task since branches are free to create and merging can be done easily afterwards.

Feature branches are merged back onto the main development branch, known as develop in git flow. The key point with git merge --no-ff is that it gives you a merge node, even if the develop branch can otherwise be fast forwarded. This merge node allows you to determine on which branch you created the fix in the first place.

Although many consider “feature” to be a large term, in fact, it's quite possible to think of a micro feature as well. Some developers will create a new feature branch for each bug that gets filed, which then preserves a link with the bug that was filed in a bug tracker. (You can also use commit messages to enable this linkage.)

Release branches: once the develop branch is ready for a release, it is branched onto a (pre)-release branch. This is then used for bugfixes for that release only (which are merged back into the develop branch) in preparation for the final tagging. Each branch is named for its version number; so release-1.0 would be the branch used for developing the code ahead of the 1.0 release itself. Once development of the 1.0 release is finalised, it will be tagged and work on the release-1.0 branch will be stopped. (Git Flow suggests the branch is deleted at this point to prevent accidental further work on that branch. Since it is tagged, it's easy to recreate if ever needed.)

Hotfix branches: there may be a need to spin up a new release quickly (sometimes also known as patch releases) for a production issue, but ahead of the next major (or minor) release. Hotfix branches are also named for the release (e.g. hotfix-1.0.1) and only contain the specific bugfixes for that release. Changes are merged back into the develop branch to ensure that the bug is also fixed in subsequent releases as well. Typically only a few commits exist on hotfix branches at a time, since they often merge back into the release.
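
The release branch lifecycle described above (branch, release, tag, delete, and recreate if ever needed) can be sketched as follows. This is a simplified walk-through in a throwaway repository with empty commits and a made-up identity, not the git flow scripts themselves.

```shell
set -e
# Throwaway repository; the identity below is invented for this example.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is 'master'
git commit -q --allow-empty -m "Initial Commit"
git checkout -q -b develop
git commit -q --allow-empty -m "Work1"

# Branch for the release and do the release-only work there.
git checkout -q -b release-1.0 develop
git commit -q --allow-empty -m "Version 1.0"

# Finish the release: merge to master, tag it, and merge back to develop.
git checkout -q master
git merge -q --no-ff --no-edit release-1.0
git tag -a -m "Release 1.0" 1.0
git checkout -q develop
git merge -q --no-ff --no-edit release-1.0

# The branch has served its purpose, so it can go ...
git branch -q -d release-1.0

# ... and if it's ever needed again, it can be recreated from the tag.
git checkout -q -b release-1.0 1.0
git log --oneline --decorate -1
```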

What's in a name?

None of these names really change what you can achieve with Git; after all, one can just as easily create a branch from a tag as from a branch. Indeed, the git flow model suggests work is done against the develop branch instead of the master branch (which is conventionally the main branch for Git based development); in git flow, the master branch is reserved for the released tags only.

However, one key point is that the merge nodes store the name of the branch that they merged from. This means if the branch names are used, you can trace the flow of a feature from the feature branch through to the develop branch and ultimately a release branch. There's a great image (which I strongly encourage you to look at in the context of the original post).

To demonstrate how it works in practice, here's a shortened sequence of development in Git Flow and the resulting merge tree.


(master) $ git log --oneline
4f5da46 Initial Commit
(master) $ git checkout -b develop
Switched to a new branch 'develop'
(develop) $ …
(develop) $ git checkout -b feature1 develop
Switched to a new branch 'feature1'
(feature1) $ …
(feature1) $ git checkout -b feature2 develop
Switched to a new branch 'feature2'
(feature2) $ …
(feature2) $ git checkout develop; git merge --no-ff feature2
Merge made by recursive.
…
(develop) $ git checkout develop; git merge --no-ff feature1
Merge made by recursive.
…
(develop) $ git checkout -b release-1.0 develop # ready for 1.0
Switched to a new branch 'release-1.0'
(release-1.0) $ echo 1.0 > version; git add version; git commit -m "Version 1.0" version 
[release-1.0 c51b802] Version 1.0
(release-1.0) $ git checkout master
Switched to branch 'master'
(master) $ git merge --no-ff release-1.0
Merge made by recursive.
(master) $ git tag -a 1.0 # tag the release once it's finished
(master) $ git checkout -b hotfix-1.0.1 master # create a hotfix for 1.0
Switched to a new branch 'hotfix-1.0.1'
(hotfix-1.0.1) $ echo 1.0.1 > version; git add version; git commit -m "Version 1.0.1"
…
(hotfix-1.0.1) $ git checkout master
Switched to branch 'master'
(master) $ git merge --no-ff hotfix-1.0.1
Merge made by recursive.
(master) $ git checkout develop # merge the hotfix back into 'develop'
Switched to branch 'develop'
(develop) $ git merge --no-ff hotfix-1.0.1
Merge made by recursive.
(develop) $ git branch -d hotfix-1.0.1
Deleted branch hotfix-1.0.1 (was f43bb33).

Although the edits above don't have any real work contained in them, the merges allow us to generate this pseudo real-world looking merge tree:


(develop) $ git log --decorate --graph --oneline
*   17e4c5f (HEAD, develop) Merge branch 'hotfix-1.0.1' into develop
|\  
| * f43bb33 (hotfix-1.0.1) Hotfix2
| * 50a102d Hotifx1
| * d2fe7d6 Version 1.0.1
| *   d534111 (tag: 1.0) Merge branch 'release-1.0'
| |\  
* | \   77e7eac Merge branch 'release-1.0' into develop
|\ \ \  
| | |/  
| |/|   
| * | 9261fad (release-1.0) Release10Fix2
| * | 2562cb3 Release10Fix1
| * | c51b802 Version 1.0
* | | fbdd638 Work6
* | | 7353285 Work5
|/ /  
* | 538bb9a Work4
* |   cdef254 Merge branch 'feature1' into develop
|\ \  
| * | 793b1bc (feature1) Feature1Work4
| * | 3f2be07 Feature1Work3
| * | 7879593 Feature1Work2
| * | 8e01f4d Feature1Work1
* | |   9239c56 Merge branch 'feature2' into develop
|\ \ \  
| |_|/  
|/| |   
| * | 6752543 (feature2) Feature2Work3
| * | d065311 Feature2Work2
| * | 040fcaf Feature2Work1
| |/  
| * a9a97be Work3
| * a4253c8 Work2
| * 163c835 Work1
|/  
* 4f5da46 Initial Commit

This might seem noisy, but we can condense it down by using the --merges flag of git log, which restricts us to just the merge nodes:


(develop) $ git log --decorate --graph --oneline --merges 
* 17e4c5f (HEAD, develop) Merge branch 'hotfix-1.0.1' into develop
| * d534111 (tag: 1.0) Merge branch 'release-1.0'
* 77e7eac Merge branch 'release-1.0' into develop
* cdef254 Merge branch 'feature1' into develop
* 9239c56 Merge branch 'feature2' into develop

At this point, the benefit of using the --no-ff becomes clear; it is a way of documenting the merges such that they can be filtered by the git log command. In addition, the naming conventions of Git Flow enable a well described set of feature and release branches that is identifiable from the graph.
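
If you want to see the effect in isolation, here's a small sketch in a throwaway repository (empty commits, made-up identity). The feature branch could have been fast forwarded, but --no-ff still records a merge node with two parents, which --merges can then find.

```shell
set -e
# Throwaway repository; the identity below is invented for this example.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q
git commit -q --allow-empty -m "Initial Commit"
git checkout -q -b develop
git checkout -q -b feature1 develop
git commit -q --allow-empty -m "Feature1Work1"

# develop hasn't moved, so a plain merge would simply fast forward.
git checkout -q develop
git merge -q --no-ff --no-edit feature1

# rev-list prints the commit followed by its parents; subtract one for the commit itself.
parents=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "parents of the merge node: $parents"
git log --merges --format=%s
```

Without --no-ff, develop would just have been moved to point at Feature1Work1 and the --merges filter would have had nothing to show.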

Whether or not Git Flow is right for you now, it's worth being aware of what it is (and why it follows the path that it does). Even if it's not something that works for you now, as your repositories scale up and your use of git grows, it may be worth coming back to in the future. There's also a set of scripts available at the GitHub repository, which makes managing the individual branches easier.


Come back next week for another instalment in the Git Tip of the Week series.