Tuesday, July 26, 2011

Git Tip of the Week: Git Bisect

This week's Git Tip of the Week is about bisecting to find where a failure was introduced. You can subscribe to the feed if you want to receive new instalments automatically.


When a problem is discovered in an existing system, it's not always clear what change caused the regression. Sometimes it's obvious, but in some cases the only way to find out is to go back and test each stage of the history to find out when the code went from 'good' to 'bad'.

Since this can take a significant amount of time, git bisect can be used to help automate the discovery of the problem. It relies on the ability to determine whether a given commit is 'good' or 'bad', and uses a binary search through the history to discover when the error was introduced. To do so, you use git bisect start, followed by the git bisect good and git bisect bad commands. Let's use it to find out when "Line two" was added:


# Sample file, adding one line per commit
(master) $ git blame test
^b06000a (Alex Blewitt 2011-07-15 08:44:49 +0100 1) Line one
3c30996f (Alex Blewitt 2011-07-15 08:45:04 +0100 2) Line two
4b2a81b8 (Alex Blewitt 2011-07-15 08:45:24 +0100 3) Line three
67664b1e (Alex Blewitt 2011-07-15 08:45:32 +0100 4) Line four
50cc6273 (Alex Blewitt 2011-07-15 08:45:40 +0100 5) Line five
(master) $ git bisect start
(master|BISECTING) $ git bisect good b06000a
(master|BISECTING) $ grep two test
Line two
(master|BISECTING) $ git bisect bad 
Bisecting: 1 revision left to test after this (roughly 1 step)
[4b2a81b86fcdb0e59981dc5d0ecdef5e6567c71b] Third
((4b2a81b...)|BISECTING) $ grep two test
Line two
((4b2a81b...)|BISECTING) $ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[3c30996f285905b16f1d1d8fb293372615272ce4] Second
((3c30996...)|BISECTING) $ git diff HEAD^
diff --git a/test b/test
index 67c8eb8..b8b933b 100644
--- a/test
+++ b/test
@@ -1,2 +1,2 @@
 Line one
-
+Line two

So we specified a good revision (b06000a) and a bad revision (50cc6273, the HEAD of master), and Git then picked a midpoint between them. This repeats, halving the range each time, until only one commit is left, and in this case, it's the one that introduced the 'bug' (Line two). We can even see what we've done after the fact:


((3c30996...)|BISECTING) $ git bisect log
git bisect start
# good: [b06000a98bd9d88f71356746680404de14b68a89] First
git bisect good b06000a98bd9d88f71356746680404de14b68a89
# bad: [50cc627394f5647e16921c30e4a984b4fd76577c] Fifth
git bisect bad 50cc627394f5647e16921c30e4a984b4fd76577c
# bad: [4b2a81b86fcdb0e59981dc5d0ecdef5e6567c71b] Third
git bisect bad 4b2a81b86fcdb0e59981dc5d0ecdef5e6567c71b
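
The halving that git bisect performs can be sketched in a few lines of shell. This is only an illustration of the idea, not how Git implements it: revisions are assumed to be the integers 1 to 5 (First to Fifth), and is_bad and FIRST_BAD are hypothetical stand-ins for a real test such as grep two test.

```shell
# Hypothetical stand-in for a real test: revisions >= FIRST_BAD are bad
FIRST_BAD=2
is_bad() { [ "$1" -ge "$FIRST_BAD" ]; }

# bisect GOOD BAD: GOOD is known good, BAD is known bad
bisect() {
    good=$1
    bad=$2
    steps=0
    # Keep halving until the good and bad revisions are adjacent
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        steps=$((steps + 1))
        if is_bad "$mid"; then bad=$mid; else good=$mid; fi
    done
    echo "first bad: $bad after $steps steps"
}

bisect 1 5    # tests revision 3, then revision 2
```

With five revisions this tests Third and then Second, which is the same order as the session above.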

Clearly in this case we could have just identified the line with git blame, but it highlights the principle of solving a problem. In the steps above, we were manually running grep two to find out if there was a problem; this would typically be replaced with a specific test command, such as make or equivalent. But wait! We can do it faster! Firstly, we can specify the good/bad endpoints (bad first, then good) as arguments to git bisect start:


((3c30996...)|BISECTING) $ git bisect reset
Previous HEAD position was 3c30996... Second
Switched to branch 'master'
(master) $ git bisect start HEAD HEAD~5
Bisecting: 1 revision left to test after this (roughly 1 step)
[4b2a81b86fcdb0e59981dc5d0ecdef5e6567c71b] Third

Secondly, we can run a script which can perform some kind of automated tests. With git bisect run we can execute a script per commit which detects whether the condition is good or bad. This could be some kind of make script, or it could be a custom script to test for a particular condition:


((4b2a81b...)|BISECTING) $ echo 'grep two test && exit 1'  > ~/test.sh 
((4b2a81b...)|BISECTING) $ echo 'exit 0' >> ~/test.sh
((4b2a81b...)|BISECTING) $ chmod a+x ~/test.sh
((4b2a81b...)|BISECTING) $ git bisect run ~/test.sh
running /Users/alex/test.sh
Line two
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[3c30996f285905b16f1d1d8fb293372615272ce4] Second
running /Users/alex/test.sh
3c30996f285905b16f1d1d8fb293372615272ce4 is the first bad commit
…
bisect run success

If the test script needs to be the same for all runs, it's best to put it outside the repository (whose working tree is checked out afresh at each bisect step). It's also possible to have a script which merges in simple fixes or builds with additional parameters (e.g. -DDEBUG).
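
A slightly more defensive version of the test script might look like the sketch below. git bisect run treats an exit status of 0 as good, 125 as "cannot test this commit, skip it", and any other status from 1 to 127 as bad. The function name check_commit and its arguments are my own invention for illustration.

```shell
# check_commit FILE PATTERN
# Returns a status that git bisect run understands:
#   0 = good, 1 = bad, 125 = skip this commit
check_commit() {
    file=$1
    pattern=$2
    # If the file does not exist in this commit, we cannot judge it
    [ -f "$file" ] || return 125
    if grep -q -- "$pattern" "$file"; then
        return 1    # pattern present: this commit is bad
    fi
    return 0        # pattern absent: this commit is good
}

# A script handed to git bisect run could end with:
#   check_commit test two
#   exit $?
```

Returning 125 for commits that can't be tested stops a missing file from being mistaken for a good or bad result.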


Come back next week for another instalment in the Git Tip of the Week series.

Sunday, July 24, 2011

Full-screen support for Eclipse on OSX Lion

If you've updated to Lion, you know that full-screen apps are likely to have a significant impact on how applications are used. Unfortunately, Eclipse Indigo was released just before OSX Lion, so it wasn't possible to implement full-screen handling for Eclipse applications.

Fortunately, the change is relatively easy. The code is already in place in SWT to make it happen, although it's not invoked by Eclipse. If you have an SWT window and you want to allow it to go full-screen on OSX, you can do:


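// shell is an SWT Shell; shell.view is SWT's internal Cocoa NSView
NSWindow nswindow = shell.view.window();
// 1 << 7 is NSWindowCollectionBehaviorFullScreenPrimary
nswindow.setCollectionBehavior(1 << 7);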

If you want to apply this to your current Eclipse instance, point an update site to http://github.bandlem.com/ where I have put a plugin together which applies this change to existing Eclipse windows. You can then hit the arrow at the top-right of the screen in order to put Eclipse into full-screen mode; to bring it back, move the mouse to the top-right of the screen and the menu will pop back into action – you can click on the fullscreen icon to bring it back as a regular window.

Hopefully this will be added by default in future versions (bug 349148) of Eclipse rather than needing this plugin as a workaround.

Update: This seems to fail on some versions of OSX (or on some JVMs on OSX) with missing osgi.os properties; in addition, the code is compiled against the 64-bit SWT and so fails when running with a 32-bit SWT. These problems will be fixed in the near future.

Update 2: I have pushed a new version which supports both 32-bit and 64-bit Eclipse. It also fixes the empty toolbar at the top of the screen in full-screen mode.

Update 3: This is now available via the Eclipse Marketplace.

Update 4: The plugin now supports coming out of fullscreen mode with Escape as well as adding a Window menu item to Toggle Full Screen. Work continues on bug 349148 which will extend the SWT API to do this natively, probably from Eclipse 3.8 onwards as it's adding new API.

Tuesday, July 19, 2011

Git Tip of the Week: Autocompletion in Shells

This week's Git Tip of the Week is about autocompleting shell environments. You can subscribe to the feed if you want to receive new instalments automatically.


Although many GUI clients exist for Git, sometimes it's useful to drop down to the shell in order to work with repositories. The examples used in this blog all show the shell equivalents, since it's more to do with concepts than specific implementations.

This post is different, however, in that it's explicitly about working with Git in shells. If you never use shells for developing with Git repositories – or you're using Windows – then you can safely come back next week for something more general.

Most shells are fully featured programming environments whose talent is frequently underused as a means of inspecting the file system. You can build functions, loops, file processing and so on. And most shells have a means of displaying a prompt, which in the examples I'm using has just been $. But this can be changed to any other value by supplying a different value for a variable called PS1:


$ export PS1='+++ Redo From Start +++ '
+++ Redo From Start +++ echo Hello World
Hello World
+++ Redo From Start +++

This isn't particularly exciting; it's just changed the $ for +++ Redo From Start +++. Note that it was specified in quotes to ensure that the trailing space is present in PS1, as otherwise it would blur into the command given.

It turns out that PS1 is evaluated each time, which means we can use a function instead of static text each time we want to use it:


$ d() { date; }
$ d
Thu 14 Jul 2011 08:34:31 BST
$ PS1='$(d) \$ '
Thu 14 Jul 2011 08:34:50 BST $ echo Hello
Hello
Thu 14 Jul 2011 08:34:51 BST $ echo World
World
Thu 14 Jul 2011 08:34:52 BST $ 

Note that each time we hit enter after a command, the function is called again, and you get a different time. (If you see the same time, check you remembered the single quotes around the value, as otherwise the function will be evaluated at PS1 assignment time; check it's what you expect with echo $PS1.)
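
The difference the quoting makes can be seen without touching the real prompt. In this sketch, stamp is a hypothetical stand-in for the date function so that the result is repeatable, and eager and lazy are ordinary variables rather than PS1 itself:

```shell
# stamp stands in for date so that the output is predictable
counter=0
stamp() { echo "tick-$counter"; }

eager="$(stamp) \$ "    # double quotes: stamp runs now, its output is stored
lazy='$(stamp) \$ '     # single quotes: the text $(stamp) is stored as-is

counter=1
printf 'eager: %s\n' "$eager"
printf 'lazy:  %s\n' "$lazy"
```

The eager value is frozen at tick-0 even though counter has since changed, whereas the lazy value still contains the literal $(stamp), which is what the shell needs in PS1 in order to re-run the function for each prompt.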

How does that fit in with Git? Well, it turns out there's a great script called git-completion.bash in the contrib directory of your Git installation; if it's not there, you can take a copy from contrib/completion/git-completion.bash in the Git source tree instead.

You can make a copy of the script in your local directory (or just symlink it to the contrib location) – in the examples below, I've copied it as .git-completion.sh in my home directory. You need to source it when your shell starts; the common way (for bash shells) is to add this to your .bashrc script:


$ echo source ~/.git-completion.sh >> ~/.bashrc

The .bashrc is read by each interactive (non-login) bash shell, though there's a .bash_profile which is used for login shells. To ensure a consistent experience, make sure that your .bash_profile sources your .bashrc and then put changes solely in the .bashrc file:


$ echo source ~/.bashrc >> ~/.bash_profile

Note that on Mac OS X, new Terminal windows and tabs are always considered login shells so this is a necessary step for OS X users.
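
If you share your dot files between machines, it may be worth guarding the source calls so that a missing file doesn't produce an error at every login. A minimal sketch, in which source_if_present is a helper name I've made up:

```shell
# source_if_present FILE: source FILE only when it exists and is readable
source_if_present() {
    if [ -r "$1" ]; then
        . "$1"
    fi
}

# In ~/.bash_profile you might then write:
#   source_if_present "$HOME/.bashrc"
# and in ~/.bashrc:
#   source_if_present "$HOME/.git-completion.sh"
```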

Showing Git status in the prompt

Having covered the basics, how can we use this to our advantage in Git repositories? Well, the .git-completion.sh script defines a function, __git_ps1, which gives you information about what branch you're on, if you're in a Git repository. (Note that you have to start a new shell to ensure you source the completion script.)


$ __git_ps1
 (master)
$ cd /tmp
$ __git_ps1
$

This gives us the branch that we're on if we're in a Git repository, and nothing if we're not in a repository. We can use this to set up a prompt accordingly:


$ export PS1='$(__git_ps1) \$ '
 (master) $ cd /tmp
 $

This shows us when we're on the master branch, and doesn't when we're not, but it gives us extra spaces. We can format that by passing a %s (standard printf formatting for a string) to add a space only when necessary:


$ export PS1='$(__git_ps1 "%s ")\$ '
master $ git checkout -b example
example $ 

As well as showing the branch, you can also show whether there are uncommitted changes or whether you are behind/ahead of a remote branch. There are various variables you can set:

  • GIT_PS1_SHOWDIRTYSTATE=true will show a * if there are unstaged and + if there are staged (but uncommitted) changes
  • GIT_PS1_SHOWSTASHSTATE=true will show a $ if there are stashed files
  • GIT_PS1_SHOWUNTRACKEDFILES=true will show a % if there are untracked files
  • GIT_PS1_SHOWUPSTREAM=auto will show a = if you are at the same commit, < if you are behind, > if you are ahead and <> if you have diverged from the upstream branch.
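
Putting these together, a fragment for your .bashrc might look something like the following. It assumes the completion script has already been sourced; the fallback prompt for when __git_ps1 isn't available is my own addition.

```shell
# Ask __git_ps1 for the extra status markers described above
GIT_PS1_SHOWDIRTYSTATE=true
GIT_PS1_SHOWSTASHSTATE=true
GIT_PS1_SHOWUNTRACKEDFILES=true
GIT_PS1_SHOWUPSTREAM=auto

# Only reference __git_ps1 in the prompt if it has actually been defined
if command -v __git_ps1 >/dev/null 2>&1; then
    PS1='$(__git_ps1 "%s ")\$ '
else
    PS1='\$ '
fi
```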

Autocompletion

So far, we've talked about the PS1 environment variable and showing the branch name in the prompt. However, the completion script also adds in the ability to auto-complete git commands (including arguments). Simply by typing a few characters, you can complete argument names and branch/tag names by hitting the TAB key. This can be useful if you don't know the full options that a command can take (type -- and then hit TAB) or when you're referring to a branch/tag frequently and don't want to type it in all the time.

There's a lot more that you can do to customise your shell prompt; since Git runs quickly, and you can set up a function to do what you want, it's possible to print out the current commit id or even use colours to denote different stages. Similar extensions exist for different shells, such as Steve Losh's extravagant ZSH prompt.


Come back next week for another instalment in the Git Tip of the Week series.

Monday, July 18, 2011

Setting up Google Code with Git

Since Google Code has started to support Git, I wanted to take the opportunity to reboot the ObjectivEClipse project to consume the Eclipse CDT mirror on GitHub so that I could keep up to date with the main mirror whilst adding support for Objective-C. This post shows you how you can migrate your existing Google Code repositories to Git.

Firstly, you'll need your account details from https://code.google.com/hosting/settings. Since Google Code uses https instead of ssh for its code pushes, we need to know what the password is to use.

For security reasons, I recommend against using your Google Account password. Although it's possible to configure, in essence the password is transmitted in plain text, albeit over an https connection. But if this password gets out, you've lost control of all of your Google Account estate.

Instead, Google Code shows you your GoogleCode password when you visit the URL above:

alex.blewitt@example.com's googlecode.com password: Tm90TXlQVyEK

We can use this password to authenticate instead, and if it gets let out accidentally, like Doctor Who, we can click on the "Regenerate" button and get a new one.

Ordinarily, we wouldn't want to have to remember this each time we use clients. You might be using a GUI tool (like EGit*) which remembers the password for you, but if you want to do things on the command line, the widely known approach of logging in with SSH keys isn't applicable to https connections.

Instead, we can put the entries in the ~/.netrc file (on Unixes, at least) which will be used for the interop with the server. We can configure it like so:


echo machine code.google.com >> ~/.netrc
echo login alex.blewitt@example.com >> ~/.netrc
echo password Tm90TXlQVyEK >> ~/.netrc
chmod go= ~/.netrc

Note that this applies only to machine connections going to code.google.com, but you might want to configure it for your project's name directly:


echo machine project.googlecode.com >> ~/.netrc
echo login alex.blewitt@example.com >> ~/.netrc
echo password Tm90TXlQVyEK >> ~/.netrc
echo machine wiki.project.googlecode.com >> ~/.netrc
echo login alex.blewitt@example.com >> ~/.netrc
echo password Tm90TXlQVyEK >> ~/.netrc

Note that the project's wiki is in a different location than the main code for Git and Hg repositories (in SVN, it's part of the same repository as the code).
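
Since each entry is the same three lines, the echo commands above could be wrapped in a small helper. This is just a sketch; add_netrc_entry is a name I've made up, and the host, login and password are the placeholders used above.

```shell
# add_netrc_entry FILE MACHINE LOGIN PASSWORD
# Appends one machine/login/password entry and removes group/other access
add_netrc_entry() {
    {
        echo "machine $2"
        echo "login $3"
        echo "password $4"
    } >> "$1"
    chmod go= "$1"
}

# For example:
#   add_netrc_entry ~/.netrc project.googlecode.com alex.blewitt@example.com Tm90TXlQVyEK
#   add_netrc_entry ~/.netrc wiki.project.googlecode.com alex.blewitt@example.com Tm90TXlQVyEK
```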

As an alternative, you can access using the code.google.com host for both the project and the wiki – just suffix the project name with .wiki instead, e.g. for https://code.google.com/p/objectiveclipse/ use https://code.google.com/p/objectiveclipse.wiki/

Note that when you convert your project over from SVN to Git, it will appear that the Wiki content gets deleted. That's because the Wiki server starts serving content from your new Wiki location, which won't be populated with the old content.

Fortunately, migrating the wiki content is easy enough. You can access it via the old SVN site you have, which, though it's not visible via the main project's URL, is visible via the direct project repository site:


$ git clone https://wiki.project.googlecode.com/git projectWiki
$ svn checkout http://project.googlecode.com/svn/wiki wiki
$ cp wiki/*.wiki projectWiki
$ cd projectWiki
$ git add .
$ git commit -a -m "Imported from old SVN wiki"
$ git push

You now have your wiki pushed to the repository. Of course, if you wanted to maintain history, then you could do a git svn clone instead.

Migrating the main repository (from GitHub to GoogleCode) should be as simple as adding a new remote, and hitting push:


$ cd /path/to/existing/github/clone
$ git remote add googlecode https://project.googlecode.com/git
$ git push googlecode master:master

Congratulations, you are now using Google Code in conjunction with Git.

* Note to EGit users: there's currently a bug which prevents EGit from using Google Code since the latter reports an invalid value.

Saturday, July 16, 2011

Google Code and Git Wikis

Just an addendum to yesterday's post on Git at Google Code – the wiki pages are backed by a version controlled repository. As a result, if you migrate from one repository format to another, your wiki pages will need to be migrated as well.

The locations are shown on the wiki support page:

If Subversion is used, all wiki pages are stored in the /wiki/ directory of your project's Subversion repository, as files ending in .wiki (other files and subdirectories will be ignored). That means you can use a Subversion client and your favorite text editor to add and edit wiki pages.

If Mercurial is used, the wiki content is stored at http://wiki.[projectname].googlecode.com/hg/ where [projectname] is the name of the specific project. For Git, it is stored at http://wiki.[projectname].googlecode.com/git.

If you have a Google Code project primarily for issue and wiki support, and host your Git content elsewhere, then you need to ensure a seamless transition of your wiki pages to the new Git repository after conversion, as otherwise it appears that all your wiki pages are just deleted.

Friday, July 15, 2011

Git at Google Code

It's great news that Google Code has finally come into this century and now supports Git as a native repository format. Previously, it just supported Mercurial and SVN as two repository formats.

I've written it up at InfoQ, and it is yet one more reason to switch to Git from archaic repositories such as Subversion and CVS.

Note that the Support FAQ lists a minimum version of Git 1.6.6, which was when the 'smart http' protocol was introduced. It's likely that Google Code, together with GitHub, has abandoned the 'dumb http' protocol for performance reasons. (GitHub switched off dumb http support last month.)

Note that Git support applies not only to Google Code but also to new (and presumably existing) projects on Eclipse Labs.

Time to re-open the ObjectivEClipse project!

Tuesday, July 12, 2011

Git Tip of the Week: Assigning Blame

This week's Git Tip of the Week is about playing the blame game. You can subscribe to the feed if you want to receive new instalments automatically.


Most version control systems have the concept of blame, but in a good way. This allows you to see who has made changes to a file, or when the file was last changed by someone. Git has the same feature as well. This can be used to find out what feature(s) were added in a release in a process known as blamestorming.

To find out who changed a file, you can run git blame against a single file, and you get a breakdown of the file, line-by-line, with the change that last affected that line. It also prints out the timestamp and author information as well:


$ git blame file
566a0863 (Alex Blewitt 2011-07-12 09:43:39 +0100 1) First line
ed0a7c55 (Alex Blewitt 2011-07-12 09:43:51 +0100 2) Second line
8372b725 (Alex Blewitt 2011-07-12 09:44:06 +0100 3) Third line
ed0a7c55 (Alex Blewitt 2011-07-12 09:43:51 +0100 4) 

The timestamps and abbreviated commit hashes show the order in which the changes were introduced, though in this contrived example it's easy to see. Since Git has full information about the committer, it can show you the person's name, or the person's e-mail address:


$ git blame -e file
566a0863 ( 2011-07-12 09:43:39 +0100 1) First line
ed0a7c55 ( 2011-07-12 09:43:51 +0100 2) Second line
8372b725 ( 2011-07-12 09:44:06 +0100 3) Third line
ed0a7c55 ( 2011-07-12 09:43:51 +0100 4) 

Sometimes you get extra noise when whitespace differences are introduced into the file. You can use -w to suppress reporting on changes that only affected whitespace.

There are also changes you can show on a subset of ranges as well (like git diff, but with lines annotated with the commit hash). For example:


$ git blame ed0a..566a -- file
^566a086 (Alex Blewitt 2011-07-12 09:43:39 +0100 1) First line

Finally, if you have large files, it can often generate more output than is necessary. You can post-filter the results, but it's more efficient to tell Git which lines you want to see so that it doesn't need to do more work than is necessary finding out. You can specify lines with an explicit line number (start/end), an offset from the start line (valid for end only) or a regular expression. This can be useful to blame a specific function, if you are following a C-like formatting where the function begins in column zero. This allows us to see the blame of a specific function (that starting with ^bar) up to the next closing brace (the first ^} after the ^bar):


$ git blame tst.c
6bee4066 (Alex Blewitt 2011-07-12 09:57:54 +0100  1) foo() {
6bee4066 (Alex Blewitt 2011-07-12 09:57:54 +0100  2)    // the foo function
6bee4066 (Alex Blewitt 2011-07-12 09:57:54 +0100  3) }
6bee4066 (Alex Blewitt 2011-07-12 09:57:54 +0100  4) 
0cdc3645 (Alex Blewitt 2011-07-12 09:58:15 +0100  5) bar() { 
0cdc3645 (Alex Blewitt 2011-07-12 09:58:15 +0100  6)    // the bar function 
0cdc3645 (Alex Blewitt 2011-07-12 09:58:15 +0100  7) } 
6bee4066 (Alex Blewitt 2011-07-12 09:57:54 +0100  8) 
6bee4066 (Alex Blewitt 2011-07-12 09:57:54 +0100  9) main() {
6bee4066 (Alex Blewitt 2011-07-12 09:57:54 +0100 10)    // the main function
6bee4066 (Alex Blewitt 2011-07-12 09:57:54 +0100 11) }
$ git blame -L/^bar/,/^}/ tst.c
0cdc3645 (Alex Blewitt 2011-07-12 09:58:15 +0100 5) bar() { 
0cdc3645 (Alex Blewitt 2011-07-12 09:58:15 +0100 6)     // the bar function 
0cdc3645 (Alex Blewitt 2011-07-12 09:58:15 +0100 7) } 
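
The /^bar/,/^}/ range has the same shape as a sed address range, so if you want to check which lines a pair of patterns will select before handing them to git blame -L, something like this sketch can help. The select_function helper is my own invention and assumes the function name starts in column zero.

```shell
# select_function FILE NAME
# Prints from the line starting with NAME to the next line starting with }
select_function() {
    sed -n "/^$2/,/^}/p" "$1"
}

# Try it on a copy of the example file above
dir=$(mktemp -d)
printf 'foo() {\n   // the foo function\n}\n\nbar() {\n   // the bar function\n}\n\nmain() {\n   // the main function\n}\n' > "$dir/tst.c"
select_function "$dir/tst.c" bar
```

This prints only the three lines of bar, which are the lines the blame output above is restricted to.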

Come back next week for another instalment in the Git Tip of the Week series.

Friday, July 08, 2011

Real-time Text for Jabber

Here's something that might be interesting to those who remember talk (and friends such as ytalk and ntalk); the Real-time Text for Jabber, also known as XEP-0301.

These days, chat systems work by sending a block of text at a time, usually when the person on the other end presses the Enter key. In Google Talk, you sometimes get a notification when someone has started typing; but you don't see anything until they've concluded with what they were trying to say.

Although this lacks immediacy, previous systems had the capability to show people's text as they were typing. These were usually inefficient, sending a network packet for each character pressed – but realistically no different from a modern SSH terminal onto a remote host, which does the same thing.

What seeing someone type does is give you an idea of a conversation flowing, much like a spoken conversation would. You can interrupt someone (to say that you've heard it, and they needn't spend time going into detail) or ask them to expand on the point that they're making.

The real-time Jabber extension to XMPP has been published as XEP-0301, which means any Jabber client could implement it if it wanted. The way Jabber extensions work is that any Jabber client has a list of supported extensions that it knows about, and if two clients both support it then they are free to use it. This permits (for example) clients to agree on what features they support and so to send real time data (or not). Needless to say, it is also backward compatible and will send a full body after the message has been sent, so in group chats (for example) those clients that can support the real time text can display it, whilst those that don't will simply see the final message.

If the protocol was merely batching up changes and firing them out periodically, then you'd see characters turn up in batches. That wouldn't look natural, and it wouldn't solve the problem of trying to get a conversation to evolve either. Instead, the real time text protocol includes timing information for the characters typed, so that the client on the remote end may receive a single packet of data but then drip feed the characters to the screen at an appropriate rate.

To get an idea of how this looks in practice, take a look at the real-time text demo at the creator's website. This shows the same conversation, shown as it is at the moment (with messages appearing en masse as they are finished), one where the characters are merely batched and flushed each second, and one where the characters are drip fed out. (The example uses an animated GIF to show how it would look, and is fairly effective at getting the point across.)

The status of the real-time text protocol is marked as “Experimental” so it may be a while before it is picked up by other major players such as Google and Apple. However, the XMPP protocol used means that Google Talk could be used to act as the intermediary for two supporting clients. It will be interesting to see if Google is interested in using this for Google Talk on the Web, or whether it is an idea in waiting for the future.

Monday, July 04, 2011

Google selcriC

Having used Google+ for a few days now, I'm not entirely convinced that Google Circles are tremendously useful. Yes, they're animated in a snazzy fashion, and yes, it's all JavaScript goodness – but from a practical perspective, it's all pretty useless as a way of categorising people.

To be honest, Google Circles is isomorphic to groups in your local address book, with the exception that you aren't holding onto an e-mail address but rather an identity. That permits, for example, others to update their contents whilst keeping the same identity throughout; a kind of distributed address book, if you will. Except that unlike Buzz, it's not integrated into your mail contacts.

Buzz screwed up not because it was integrated with your contacts, but because it auto-exposed your contacts to the wide world, based on frequency of mail exchanges. Had Buzz not exposed contacts automatically, but rather used an 'opt-in', like Google Circles, it would have been much better (and essentially what you've got now with Buzz). But ultimately, where we are now with Circles is the same as we are now with Buzz; the circles exist for your categorisation purposes only.

That's useful when (for example) you have a small group of people that you define, such as close family members or members of a local club. But unfortunately for Google+, social relationships aren't symmetric; if you follow someone, it doesn't imply they follow you (or vice versa).

I post about varied topics on my blog. Eclipse used to be a mainstay, but has dropped off over the years. ZFS used to be another before it forked off. These days, I'm most likely to write about Git – but there are always other topics (like personal health updates or IPv6) that I write about occasionally.

I'm pretty sure my blog as it stands correlates exactly with one person's interests – mine. (There can't be that many people who are bothered about my health, and I know my wife has no interest in Eclipse.) That's why the various feeds I offer are useful; it allows someone to take a feed of just the Eclipse related items (which then gets fed to Planet Eclipse). Those interested in my Git Tip of the Week series can follow their own feed, without having to read things that aren't interesting to them.

All of this works because choosing which feeds to consume is self-service. I don't define a list of people and say “Person X is only interested in Topic Y; I'll ensure they get all those items.” Instead, Person X can choose whether to subscribe to Topic Y (or X or Z) on their own terms and time. People's interests change over time; whilst they might be interested in Topic Y to start with, they may gain an interest in Topic X (and cease to be interested in Topic Y) and so adjust subscriptions accordingly. It shouldn't be necessary for me to be in the loop to make that decision happen.

Finally, the followers and following are asymmetric; or to put it another way, the two sets are unlikely to be a complete overlap. There are people I follow who don't follow me; there are people who follow me whom I don't follow, and there are those that follow each other. The same is true of blogs; this post is likely to be read by some of the bloggers that I follow, but equally, it's likely to be read by other bloggers that I don't (or even who don't blog at all).

Asking me to categorise all my followers is a waste both of my time and also of the opportunity for my followers to categorise themselves. I tag each of my posts so that people know what they are related to; this in turn fills the Eclipse feed or the Git Tip of the Week Feed or the OSGi Feed. And those that are (just) interested in OSGi can consume a tailor-made feed, just for them.

Google Circles, on the other hand, is only good for publishing to publicly, or to a small infrequently changing set such as family (family members don't come and go that frequently). And in all honesty, unless a significant proportion of your family members are on Google+ already, there's probably not a lot of point in even that.

What Google Circles needs is a way of defining publicly subscribable circles. I can create a few circles that are of interest to me (Eclipse, OSGi, iOS) but membership of those circles should be public. That way, I can publish something to just the Eclipse circle, and just those interested in Eclipse will see the information, just like they can today with my blog posts. It's no good allowing anyone to follow me but then expect me to have to put them in a circle in order to get a customised feed for them; right now, it's all or nothing.

Git Tip of the Week: Tracking Branches

This week's Git Tip of the Week is about configuring what happens when you pull. You can subscribe to the feed if you want to receive new instalments automatically.


Last week I wrote about the behaviour of pulling tracked branches; this week, it's worth taking a dive in to find out what a tracked branch is.

When you initially use Git, you learn that to update items from master involves a git pull (or git fetch). Both of these reach out to the remote repository and get content that you're interested in, with the git pull variant doing either a merge or a rebase as appropriate.

But how does Git know what to pull when you invoke git pull? Where should it pull it from? What makes a branch you have checked out locally differ from one you have pulled from a remote repository?

Remotes and Refspecs

Firstly, a (local) git repository can have many remotes. Each remote is a name of a repository on a remote end, which corresponds to a URL and a refspec. (In fact, remotes can have a second URL; one is used for fetching, whilst the other is used for pushing – this is used to permit anonymous fetches but authenticated pushes.) You need to specify, when fetching and pulling, what repository you're talking about. For remote repositories, this will default to origin if not specified.

You can specify what the refspec is when interacting with a remote repository. This is the set of branches that will be updated if you interact with that repository. This is normally of the form refs/heads/master:refs/remotes/origin/master, where refs/heads are the pointers to your local branches, and refs/remotes are the remote branches.

An optional + prefix on fetch refspecs indicates whether or not to fetch non fast-forward commits automatically. And whilst you can't have partial wildcards (like refs/for/qa*) you can have sub paths (like refs/for/qa/*). You can also use the reference HEAD to refer to the commit that the current branch is on as a source for the refspec, which can be useful for pushes.
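
To make the shape of a refspec concrete, here is a sketch that splits one into its three parts using nothing more than shell parameter expansion. The parse_refspec function is purely illustrative and is not how Git itself parses them.

```shell
# parse_refspec SPEC
# Splits [+]<source>:<destination> into its component parts
parse_refspec() {
    spec=$1
    force=no
    # A leading + allows non fast-forward updates of the destination
    case $spec in
        +*) force=yes; spec=${spec#+} ;;
    esac
    # Everything before the first colon is the source, the rest the destination
    case $spec in
        *:*) src=${spec%%:*}; dst=${spec#*:} ;;
        *)   src=$spec; dst= ;;
    esac
    printf 'force=%s src=%s dst=%s\n' "$force" "$src" "$dst"
}

parse_refspec '+refs/heads/*:refs/remotes/origin/*'
parse_refspec 'refs/heads/master:refs/remotes/origin/master'
```

The first example is the kind of wildcard fetch refspec a clone typically sets up for origin; the second is the single-branch form described above.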

Tracked branches

However, each branch also has the concept of what it is tracking. As well as the branch(es) that will be affected by a fetch/pull/push, tracking says which branch is upstream of which.

Normally, branches checked out of a remote repository are automatically set up as tracking branches. If you check out EGit, you'll get a master branch that tracks refs/remotes/egit/master (or origin if you didn't specify a default repository identifier). Any changes you pull into your master come (by default) from EGit's master.

However, what if you wanted to spin off another branch for experimental purposes, and keep that updated? If you do git checkout -b experimental, it diverges from your local master at that point in time. You either need to pull changes through master and then rebase, or remember where your merge point was.

Instead, you can set up your experimental branch to track another one. This means you can fetch and pull, as if you were pulling from a remote repository, and consume changes as the other branch moves on. This is useful if you have a long-running UAT branch which needs to be refreshed periodically from a moving target; setting it up as a tracked branch means that the only thing you need to do is git pull, and you're up to date.

So, how do you set up a branch for tracking? Well, when you check out a branch from a remote master, it gets set up automatically. In fact, a tracked branch is simply one that's explicitly mentioned in the .git/config file, which lists what its remote is and where to merge from:


$ git clone upstream clone
Cloning into clone...
done.
$ cd clone
$ tail .git/config
[branch "master"]
	remote = origin
	merge = refs/heads/master

The way to read this is that master is a local branch, which tracks refs/heads/master on the remote origin. Any pulls that happen on master will result in a merge (or rebase) from refs/heads/master.
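Rather than reading .git/config by eye, the same two values can be queried directly, and Git can resolve them into the name of the upstream branch. The sketch below assumes a scratch upstream repository created just for the demonstration.

```shell
#!/bin/sh
# Sketch: query the tracking configuration of master in a fresh clone.
set -e
work=$(mktemp -d)
cd "$work"

# A throwaway upstream with a single (empty) commit on master.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream -c user.name=Demo -c user.email=demo@example.com \
  commit -q --allow-empty -m Start

git clone -q upstream clone
cd clone

# The two entries from the [branch "master"] section.
remote=$(git config --get branch.master.remote)
merge=$(git config --get branch.master.merge)

# Git combines them with the remote's fetch refspec to name the upstream.
upstream_ref=$(git rev-parse --abbrev-ref 'master@{upstream}')

printf 'remote=%s merge=%s upstream=%s\n' "$remote" "$merge" "$upstream_ref"
```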

What if we wanted to set up our experimental branch? If we just do git checkout -b experimental, it won't be tracked:


$ git checkout -b experimental
Switched to a new branch 'experimental'
$ grep branch .git/config 
[branch "master"]

We can flag it as tracked using the --track option of git checkout (or its shorter -t alias):


$ git checkout master
Switched to branch 'master'
$ git branch -d experimental
Deleted branch experimental (was 4a3fa88).
$ git checkout --track -b experimental
$ tail -3 .git/config 
[branch "experimental"]
	remote = .
	merge = refs/heads/master
$ git pull
From .
 * branch            master     -> FETCH_HEAD
Already up-to-date.

Hang on, what's the remote = . doing in here? Well, that's a special shorthand meaning this repository, much like it means this directory in filesystem access. What we have here is experimental tracking master, and not origin/master; in other words, it's a local branch tracking another local branch. There are times when this is useful, but what if you want to track the remote one directly instead of having to pull through a local copy?


$ git checkout master
Switched to branch 'master'
$ git branch -d experimental
Deleted branch experimental (was 4a3fa88).
$ git checkout -b experimental origin/master
Branch experimental set up to track remote branch master from origin.
Switched to a new branch 'experimental'
$ tail -3 .git/config
[branch "experimental"]
	remote = origin
	merge = refs/heads/master

Now we have a branch experimental which is tracking origin/master. When we have an update in the upstream repository, and do a pull, we see it updating both origin/master and experimental:


$ git pull
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 2 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (2/2), done.
From upstream
   4a3fa88..55eb534  master     -> origin/master
Updating 4a3fa88..55eb534
Fast-forward
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 third

The update shows origin/master being updated to the new value. The subsequent step is the updating and fast-forward of the local experimental branch. But what of the local master branch?


$ git log --oneline experimental
55eb534 Third
4a3fa88 Second
ff8536c Start
$ git log --oneline master
4a3fa88 Second
ff8536c Start

So although both experimental and master are tracking the same upstream branch, they can be updated and processed independently. This is useful when you want to advance the state of one branch (perhaps for experimentation purposes) but don't want to change the local state of another branch.

It's possible to add upstream tracking information to an existing local branch after the fact, in recent versions of Git. If you'd created the experimental branch as in the first step, and didn't want to delete and re-create it (perhaps because you'd made some local changes), then you can add it afterwards:
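One way to see this independence at a glance is to list every local branch alongside its upstream and how far it has drifted. The sketch below rebuilds a similar situation in scratch repositories (the names and empty commits are invented for the demonstration) and uses git for-each-ref with the upstream format fields.

```shell
#!/bin/sh
# Sketch: two local branches tracking the same upstream, only one of them pulled.
set -e
work=$(mktemp -d)
cd "$work"

# Helper to create an empty commit in the named repository.
commit() {
  git -C "$1" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "$2"
}

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
commit upstream Start

git clone -q upstream clone
git -C clone checkout -q --track -b experimental origin/master

# Upstream moves on; only experimental (the current branch) is pulled.
commit upstream Second
git -C clone pull -q

# One line per local branch: name, upstream, and its ahead/behind state.
status=$(git -C clone for-each-ref \
  --format='%(refname:short) %(upstream:short) %(upstream:track)' refs/heads)
echo "$status"
```

Here experimental should be level with origin/master, whilst master is reported as being behind by one commit until it is pulled in its own right.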


$ git checkout master
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
$ git checkout -b experimental2
Switched to a new branch 'experimental2'
$ git branch --set-upstream experimental2 origin/master
Branch experimental2 set up to track remote branch master from origin.
$ tail -3 .git/config 
[branch "experimental2"]
	remote = origin
	merge = refs/heads/master

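So even if you have existing branches, it's possible to wire them up to be tracking branches after the fact. You can also use this if you want to change which branch you're tracking (say, swapping a local branch for a remote one or vice versa) by re-running the command. Note that newer versions of Git (1.8.0 onwards) spell this git branch --set-upstream-to=origin/master experimental2, or -u for short, and deprecate the older --set-upstream form.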
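Since tracking is nothing more than two entries in the configuration, the same wiring can also be sketched by setting those entries by hand. The branch name experimental3 and the scratch repositories are hypothetical, chosen only for this example.

```shell
#!/bin/sh
# Sketch: wire up tracking for an existing branch by writing its config entries.
set -e
work=$(mktemp -d)
cd "$work"

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream -c user.name=Demo -c user.email=demo@example.com \
  commit -q --allow-empty -m Start

git clone -q upstream clone
cd clone

# A local branch with no tracking information.
git checkout -q --no-track -b experimental3
before=$(git config --get branch.experimental3.remote || echo none)

# Write the same two entries that appear under [branch "..."] in .git/config.
git config branch.experimental3.remote origin
git config branch.experimental3.merge refs/heads/master

# Git now resolves an upstream for the branch.
after=$(git rev-parse --abbrev-ref 'experimental3@{upstream}')
printf 'before=%s after=%s\n' "$before" "$after"
```

Using git branch to do this is preferable in practice, since it validates the names involved, but the sketch shows that there is no hidden state beyond the config file.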

It's also worth mentioning that there is a --no-track option of git checkout, which can be used to prevent the tracking of branches upon checkout if that's desired. This is sometimes useful if you are consuming a feature or bugfix branch and you don't want/need to pull from it in the future.

Lastly, the default behaviour is controlled by the branch.autosetupmerge config option. If this option is false, then branches are never tracked by default. If the option is true, then branches are tracked if they start from a remote branch, and not tracked if they start from a local one. If the option is always, then branches are always set up as tracked branches, regardless of whether they start from a local or a remote branch. These effectively specify the defaults, but they can be overridden on a branch-by-branch basis using the --no-track and --track command line flags of the git checkout or git branch commands.
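The effect of the different settings can be sketched by creating a branch under each one and checking whether a remote was recorded for it. The branch names local-copy, untracked and forced are invented for the example, as are the scratch repositories.

```shell
#!/bin/sh
# Sketch: how branch.autosetupmerge changes the default, and how --track overrides it.
set -e
work=$(mktemp -d)
cd "$work"

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream -c user.name=Demo -c user.email=demo@example.com \
  commit -q --allow-empty -m Start

git clone -q upstream clone
cd clone

# always: even a branch started from a local branch is tracked (remote = .)
git config branch.autosetupmerge always
git branch -q local-copy master
always_remote=$(git config --get branch.local-copy.remote || echo none)

# false: nothing is tracked by default, even when starting from a remote branch
git config branch.autosetupmerge false
git branch -q untracked origin/master
false_remote=$(git config --get branch.untracked.remote || echo none)

# --track overrides the configured default for a single branch
git branch -q --track forced origin/master
forced_remote=$(git config --get branch.forced.remote || echo none)

printf 'always=%s false=%s forced=%s\n' \
  "$always_remote" "$false_remote" "$forced_remote"
```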


Come back next week for another instalment in the Git Tip of the Week series.