Tuesday, October 25, 2011

Git Tip of the Week: Merging Revisited

This week's Git Tip of the Week is about merging. You can subscribe to the feed if you want to receive new instalments automatically.


We previously looked at merging back in April, but we touched on the index last week, and on the index (or stage) numbers, which play a part in merges. This week, we'll look at what that means with different kinds of merges. If you're not familiar with merges, take a look at the previous post first.

Merging allows you to bring together two (or more) commits into a single merge commit, whilst at the same time bringing together all of the different files combined in the two. If there are merge conflicts, these need to be resolved individually. Merges where the current commit is an ancestor of the commit being merged are referred to as fast-forward merges, and do not result in a merge commit being created. When you pull from a remote source, you're actually doing a fetch and a merge in one step – though usually, the merge is just a fast-forward.

For the purposes of this post, I'll create three branches called ‘Alex’, ‘Bob’ and ‘master’, in homage to Alice and Bob. Each branch will have a file called Hello which will contain the text “Hello, my name is Alex” (c.f. Bob) as well as an AlexsFile, BobsFile etc.


(master) $ git ls-tree Alex
100644 blob ada4c4c4f33cd190fe40769d5ca9826adb9fb7ce	AlexsFile
100644 blob ca4eef2f4e3f1fe92028176cb547b590a08c2259	Hello
(master) $ git ls-tree Bob
100644 blob eea826732acee08a8cf83445e3b98cf58f11ce5c	BobsFile
100644 blob ca4eef2f4e3f1fe92028176cb547b590a08c2259	Hello
(master) $ git ls-tree master
100644 blob ca4eef2f4e3f1fe92028176cb547b590a08c2259	Hello
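
If you want to follow along, a repository of this shape can be rebuilt with a short script. This is only a sketch, not the commands I used for the post: the temporary directory, the file contents and the identity settings are all assumptions, so the hashes you get will differ from the ones shown here.

```shell
#!/bin/sh
# Sketch: rebuild a repository shaped like the one used in this post.
# File contents and the identity are placeholders, so hashes will differ.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is 'master'
git config user.name "Example"
git config user.email "example@example.com"

echo "Hello" > Hello
git add Hello
git commit -q -m "Hello"

# Alex and Bob each branch from the initial commit and add one file
git checkout -q -b Alex master
echo "Alex" > AlexsFile
git add AlexsFile
git commit -q -m "Alex's File"

git checkout -q -b Bob master
echo "Bob" > BobsFile
git add BobsFile
git commit -q -m "Bob's File"

git checkout -q master
alex_files=$(git ls-tree --name-only Alex | tr '\n' ' ')
bob_files=$(git ls-tree --name-only Bob | tr '\n' ' ')
master_files=$(git ls-tree --name-only master | tr '\n' ' ')
echo "Alex:   $alex_files"
echo "Bob:    $bob_files"
echo "master: $master_files"
```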

Strategies

When you create a merge, you have the option of saying how the merge will be processed. The usual strategy is recursive. This means Git will walk each directory (tree) and find out which files have differences compared to the base revision, and then use the one with changes. (If both have changes, the new file's contents are merged textually, and if there's a problem with that, a conflict ensues.) This is why you see the message “Merge made by recursive” after you do the operation:


(master) $ git checkout Alex
Switched to branch 'Alex'
(Alex) $ git merge Bob
Merge made by recursive.
 BobsFile |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 BobsFile
(Alex) $ git log --oneline --graph
*   d612aab Merge branch 'Bob' into Alex
|\  
| * 8afb2d3 Bob's File
* | 4abf59e Alex's File
|/  
* 5cba624 Hello
(Alex) $ git show d612aa
commit d612aab9858289ed027230d3b9a7b2a7a5e75945
Merge: 4abf59e 8afb2d3
…
    Merge branch 'Bob' into Alex

The commit d612aa… is a merge node in that it brings two separate branches together. Since neither is an ancestor of the other, they can't be fast-forwarded, and as such, the merge node is created. We can determine the commit's parents with HEAD^1 and HEAD^2:


(Alex) $ git rev-parse HEAD^1
4abf59ef73c186e93db25e8b7bc4423fbd11bbd0
(Alex) $ git rev-parse HEAD^2
8afb2d368ce26ca71cec539c31400c7001a18efc

It's even easier when the current branch has no changes of its own to be merged:


(Alex) $ git checkout master
Switched to branch 'master'
(master) $ git merge Bob
Updating 5cba624..8afb2d3
Fast-forward
 BobsFile |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 BobsFile

The fast-forward here indicates that master is behind Bob, and so we can simply move the pointer forwards – as a result, we don't need to create a merge node.
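
Whether a merge will be a fast-forward can be worked out before running it, by asking whether the current commit is an ancestor of the branch being merged. The sketch below does that with git merge-base --is-ancestor, which as far as I know needs a newer Git than the one that produced the output in this post. The merge_kind function is a hypothetical helper of my own, and the repository set-up is a placeholder.

```shell
#!/bin/sh
# Sketch: predict whether 'git merge BRANCH' would fast-forward.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "Hello" > Hello
git add Hello
git commit -q -m "Hello"
git checkout -q -b Bob master
echo "Bob" > BobsFile
git add BobsFile
git commit -q -m "Bob's File"
git checkout -q -b Alex master
echo "Alex" > AlexsFile
git add AlexsFile
git commit -q -m "Alex's File"

# merge_kind is a hypothetical helper, not a Git command
merge_kind() {
    if git merge-base --is-ancestor "$1" HEAD; then
        echo "already up to date"      # BRANCH is already contained in HEAD
    elif git merge-base --is-ancestor HEAD "$1"; then
        echo "fast-forward"            # HEAD is behind BRANCH
    else
        echo "merge commit"            # the two have diverged
    fi
}

git checkout -q master
master_and_bob=$(merge_kind Bob)
git checkout -q Alex
alex_and_bob=$(merge_kind Bob)
echo "master <- Bob: $master_and_bob"
echo "Alex   <- Bob: $alex_and_bob"
```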

So, two strategies that Git uses are to do a fast-forward or merge made by recursive. This covers 99% of the merges that you'll need to do; but it's worth noting that Git has a few more tricks up its sleeve which can help in certain situations.

Octopus Merge

The examples above all have one or two parents. However, a Git merge node is capable of representing more than two heads in a merge, and it uses a strategy called octopus. This is selected by default when you merge more than two branches:


(master) $ git reset --hard 5cba624b94a7a622183c960697867c8bba73aa91
HEAD is now at 5cba624 Hello
(master) $ date > NewFile
(master) $ git add NewFile
(master) $ git commit -m "New File"
[master 598ad85] New File
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 NewFile
(master) $ git merge Alex Bob
Trying simple merge with Alex
Trying simple merge with Bob
Merge made by octopus.
 AlexsFile |    1 +
 BobsFile  |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)
 create mode 100644 AlexsFile
 create mode 100644 BobsFile
(master) $ git log --oneline --graph
*-.   5a2aa0d Merge branches 'Alex' and 'Bob'
|\ \  
| | * 8afb2d3 Bob's File
| * | 4abf59e Alex's File
| |/  
* | 598ad85 New File
|/  
* 5cba624 Hello
(master) $ git show
commit 5a2aa0da3d3b3365703d710dad8aeebc0770b8ef
Merge: 598ad85 4abf59e 8afb2d3
…
    Merge branches 'Alex' and 'Bob'
(master) $ git rev-parse HEAD^1 HEAD^2 HEAD^3
598ad850114e1f7445ee8b02e93ee23060439560
4abf59ef73c186e93db25e8b7bc4423fbd11bbd0
8afb2d368ce26ca71cec539c31400c7001a18efc

In this case, we have three commits merging together into the merge node; the master (which had diverged), and the Alex and Bob branches from before. Since our merge node now has three parents, we can use git rev-parse to convert the HEAD^1/2/3 into the first, second and third heads.
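
If you don't know in advance how many parents a merge node has, the HEAD^@ notation expands to all of them at once. The following is a sketch under assumed file contents and identity settings (so the hashes won't match the ones above); it builds a three-parent octopus merge and then lists the parents.

```shell
#!/bin/sh
# Sketch: create an octopus merge and list every parent with HEAD^@.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "Hello" > Hello
git add Hello
git commit -q -m "Hello"
git checkout -q -b Alex master
echo "Alex" > AlexsFile
git add AlexsFile
git commit -q -m "Alex's File"
git checkout -q -b Bob master
echo "Bob" > BobsFile
git add BobsFile
git commit -q -m "Bob's File"

# Let master diverge, then merge both branches in one go
git checkout -q master
echo "New" > NewFile
git add NewFile
git commit -q -m "New File"
git merge -q --no-edit Alex Bob

parents=$(git rev-parse 'HEAD^@')            # one hash per line, in parent order
parent_count=$(echo "$parents" | wc -l | tr -d ' ')
third_parent=$(git rev-parse 'HEAD^3')
bob=$(git rev-parse Bob)
echo "$parents"
echo "parents: $parent_count"
```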

What if the files are conflicted?


(master) $ git checkout Alex
Switched to branch 'Alex'
(Alex) $ echo Hello, my name is Alex > Hello 
(Alex) $ git commit -a -m "My name is Alex"
[Alex c2cb955] My name is Alex
 1 files changed, 1 insertions(+), 1 deletions(-)
(Alex) $ git checkout Bob
Switched to branch 'Bob'
(Bob) $ echo Hello, my name is Bob > Hello  
(Bob) $ git commit -a -m "My name is Bob"
[Bob 7cb6225] My name is Bob
 1 files changed, 1 insertions(+), 1 deletions(-)
(Bob) $ git checkout master
(master) $ git merge Alex Bob
Trying simple merge with Alex
Trying simple merge with Bob
Simple merge did not work, trying automatic merge.
Auto-merging Hello
ERROR: content conflict in Hello
fatal: merge program failed
Automatic merge failed; fix conflicts and then commit the result.
(master|MERGING) $ git ls-files --stage
100644 ada4c4c4f33cd190fe40769d5ca9826adb9fb7ce 0	AlexsFile
100644 eea826732acee08a8cf83445e3b98cf58f11ce5c 0	BobsFile
100644 ca4eef2f4e3f1fe92028176cb547b590a08c2259 1	Hello
100644 a5a820416bae2c7b77340e5b2120aab9595d2bfc 2	Hello
100644 98b16693fe64acb9d002af1fe5f5162d58bd40b4 3	Hello
100644 09d774502f97ba9a46f25f8f11601b653c376828 0	NewFile
(master|MERGING) $ 

We can launch a three-way diff tool with git mergetool:


(master|MERGING) $ git mergetool
merge tool candidates: opendiff kdiff3 tkdiff xxdiff meld tortoisemerge gvimdiff diffuse ecmerge p4merge araxis bc3 emerge vimdiff
Merging:
Hello

Normal merge conflict for 'Hello':
  {local}: modified file
  {remote}: modified file
Hit return to start merge resolution tool (opendiff): 
…
(master|MERGING) $ git commit -a -m "Merged"
[master 999a938] Merged

Note that the octopus merge does not handle conflicts between more than two versions of a file. If you try, you end up with a different error message:


(master) $ echo Hello World > Hello 
(master) $ git commit -a -m "Hello World"
[master fe79e59] Hello World
 1 files changed, 1 insertions(+), 1 deletions(-)
(master) $ git merge Alex Bob
Trying simple merge with Alex
Simple merge did not work, trying automatic merge.
Auto-merging Hello
ERROR: content conflict in Hello
fatal: merge program failed
Automated merge did not work.
Should not be doing an Octopus.
Merge with strategy octopus failed.

Ours

Finally, the last strategy that's worth knowing about is the ours strategy. This takes any number of heads, and creates a merge node, but without actually making any changes. In other words, a git diff HEAD^1 will always return empty for an ours strategy:


(master) $ git diff HEAD^1
(master) $ git log --oneline --graph --decorate
*-.   c5f84cc (HEAD, master) Merge branches 'Alex' and 'Bob'
|\ \  
| | * 7cb6225 (Bob) My name is Bob
| | * 8afb2d3 Bob's File
| * | c2cb955 (Alex) My name is Alex
| * | 4abf59e Alex's File
| |/  
* | 598ad85 New File
|/  
* 5cba624 Hello
(master) $ git rev-parse HEAD^{tree}
78784bb4dea678c157d8711bc56c5478a74588c3
(master) $ git rev-parse HEAD^1^{tree}
78784bb4dea678c157d8711bc56c5478a74588c3

We can verify that these are identical, since the tree pointed to by HEAD is the same as the tree pointed to by HEAD^1 (i.e. the parent). The suffix ^{tree} is used to show the tree associated with the commit.

The --decorate argument to git log adds the branch names to the output, which can be useful in showing where merges have come from. The --graph argument is mostly of use with the --oneline argument; although you can run git log --graph on its own, the full commit messages tend to hide the structure of the graph.

The ours strategy is only really useful if you want to encode a set of previous commits but not have them affect the current master (say, because you've cherry picked some of the contents and don't want to take other parts, but preserve them in the history as-is somehow).
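
The strategy is selected with the -s option to git merge. As a sketch (the repository contents and identity here are placeholders, not the ones from the listing above), the script below merges a branch with -s ours and then checks that the resulting tree is identical to that of the first parent.

```shell
#!/bin/sh
# Sketch: merge with the 'ours' strategy and confirm nothing changed.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "Hello" > Hello
git add Hello
git commit -q -m "Hello"

git checkout -q -b Alex
echo "Hello, my name is Alex" > Hello
echo "Alex" > AlexsFile
git add Hello AlexsFile
git commit -q -m "My name is Alex"

git checkout -q master
echo "New" > NewFile
git add NewFile
git commit -q -m "New File"

# Record Alex as a parent, but keep master's content exactly as it is
git merge -q -s ours --no-edit Alex

merged_tree=$(git rev-parse 'HEAD^{tree}')
first_parent_tree=$(git rev-parse 'HEAD^1^{tree}')
second_parent=$(git rev-parse 'HEAD^2')
alex=$(git rev-parse Alex)
diff_output=$(git diff 'HEAD^1')
echo "merged tree:       $merged_tree"
echo "first parent tree: $first_parent_tree"
```

Alex's commit is now part of master's history, but AlexsFile never appears in the working tree.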

Merge Message and Fast Forwards

The merge message will be created automatically, based on the names of the branches you are merging. However, it's possible to pass a -m option, like with git commit, to supply an additional message. This can be useful if the merge message needs additional information encoded (such as which bug(s) were fixed).

It's also possible to force a merge, even if one isn't necessary. If you have topic-based branches, it can be useful to denote that the work was carried out on a separate branch before being merged back into master. Running git merge --no-ff will create a merge node, whether or not the branch can be fast-forwarded. Since the merge commit has the branch name you're merging from as part of the commit, you can end up with descriptive names to show the feature having been completed:


(master) $ git checkout -b "bug12345"
Switched to a new branch 'bug12345'
(bug12345) $ echo BugFix >> Hello 
(bug12345) $ git commit -a -m "Fixing bug 12345"
[bug12345 e2bd64e] Fixing bug 12345
 1 files changed, 2 insertions(+), 0 deletions(-)
(bug12345) $ git checkout master
Switched to branch 'master'
(master) $ git merge bug12345 # without --no-ff
Updating 999a938..e2bd64e
Fast-forward
 Hello |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)
(master) $ git reset --hard HEAD^
HEAD is now at 999a938 Merged
(master) $ git merge --no-ff bug12345 # with --no-ff
Merge made by recursive.
 Hello |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)
(master) $ git log --oneline --graph --decorate
*   4a905c8 (HEAD, master) Merge branch 'bug12345'
|\  
| * e2bd64e (bug12345) Fixing bug 12345
|/  
*-.   999a938 Merged

Being able to merge with a --no-ff can be useful when using some kinds of commit workflows, where many branches are used to develop individual features and then brought into a master branch subsequently. It's also worth noting the --merges argument allows you to filter just the merges in a repository:


(master) $ git log --oneline --merges --decorate
4a905c8 (HEAD, master) Merge branch 'bug12345'
999a938 Merged

Next time, we'll look at what we can do with the different Git workflows using merge --no-ff.


Come back next week for another instalment in the Git Tip of the Week series.

Tuesday, October 18, 2011

Buzz Buzzes Off

As announced last week, Google have finally done the inevitable, shutting down Google Buzz. Good riddance, says I. When it launched, I was sorely unimpressed, both with the style of its launch (remember how it advertised every contact you ever mailed?) and with the quality of the implementation itself.

For example, Google Buzz sent me a mail message whenever I posted something on it. Good grief – the fact that I've posted it should be enough of a clue that perhaps I knew what I said? I, like many others, ended up setting e-mail rules just to delete any and all mail that came from Buzz that had my name in it, just so I wouldn't get notified of my own buzzes.

When it launched, the API was pretty sparse (and hey, look – they repeated the same mistake with Google+). Unfortunately, whilst Google may know a lot about systems, they know nothing of platforms, and Buzz was always sitting in a wasteland whilst others, such as Facebook and Twitter, took off.

Google Buzz 2.0: Also known as Google+, though slightly less googleable. I have absolutely no idea why they ran these for different periods of time. It would be like Apple continuing to run iTools after MobileMe was announced, or continuing to run MobileMe after iCloud was announced. Apple has a switchover date – with a period of testing for developers and other interested users – but when the Big Switch happens, it's on with the new and off with the old. Yes, things can go wrong (Exhibit A: Mobile Mess) but at least you know there's a transition, and the data is migrated automatically.

The real question is why Google didn't kill off Buzz earlier. Perhaps it's a case of different teams fighting for fiefdom in the Google realm, or maybe it's just that the Buzz team turned into the Plus team and didn't think about turning the lights off as they left the room.

Google selcriC

The real problem, of course, is that Google+ is exactly the same as Google Buzz. Yes, there's a snazzier UI for letting you follow, but as with politics, it's out with the old and in with the same old new. The key issue is that it is exactly backwards from what it needs to be. Even the name suggests this to be the case; you follow someone, you don't ask them to follow you. But that's exactly the UI that you have in Google+, other than the Public scope.

I have a lot of different interests. I hack code on the OSX Kernel (specifically ZFS), I write a lot about Git, I am an advocate of OSGi and occasionally I have been known to go flying.

The chance of someone being interested in this exact set (or a superset, let's be fair) is pretty close to zero. If I were to publish everything on Insert Social Network Here, I would kill off many followers that I have. I'm guessing that a fair proportion of people who follow my blog pick up on one of the specific feeds – such as the gtotw or eclipse or osgi feeds.

What Circles really needs is a way of saying “Here are a set of things that I write about; add yourself to them”. That way, if someone wants to learn more about Git but doesn't want to hear about OSGi, then they can do so. I can then target posts I make to that group to specifically address the Git posts, instead of the wide ranging ones.

That doesn't diminish the need for a Public group, nor does it stop the requirement for having private groups, though the private ones can accidentally go public, as Steve Yegge recently found.

Google+ Tags

What we really need is an upgrade to Google+ to support arbitrary tags. These can be the names of existing circles, or yet-to-be-created circles. A circle can either be public or private; in the former case, anyone can opt in/out but in the latter case, you have control over who will opt in.

When I post an item, I just give it a list of tags. I may choose to use ZFS or Eclipse or OSGi, and followers of those circles will see that post. If the post isn't also tagged as public, then only those will see it.

I can thus manage my groups to the set of private circles I want (i.e. mostly 'Friends' and 'Family') and any other circle gets automatically created and membership managed by those who want to receive notifications.

Unfortunately, it looks like too much emphasis was given to UI and not enough to usability. There is a difference between UI and usability, which seems to have been missed by the designers of the Google+ infrastructure. For example, when I post something, it takes a good few seconds (depending on network connection speed) from clicking on the 'Post an update' button to actually writing something. Why it needs to do any kind of network connectivity at this point is beyond me. There should be a way of putting flat tags in, with perhaps a background notification indicating if a tag represented a new circle or not – but it shouldn't hold up a post just because it needs to fetch a list.

Finally: Notifications

The notifications in Google+ drive me batshit insane. The red box number at the top of the screen has become a pointless red box – regardless of whether there's something interesting to see or not, it always highlights a non-zero number. This is often because people have started to follow me, but there is absolutely no setting which says 'STFU' and lets me focus only on the things that I want to see. If anyone knows how to get rid of that damn red box, please leave a message in the comments. To quote an anonymous Twitter comment, “Google+ has gone from a way of sharing information to a way of being annoyed by notifications”.

Git Tip of the Week: Index Revisited

This week's Git Tip of the Week is about indexes. You can subscribe to the feed if you want to receive new instalments automatically.


In last week's tip we visited the purpose of the index. But what actually is it?

It's not actually a tree object, as I alluded to last time. That is, you can't iterate the contents with git ls-tree. It does point to blobs in the object database, however. So why do we need a different type of object to refer to the index?

Some of the reasons are performance oriented. Whenever you do a diff (or other repository-wide operation), Git needs to quickly and efficiently compute whether the state of the working tree has changed since the index was last updated. Some tools, including the bash shell prompt, need to be able to determine if the working tree is dirty or not quickly:


(master) $ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>…" to update what will be committed)
#   (use "git checkout -- <file>…" to discard changes in working directory)
#
#	modified:   example
#
no changes added to commit (use "git add" and/or "git commit -a")
(master) $ export GIT_PS1_SHOWDIRTYSTATE=true
(master *) $ 

If Git could only know whether the tree is dirty by doing a full walk of the contents (and calculating their SHA1 hashes), this operation would be prohibitively expensive. Fortunately, Git has a number of optimisations that allow it to avoid this case.

The index stores not only the file names, but also the last modification time of those files. As a result, Git knows whether there have been any changes to timestamps, by iterating through the files' metadata and comparing the timestamps with those in the index. If a file is missing from the index, it's represented as an addition. If a file is missing from disk, it's represented as a deletion. If a file's modification time is different then this is represented as a modification.

As well as storing the timestamps, the index also stores the SHA1 hashes of each blob. This allows the index to update itself, should the file be reverted to a previous state but with a later timestamp.
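
A quick experiment shows the two checks working together. In this sketch (the file name and contents are arbitrary), changing only a file's timestamp leaves the working tree clean, because the hash still matches, whereas changing the contents is reported as a modification.

```shell
#!/bin/sh
# Sketch: a changed timestamp alone does not make the working tree dirty.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "content" > example
git add example
git commit -q -m "Add example"

# Change the modification time only; the contents stay the same
touch -t 202001010000 example
after_touch=$(git status --porcelain)

# Now change the contents as well
echo "changed content" > example
after_edit=$(git status --porcelain)

echo "after touch: [$after_touch]"
echo "after edit:  [$after_edit]"
```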

Finally, the index is also used for processing merges. In the index, there is a concept of having multiple index numbers (or stage numbers). Normally, only 0 is used, since this represents the ordinary, unconflicted entry for a file. However, if a merge conflict arises, then the index is used to disambiguate the state of the file on each side. A conflicted file has no stage 0 entry; instead, stage 1 holds the common ancestor, stage 2 holds your version and stage 3 holds the version being merged in. You can see the stage number by running git ls-files --stage (or -s):


(master) $ git status
(master) $ git ls-files -s
100644 ce013625030ba8dba906f756967f9e9ca394464a 0	example
(master) $ git pull # with known conflict
Auto-merging example
CONFLICT (content): Merge conflict in example
Automatic merge failed; fix conflicts and then commit the result.
(master|MERGING) $ git ls-files -s
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 1	example
100644 a895c0db0a627cb9451ae390a2a0922495dbb161 2	example
100644 13940d48c3d3693113f7543d4fc5423a916ef55d 3	example

The stage 1 file here is the common ancestor that the two versions derived from, so we can do 1..2 and 1..3 diffs to find out what each side of the tree has changed since. (The more astute of you will recognise e69…391 as the empty file.) We can show the contents of these versions of the files, or load them into a 3-way diff tool if you have such a thing:


(master) $ git show e69de #empty file
(master) $ git show a895c 
Left tree
(master) $ git show 13940
Right tree
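
You don't need to copy the hashes around to do this: git show also accepts the form :N:path, where N is the stage number. The sketch below manufactures a conflict in a throwaway repository (the file contents are invented to mirror the listing above) and reads each stage back out of the index.

```shell
#!/bin/sh
# Sketch: read the three stages of a conflicted file with ':N:path'.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "Base" > example
git add example
git commit -q -m "Base"

git checkout -q -b other
echo "Right tree" > example
git commit -q -a -m "Right"

git checkout -q master
echo "Left tree" > example
git commit -q -a -m "Left"

# The merge is expected to stop with a conflict, so ignore its exit status
git merge other >/dev/null 2>&1 || true

stage_lines=$(git ls-files -s example | wc -l | tr -d ' ')
base=$(git show :1:example)     # stage 1: common ancestor
ours=$(git show :2:example)     # stage 2: the branch we are on
theirs=$(git show :3:example)   # stage 3: the branch being merged in
echo "stages: $stage_lines"
echo "1: $base"
echo "2: $ours"
echo "3: $theirs"
```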

Of course, normally Git will handle the diffs for you and you don't need to worry about the specific changes, nor extracting the contents out of the index. But it does highlight the fact that when you have finished doing a merge of a single file, running git add on the file puts a copy in stage 0 of the index, removing the stage 1, 2 and 3 entries:


(master|MERGING) $ git add example
(master|MERGING) $ git ls-files -s
100644 319c128291474d30f48e721ca87bd10425e8e296 0	example

This is why merging large conflicting changes with Git is easy. Each file can be addressed on a file-by-file basis; when you have finished merging a file, you can add it to the index, which records both its contents as well as removing the other files from the merge status. Merging many files then becomes an exercise in merging them one-by-one, and adding them as you go. And since they're all transiently stored in the index, you can keep adding them until you are ready to perform a git commit.
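
Here is a sketch of that file-by-file workflow with two conflicting files. The unmerged function is a hypothetical helper of my own (it lists the paths that still have stage 1 to 3 entries), and the "resolved" contents stand in for whatever a real resolution would be.

```shell
#!/bin/sh
# Sketch: resolve a two-file conflict one file at a time.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > a
echo "base" > b
git add a b
git commit -q -m "Base"

git checkout -q -b other
echo "their version" > a
echo "their version" > b
git commit -q -a -m "Theirs"

git checkout -q master
echo "our own version" > a
echo "our own version" > b
git commit -q -a -m "Ours"

git merge other >/dev/null 2>&1 || true   # stops with conflicts in a and b

# unmerged: hypothetical helper listing paths with stage 1-3 entries
unmerged() { git ls-files -u | cut -f2 | sort -u | tr '\n' ' '; }

before=$(unmerged)
echo "resolved" > a
git add a                 # a moves to stage 0
after_first=$(unmerged)
echo "resolved" > b
git add b                 # b moves to stage 0
after_second=$(unmerged)

git commit -q -m "Merged"
parent_count=$(git rev-parse 'HEAD^@' | wc -l | tr -d ' ')
echo "before: [$before] then: [$after_first] finally: [$after_second]"
```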


Come back next week for another instalment in the Git Tip of the Week series.

Tuesday, October 11, 2011

Git Tip of the Week: Understanding the Index

This week's Git Tip of the Week is about indexes. You can subscribe to the feed if you want to receive new instalments automatically.


In the previous Git Tip of the Week, we looked at interactive adding; that is, the ability to add just parts of a file instead of the whole file as part of the commit.

It's about time we focussed on the staging area of Git, which we implicitly used when we added parts of the file last time. This is one of the main differences between Git and other version control systems, and often people are confused about its purpose.

The staging area of Git allows you to freeze a state of your working tree, such that the subsequent git commit takes that frozen state and uses it as the point of contact. Many version control systems only let you freeze at the point of commit and so don't have this intermediary stage; and when you are using Git for the first time, you will often have a habit of using the git add to be immediately followed by a git commit, or even setting up an alias to do everything.

So why does Git have the concept of an index, anyway? Well, remember that Git uses content-addressable files; in other words, when you have a specific piece of content (like the empty file) it always has the same identity – e69..391 – whatever the file is called. What happens when you run git add is that the object is added to the object database. As well as the object being added, it needs a pointer to point to it, so there's a virtual tree called the index, which points to the blobs contained therein.
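
The content-addressable part is easy to check with git hash-object, which computes the identity of a piece of content without storing it. In this sketch the file names one and two are arbitrary; the point is that they play no part in the result.

```shell
#!/bin/sh
# Sketch: identical content always hashes to the same blob identity.
set -e
dir=$(mktemp -d)

# The empty file, read from standard input
empty=$(git hash-object --stdin < /dev/null)

# Two differently named files with the same content
echo "same content" > "$dir/one"
echo "same content" > "$dir/two"
one=$(git hash-object "$dir/one")
two=$(git hash-object "$dir/two")

echo "empty: $empty"
echo "one:   $one"
echo "two:   $two"
```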

When you add files (or rm files), you really end up modifying this tree which represents what you'd like to do next. When you run commit, it takes that tree, builds a valid tree object, and commits that to the database (as well as updating the branch, if any).
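
You can watch this happen with the git write-tree plumbing command, which builds a tree object from whatever is currently in the index. In the sketch below (a throwaway repository with placeholder contents), the tree written from the index before committing is the same one that the commit ends up pointing to.

```shell
#!/bin/sh
# Sketch: the tree built from the index is the tree that gets committed.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "Hello" > Hello
git add Hello

# Build a tree object from the current contents of the index
staged_tree=$(git write-tree)

git commit -q -m "Hello"
committed_tree=$(git rev-parse 'HEAD^{tree}')

echo "tree from index:  $staged_tree"
echo "tree from commit: $committed_tree"
```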

Although it may not immediately seem useful to have this feature (and some argue that this is an example of Git's complexity over other systems), it can be very beneficial for doing specific operations; for example:

  • Staging parts of a file to break it up into different commits (as last time)
  • Working with large merges, where many files may have conflicts (you can record which ones don't have conflicts, and ones that you've already worked with, by running git add; you're then left with a decreasing number of differences to process)

When you run git status, it tells you all you need to know about how the index corresponds to your current working tree, giving you different messages about the files that it finds. To speed up processing, Git usually uses timestamps to determine if a file has been changed, but in doing a full processing sweep will calculate the SHA1 hash of the contents of the files (and thus, the directories) to determine differences against the index:


(master) $ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#	deleted:    deleteme
#	renamed:    same -> renamed
#
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#	modified:   changed
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#	addme

The contents of the index are really the snapshot of the files that have been modified. For example, the renamed and deleted above show changes which have been staged (i.e. added to the index) whilst the not staged changes have been modified but are not yet added to the index. The index also allows for quick identification of changes in the local repository which have yet to be added.

So, for a given file, there are possibly three separate copies whilst working on it. There's the previous version that was committed (i.e. HEAD), there's the current version on disk (the working tree) and a third copy, which is a cached version in the index. That's why, when you have a local change combined with one that is already staged, you might see the same file twice in the status message:


(master) $ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#	modified:   three
#
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#	modified:   three
#

You can use git diff to show you the differences between the two files:


(master) $ git diff
diff --git a/three b/three
index bb87574..3179466 100644
--- a/three
+++ b/three
@@ -1 +1 @@
-This is the version in the index
+This is the version in the working tree
(master) $ git diff --cached
diff --git a/three b/three
index 48d0444..bb87574 100644
--- a/three
+++ b/three
@@ -1 +1 @@
-This is the previously committed version
+This is the version in the index

In the second example, the --cached says to compare between the index and the previous commit (otherwise it's comparing the index and the working tree). You can, if you want, get the full contents of each of those files as long as you know the hashes (shown above in the diffs):


(master) $ git show 48d0444
This is the previously committed version
(master) $ git show bb87574
This is the version in the index
(master) $ cat three
This is the version in the working tree

The last one, of course, doesn't have a Git object yet by virtue of the fact we've not added it yet. If we were to add it, we'd replace the previous version in the index. We'll take a deeper dive into the index next time.
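
If you don't have the hashes to hand, the committed and staged copies can also be addressed by name: HEAD:three is the version in the last commit and :three is the version in the index. This sketch recreates the three-copy situation in a throwaway repository, using the same file contents as above.

```shell
#!/bin/sh
# Sketch: one file, three copies (last commit, index, working tree).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "This is the previously committed version" > three
git add three
git commit -q -m "Add three"

echo "This is the version in the index" > three
git add three                                   # staged, not committed
echo "This is the version in the working tree" > three

committed=$(git show HEAD:three)
staged=$(git show :three)
working=$(cat three)
porcelain=$(git status --porcelain)   # first M: staged, second M: not staged

echo "HEAD:         $committed"
echo "index:        $staged"
echo "working tree: $working"
echo "status:       $porcelain"
```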


Come back next week for another instalment in the Git Tip of the Week series.

Thursday, October 06, 2011

One Last Thing

As many of you have no doubt by now read, Steve Jobs passed away on 5th October 2011. As a mark of respect, the landing page for Apple has been replaced with a simple iconic image.

Steve has been part of my technical career for most of my adult life. The first computer I bought was a Nextstation pizza box, where I was introduced to Nextstep 3, and my first real coding language was Objective-C. (When Java came out, three years afterwards, being able to pick up object orientation was trivial since the ground work had already been done.) Although I was never an 'original' Apple user, I did move over and pick up my first PowerBook G4 running at a steaming 500MHz to run OSX 10.0, and I never looked back. These days, I'm still sitting in front of the same OS – with cosmetic changes, of course – and I'm carrying around an always-on device, running the same OS and far more powerful than my original Nextstation.

Back when Steve retired from Apple, it was clear that it was the beginning of the end. Although he kept going a couple of months, the downward slide had begun and his retirement didn't last long. At least he lived long enough to see Tim Cook and friends launch the iPhone 4S, which would have been a distinctly different affair had events transpired the other way around.

However, we're all going to die sometime. The point of life is to enjoy and achieve greatness in the brief period of time we have on this world. Without a doubt Steve changed the world for the better. (The original packaging of my Nextstation has the line “At NeXT we believe a few people can change the world”, and I know of no better tagline that can describe Steve's outlook.) Quite apart from being a showman and lead presenter for Apple's product announcements and presentations, his vision resulted in significant changes in the way that technical and entertainment industries operate today. Here are just some of them:

  • Original Apple PC. Back before PC was called PC, Apple arrived on the desktop computer scene. IBM PCs would later join in the fray, and eventually overtake them, but Apple stood out as part of that initial phase from home computers to work computers.
  • Nextstep operating system. Five years before Linux 1.0 happened, Nextstep already had a fully featured operating system, and could play videos, record sound and embed objects. In comparison, Windows hadn't even reached version 3 by the time of Nextstep's debut.
  • NeXT hardware. Although it didn't catch on commercially, the systems had high-quality defaults. Ethernet as standard, 24-bit colour, 16-bit sound, built in audio mic and speakers (and this in the day when SoundBlaster was a recently launched product rather than a generic API). True, it was overpriced for what it was – but the first version of HTTP was written on a NeXT box.
  • iMac and general iLine. Bringing technology to the masses, rather than to the geeks. The all-in-one form factor of the iMac remains to this day (although the factor has shifted with the transition from CRTs to flat screens).
  • Aggressive phase-out of old technology and support of new technology. Floppy drives, PS/2 connectors (or ADB connectors) all went away with the advent of the iMac. USB and FireWire were their replacements, which still exist today. These days, with the MacBook Air even the DVD has gone, as have minor inconveniences like the removable battery.
  • Computer Animated Movies. Pixar significantly altered what was possible with the release of Toy Story, which wouldn't have been possible without Renderman, a Nextstep based graphical animation tool. And of course, Steve's involvement with Pixar helped it to go on to many further movies, including an involvement with Disney for mass adoption.
  • Music. Every MP3 player is generically called an iPod, even if it's not an Apple system. They weren't the first (“No wireless. Less space than a nomad. Lame.”) but they did make it easy to get music on a portable device. The original one only had space for 1,000 songs – which most people's regular playlists rarely exceed in any case – but it went on to define a generation of devices. In particular, the success of the iPod led to the creation of the iTunes store and then, an opportunity for the music cartel to get a profit from buying songs instead of pirating them through the likes of Napster. Since then, Amazon have gone on to create a successful MP3 store and DRM has gradually been removed, but without Apple's bargaining power with the media cartels this wouldn't have happened.
  • Movies. Similar with music, although we've not seen as many copy-cat stores to date. But digital media consumption is clearly the path of the future and Apple is helping to lead the way.
  • Smart touch-based phones. Again, Apple wasn't the first – the Sony P800 probably was, though it had a flip-over cover if you wanted 'real buttons' for calls and the like. As with other devices they were stylus rather than finger based. But when the iPhone launched (and was derided by competitors such as Nokia and RIM, who now are far back in the shadow) it changed the face of smart phones overnight. Instead of having hundreds of different models, all with slightly incompatible operating systems and bugs which were never fixed, Apple pioneered the device with a user-updatable OS and no carrier lock-in. Since then, all of the smart phones have been based on the original design of the iPhone.
  • Tablet computers. The MacBook Air was the first tablet-esque computer, with solid state memory and a non-removable battery – although it still had a keyboard and a fully-featured operating system on board. (These days, the Air is the Pro without a DVD, though it remains to be seen how long the DVD will last in the Pro line up.)
  • Tablet devices. As if the Air wasn't enough, the launch of the iPad – based on the iPhone (and iPod Touch) but with a bigger screen estate – again changed the tech world overnight. New tablet devices are always marketed as the 'iPad killer' in the same way that smart phones are always 'iPhone killers', but in truth, none has really come close.

It's difficult to think of other ways that Steve changed the world. A less tangible, but nonetheless important, one is the focus on putting the user, rather than the computer, first. All of the Apple products are designed around making it easy for the end user to operate and use, regardless of the programming trickery or delays in getting it Just Right. Whole new conferences on UX exist, and many of them are inspired by Steve's attention to detail and getting it right.

Whatever your personal thoughts are of Apple (and there are usually those at each end of the spectrum), Steve Jobs certainly caused a major impact over his lifetime and changed the destiny of millions, perhaps billions, of people. His legacy will live on. And, at the end, all of us would like to go knowing that we had made a positive difference to the world.

Tuesday, October 04, 2011

Git Tip of the Week: Interactive Adding

References

This week's Git Tip of the Week is about interactive adding. You can subscribe to the feed if you want to receive new instalments automatically.


Up until now, whenever you've added a file with git add, Git has taken the entire file and added it to the index.

However, it is possible to add only parts of a file to the index; in Git terminology, these are called hunks. A hunk is merely a set of changes to a file, involving lines added (those prefixed with +) and lines removed (those prefixed with -). As well as being a way of textually showing the differences, the same idea of recording only what changed is what lets packfiles (which we've covered previously) store multiple versions of the same content whilst using significantly less space, although packfiles use binary deltas rather than these line-based hunks.

There is an interactive menu that can be brought up with git add --interactive, or git add -i for short. However, most of the options here are not very useful; the one that you'll find yourself using most frequently is the [p]atch command. There is a shorter way of getting straight to it, with the git add --patch command, or just git add -p.

What does this do? Well, it allows you to selectively choose which diffs get added into the index. Although this might not sound particularly useful, it lets you break down changes to one file into multiple smaller changes, provided that you commit them each time. Let's look at an example, with a Person class that we're adding a first name and last name to:


(master) $ git show Person.java | tail -4
@@ -0,0 +1,3 @@
+public class Person {
+
+}
(master) $ git diff
@@ -1,2 +1,17 @@
 public class Person {
+  private String firstName;
+  public String getFirstName() {
+    return firstName;
+  }
+  public void setFirstName(String firstName) {
+    this.firstName = firstName;
+  }

+  private String lastName;
+  public String getLastName() {
+    return lastName;
+  }
+  public void setLastName(String lastName) {
+    this.lastName = lastName;
+  }
 }

Normally, if we do git add, it will put the change for both the firstName and lastName into the index. What if we wanted to segregate these out? Well, we could just take a copy of the file, edit out one change, add it, copy the file back, and then add the second change. But we can use Git to help us here with git add -p:


@@ -1,2 +1,17 @@
 public class Person {
+  private String firstName;
+  public String getFirstName() {
+    return firstName;
+  }
+  public void setFirstName(String firstName) {
+    this.firstName = firstName;
+  }

+  private String lastName;
+  public String getLastName() {
+    return lastName;
+  }
+  public void setLastName(String lastName) {
+    this.lastName = lastName;
+  }
 }
Stage this hunk [y,n,q,a,d,/,e,?]? 

The 'Stage this hunk' prompt is asking us what we want to do. If we wanted to add it, we'd say 'y' or 'a'. If we didn't, we could say 'n' or 'd'. ('y' and 'n' apply to just this hunk, whilst 'a' and 'd' apply to this hunk and all the remaining hunks in the file.)

What if we wanted to add it piecemeal? Well, there's a [s]plit command we can use, which breaks this hunk down into smaller hunks:


Stage this hunk [y,n,q,a,d,/,e,?]? s
Split into 2 hunks.
@@ -1,2 +1,9 @@
 public class Person {
+  private String firstName;
+  public String getFirstName() {
+    return firstName;
+  }
+  public void setFirstName(String firstName) {
+    this.firstName = firstName;
+  }
Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]? y

@@ -2,2 +9,9 @@
 
+  private String lastName;
+  public String getLastName() {
+    return lastName;
+  }
+  public void setLastName(String lastName) {
+    this.lastName = lastName;
+  }
 }
Stage this hunk [y,n,q,a,d,/,K,g,e,?]? n
 

What we've done is to add the first 7 lines of the change into the git index, whilst leaving the last 7 outside. If we do a git diff, we'll just see the unstaged difference:


(master) $ git diff
diff --git a/Person.java b/Person.java
index a6d00fd..23dd325 100644
--- a/Person.java
+++ b/Person.java
@@ -7,4 +7,11 @@ public class Person {
     this.firstName = firstName;
   }
 
+  private String lastName;
+  public String getLastName() {
+    return lastName;
+  }
+  public void setLastName(String lastName) {
+    this.lastName = lastName;
+  }
 }

We can see in the Git status that we have a staged change and an unstaged change:


(master) $ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD ..." to unstage)
#
#	modified:   Person.java
#
# Changes not staged for commit:
#   (use "git add ..." to update what will be committed)
#   (use "git checkout -- ..." to discard changes in working directory)
#
#	modified:   Person.java
#

Normally you'd be concerned if you saw this – as if you had forgotten to add something. However, it's exactly the right effect in this case; we've added part of our change, and we have part of the change still to go. From here, we can commit as usual:


(master) $ git commit -m "Added firstname"
[master 2e0d5f8] Added firstname
 1 files changed, 7 insertions(+), 0 deletions(-)
(master) $ git add Person.java
(master) $ git commit -m "Added lastname"
[master 1d09387] Added lastname
 1 files changed, 7 insertions(+), 0 deletions(-)

If we look at the blame for the file, we can see we've committed the changes as separate commits:


(master) $ git blame Person.java | cut -c 1-9,50-80
391f64ef  1) public class Person {
2e0d5f8b  2)   private String firstName;
2e0d5f8b  3)   public String getFirstNam
2e0d5f8b  4)     return firstName;
2e0d5f8b  5)   }
2e0d5f8b  6)   public void setFirstName(
2e0d5f8b  7)     this.firstName = firstN
2e0d5f8b  8)   }
391f64ef  9) 
1d093875 10)   private String lastName;
1d093875 11)   public String getLastName
1d093875 12)     return lastName;
1d093875 13)   }
1d093875 14)   public void setLastName(S
1d093875 15)     this.lastName = lastNam
1d093875 16)   }
391f64ef 17) }

Being able to add changes in parts, rather than in their entirety, is a useful technique when you have subsets of changes that have been made to a file without intervening commits.


Come back next week for another instalment in the Git Tip of the Week series.