Tuesday, August 30, 2011

Git Tip of the Week: Trees

This week's Git Tip of the Week is about git tree storage. You can subscribe to the feed if you want to receive new instalments automatically.


Last week, we looked at how Git stores objects in the local repository. This week, we're going to look into how they correspond to directories, or trees.

Git uses a uniform storage model for all of its objects. Each object is identified with its hash, but the type of the object is stored in metadata along with the object. Thus, it's possible to find out from an ID what its type is, as well as its content:


(master) $ # Note: objects from the previous post
(master) $ git cat-file -t e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
blob
(master) $ git cat-file -p e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
(master) $ git cat-file -t 8ab686eafeb1f44702738c8b0f24f2567c36da6d
blob
(master) $ git cat-file -p 8ab686eafeb1f44702738c8b0f24f2567c36da6d
Hello, World!
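To make the "type stored as metadata" idea concrete, here is a minimal Python sketch (not Git's own code) that builds the loose-object form of a blob in memory and then recovers its type and content from the header, which is roughly what git cat-file -t and -p report. The function names encode_object and decode_object are made up for this illustration.

```python
import zlib

def encode_object(obj_type, content):
    """Build '<type> <size>\\0<content>' and DEFLATE-compress it."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return zlib.compress(header + content)

def decode_object(raw):
    """Return (type, content) recovered from a compressed object."""
    data = zlib.decompress(raw)
    # Everything before the first NUL is the header; the rest is the content
    header, _, content = data.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content

stored = encode_object("blob", b"Hello, World!\n")
obj_type, content = decode_object(stored)
print(obj_type)                  # blob
print(content.decode(), end="")  # Hello, World!
```

The same header layout is used for every object type, which is why a single lookup by hash can tell you both what an object is and what it contains.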

How do these objects get packaged up, so that you can get them in your working directory? Well, blobs are arranged in trees, which correspond to directories in a directory structure. If we have a directory with a file called empty, we can print out its contents:


(master) $ git ls-tree master .
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	empty

Note that this isn't listing the contents on disk; rather, it's showing you Git's view of the folder. This allows us to list items on different branches (or tags) without needing to check them out first, and in fact, is how hosting sites like GitHub and tools like GitWeb work. The master argument simply asks Git to show us the tree at the tip of the branch with that name.

What happens if we add another file, with the same contents?


(master) $ cp empty anotherEmpty
(master) $ git add anotherEmpty
(master) $ git commit -a
[master ca5fc4f] Another empty
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 anotherEmpty
(master) $ git ls-tree master .
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	anotherEmpty
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	empty

We have a new entry in the tree, but the blob pointer is to exactly the same object, like a hard link on a UNIX filesystem. Unlike a hard link, though, if we change one of the files, we don't change the shared contents; rather, we create a new object (since it has a different hash) and the tree is updated to point to that instead.
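A small Python sketch can illustrate the sharing. It models the object store and the tree as plain dictionaries (a simplification invented for this example, not how Git stores them on disk) and hashes blobs with the "blob <length>\0" prefix described in last week's post.

```python
import hashlib

def blob_id(content):
    """Hash content the way a Git blob is hashed."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

objects = {}  # hash -> content; a stand-in for .git/objects
tree = {}     # file name -> hash; a stand-in for a tree

def add_file(name, content):
    oid = blob_id(content)
    objects.setdefault(oid, content)  # identical content is stored only once
    tree[name] = oid

add_file("empty", b"")
add_file("anotherEmpty", b"")
print(len(objects))   # 1 (both names point at the same blob)

add_file("anotherEmpty", b"no longer empty\n")
print(len(objects))   # 2 (the change produced a new blob)
print(tree["empty"])  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Changing anotherEmpty leaves the entry for empty untouched, because nothing ever rewrites an existing object in place.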

How is this tree stored in the repository, though? Well, it turns out that it's another object type, stored in the same mechanism as blobs. You can find out the tree from a commit (or branch) with the ^{tree} suffix:


(master) $ git cat-file -t HEAD^{tree}
tree
(master) $ git cat-file -p HEAD^{tree}
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	anotherEmpty
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	empty
(master) $ git rev-parse HEAD^{tree}
2b61e34a91ca9780ea2f943e72f1a4a022cdd206

The tree represents a directory, containing a mixture of blobs and trees. We can find out what it resolves to using git rev-parse, to determine that this tree is an object which hashes to 2b61e34....

How is this tree created? Well, again, it's a well-formatted object which is identified by its SHA-1 hash. The object type is tree, and instead of having simple content like the blob, the tree is a list of entries pointing to the objects, each with a mode (typically 100644 for regular files, 100755 for executable files and 40000 for subdirectories) and a name. However, we know the size of the SHA-1 hash, so it doesn't need to be stored as human-readable hex; it is serialized as 20 raw bytes. The length works out at 28 bytes per row (7 for the mode and a space, 1 for the NUL after the name, and 20 for the hash), plus however many bytes there are in the file name. In our case, we have 28 + "anotherEmpty".length() + 28 + "empty".length(), or 73 bytes in total:


(master) $ echo -en "tree 73\x00→
100644 anotherEmpty→
\x00→
\xe6\x9d\xe2\x9b\xb2\xd1\xd6\x43\x4b\x8b→
\x29\xae\x77\x5a\xd8\xc2\xe4\x8c\x53\x91→
100644 empty→
\x00→
\xe6\x9d\xe2\x9b\xb2\xd1\xd6\x43\x4b\x8b→
\x29\xae\x77\x5a\xd8\xc2\xe4\x8c\x53\x91→
" | shasum
2b61e34a91ca9780ea2f943e72f1a4a022cdd206  -
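For readers who find the escaped byte string hard to follow, the same serialization can be sketched in Python. This is an illustration of the layout described above, not Git's implementation; tree_body and tree_id are names invented for the example, and the entries are assumed to be sorted by name already.

```python
import hashlib

EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"

def tree_body(entries):
    """entries: list of (mode, name, hex_hash), already sorted by name."""
    body = b""
    for mode, name, hex_hash in entries:
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\x00"
        body += bytes.fromhex(hex_hash)  # 20 raw bytes, not 40 hex characters
    return body

def tree_id(entries):
    """SHA-1 of 'tree <length>\\0' followed by the serialized entries."""
    body = tree_body(entries)
    header = b"tree " + str(len(body)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

entries = [
    ("100644", "anotherEmpty", EMPTY_BLOB),
    ("100644", "empty", EMPTY_BLOB),
]
print(len(tree_body(entries)))  # 73
# If the layout assumed here is right, this should agree with the shasum output above
print(tree_id(entries))
```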

Creating a tree by hand like this is a little tricky. To solve this problem, the git mktree command exists, which takes a git ls-tree formatted stream and generates a tree object for you. It's a little like git hash-object from last week, but without having to convert the references from a hex string to a sequence of raw bytes. In addition, it also ensures that the tree's contents are appropriately sorted, which is a mandatory prerequisite (in order to support fast retrieval).


(master) $ git ls-tree master .
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	anotherEmpty
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	empty
(master) $ git ls-tree master . | git mktree
2b61e34a91ca9780ea2f943e72f1a4a022cdd206

This allows us to easily create a new tree, with a new file in it:


(master) $ echo -en "→
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391\tanotherEmpty\n→
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391\tempty\n→
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391\tvoid\n" | git mktree
d2d6bbd1c25c154fcbb045d66e8a6f9b83587a68

We've now got three files in a tree (all the same contents; all empty), but now we can refer to the new tree directly. We can even list it again:


(master) $ git ls-tree d2d6bbd1c25c154fcbb045d66e8a6f9b83587a68
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	anotherEmpty
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	empty
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	void
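The parsing and sorting that git mktree does on our behalf can be sketched in Python. This is only an approximation for illustration: parse_ls_tree and sort_key are hypothetical helper names, and the sort rule shown (names compared byte by byte, with a subtree's name treated as if it ended in "/") is Git's ordering for tree entries.

```python
EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"

def parse_ls_tree(text):
    """Parse 'mode type hash<TAB>name' lines into tuples."""
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        meta, name = line.split("\t", 1)
        mode, obj_type, hex_hash = meta.split(" ")
        entries.append((mode, obj_type, hex_hash, name))
    return entries

def sort_key(entry):
    mode, obj_type, hex_hash, name = entry
    # Names compare byte-wise; a tree's name sorts as if it ended in "/"
    return name.encode("utf-8") + (b"/" if obj_type == "tree" else b"")

# Deliberately out of order, to show the sort
listing = (
    f"100644 blob {EMPTY_BLOB}\tvoid\n"
    f"100644 blob {EMPTY_BLOB}\tempty\n"
    f"100644 blob {EMPTY_BLOB}\tanotherEmpty\n"
)
for mode, obj_type, hex_hash, name in sorted(parse_ls_tree(listing), key=sort_key):
    print(name)  # anotherEmpty, empty, void (one per line)
```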

Although we haven't shown it here, if you wanted to create a tree with other trees (instead of blobs) then they work in exactly the same way; the difference is that the word 'blob' is replaced with 'tree', the mode is 040000 rather than 100644, and of course, the entry has to point to the right hash.

We've now seen blobs and trees; next week, we'll have a look at how they turn up on branches in the form of commits.


Come back next week for another instalment in the Git Tip of the Week series.

Thursday, August 25, 2011

Steve Jobs retires from Apple

It had to happen sooner or later. Steve Jobs, the co-founder and most successful CEO that Apple ever had, has stepped down as CEO, with a letter that simply suggested executing the succession plan and putting Tim Cook into the driving seat.

It's worth noting that he would like to serve as Chairman of the Board and continue as an Apple employee, a request that was immediately granted. This quote from the board summarises Steve's importance to Apple over the years:

“Steve’s extraordinary vision and leadership saved Apple and guided it to its position as the world’s most innovative and valuable technology company,” said Art Levinson, Chairman of Genentech, on behalf of Apple's Board. “Steve has made countless contributions to Apple’s success, and he has attracted and inspired Apple’s immensely creative employees and world class executive team. In his new role as Chairman of the Board, Steve will continue to serve Apple with his unique insights, creativity and inspiration.”

Not bad for someone who gathered a few hackers in a garage, and then built it into the most valuable company in the world. Let's also not forget his role in Pixar and NeXT along the way, too.

It's easy to be accused of being an Apple Fanboi, but having started my programming career on a NeXT pizza box and learning Objective-C on NeXTSTEP, it has been a career (and a set of products) that I have followed for decades. When OSX first came out, my company bought its first PowerBook G4, and from then, I've had a succession of Macs and MacBooks, from the Cube through to an original iMac in Bondi blue, and the original G5 cheese grater, all of which still work today. Meanwhile, in my garage, I have the remnants of various beige boxes in various states of disconnection, whose sole purpose is to offer spare memory for my printer and a floppy drive should I ever need one.

I think this is the start of a gradual fade away from Steve; I suspect he might appear "One more time" at the next iPhone event, but thereafter he will be involved less and less. Whether that matters remains unclear; the design genius behind most of the recent Apple products has been Jony Ive, and many of the hardware advances have been made by dedicated teams across Apple, which I'm sure will continue. But as iOS and OSX draw ever nearer to each other, whether Steve's autocratic design filter will be replaced or whether it will even be needed remains to be seen.

Here's to the crazy ones. Here's to Steve Jobs.

Tuesday, August 23, 2011

Git Tip of the Week: Objects

This week's Git Tip of the Week is about git object storage. You can subscribe to the feed if you want to receive new instalments automatically.


This week we'll be taking a bit of a deeper dive into the way that Git stores its objects. We'll look at how they're identified, how they're related, and see why Git handles moves better than other version control systems.

By now, you're familiar with the concept of a commit hash (or just commit) – a 40-character hexadecimal sequence, which can uniquely identify a change log, such as d16085b3b913e5bc5e351c0a7461051e9973629a. But where does this come from?

A git repository is actually just a collection of objects, each identified with its own hash. Whenever you add a file, you get a hash generated on its contents, and this hash is used to uniquely point to that version of a file. For example, if you create an empty file, it will have the hash e69de29bb2d1d6434b8b29ae775ad8c2e48c5391. You can confirm this by adding an empty file to a repository and using git ls-tree to see the contents:


(master) $ touch empty
(master) $ git add empty
(master) $ git commit -a -m "Empty"
[master (root-commit) 4145429] Empty
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 empty
(master) $ git ls-tree master .
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	empty

What git ls-tree is saying is that the master branch contains a file called empty whose permissions are 100644 (owner read/write, group+other read), and whose hash is e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.

Similarly, if you look in the repository's object store, you'll find that a file .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 has been created. The object store is split into 256 different top-level directories (00-ff), named after the first two hex characters of the hash; the remaining 38 characters form the file name inside that directory.
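As a quick illustration of that layout, this Python sketch maps a hash to the path of its loose object. The helper name loose_object_path is invented for the example, and it ignores packed objects entirely.

```python
from pathlib import PurePosixPath

def loose_object_path(hex_hash):
    """Split a 40-character hash into a 2-character directory and a 38-character file name."""
    if len(hex_hash) != 40:
        raise ValueError("expected a 40-character SHA-1 hex string")
    return PurePosixPath(".git/objects") / hex_hash[:2] / hex_hash[2:]

print(loose_object_path("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"))
# .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```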

So, how does Git compute this value? Well, it uses a SHA-1 hash; but the SHA-1 of an empty input isn't this value. In fact, Git prefixes the object with "blob ", followed by the length (as a human-readable integer), followed by a NUL character, followed by the contents. So for our case, we have:


$ echo -en "blob 0\0" | shasum
$ echo -en "blob 0\0" | openssl dgst -sha1
$ printf "blob 0\0" | shasum
$ printf "blob 0\0" | openssl dgst -sha1

All of these print out the same value, e69de29bb2d1d6434b8b29ae775ad8c2e48c5391. Note that the \0 is the escape code for the NUL character; the -e to echo stipulates it should obey the escape, and the -n suppresses the trailing newline. If you get fef5d… then it is interpreting the \0 as two characters, the \ and the 0. (And if you get be21… or b825… then it's adding a newline at the end.)
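If your shell's echo or printf misbehaves, the same calculation can be cross-checked with a few lines of Python. This is a sketch of the format described above, with git_blob_hash being a name chosen for the example rather than anything Git provides.

```python
import hashlib

def git_blob_hash(content):
    """SHA-1 of 'blob <length>\\0' followed by the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

# With the header, an empty file hashes to the value Git reports
print(git_blob_hash(b""))            # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Without the header, the SHA-1 of empty input is a different value
print(hashlib.sha1(b"").hexdigest())  # da39a3ee5e6b4b0d3255bfef95601890afd80709
```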

Instead of calculating this format ourselves, we can use git hash-object to calculate a hash – or, with -w, insert an object in our local repository:


(master) $ echo 'Hello, World!' | git hash-object -w --stdin
8ab686eafeb1f44702738c8b0f24f2567c36da6d
(master) $ ls .git/objects/8a
b686eafeb1f44702738c8b0f24f2567c36da6d
(master) $ echo -e 'blob 14\0Hello, World!' | shasum
8ab686eafeb1f44702738c8b0f24f2567c36da6d

This has created a hash of our object (blob 14\0Hello, World!\n) and written it into the objects directory under the same name. The contents are compressed with the DEFLATE algorithm; but at the moment, the object isn't used or referred to anywhere in our tree. Although we don't see it in the working directory, we can see it in the repository itself:


(master) $ git show 8ab686eafeb1f44702738c8b0f24f2567c36da6d
Hello, World!

Next time, we'll look at how Git organises objects into directories, and ultimately, commits.


Come back next week for another instalment in the Git Tip of the Week series.

Tuesday, August 16, 2011

Git Tip of the Week: Detached Heads

This week's Git Tip of the Week is about detached heads. You can subscribe to the feed if you want to receive new instalments automatically.


Today's topic is the subject of detached heads. They're not as bad as they sound, and they don't involve a guillotine in any way.

Heads

Since a git repository is a tree-of-commits, with each commit pointing to its ancestor(s), it is possible to directly address a single commit by means of its commit hash. Not only that, but it's possible to record this hash in a variety of different systems; Twitter, E-mail, Bugzilla etc.

Both Git branches and Git tags are merely links to an item, by commit hash, in the repository. Creating a hundred tags (or branches) is tantamount to creating a hundred pointers, and is one of the reasons why Git is so blazingly fast.

Whilst tags are (generally) immutable, branches are not. Each time a commit is made on a branch, the pointer (reference) is updated to point to the newest commit. Thus, three commits on a branch involve three modifications to the branch pointer (as well as the corresponding entries being added into the repository for the content).

These pointers are stored in the .git/refs subdirectories. Tags are stored in .git/refs/tags and branches are stored in .git/refs/heads. If you look at any of the files, you'll find each tag corresponds to a single file, with a 40-character commit hash.

The point of this post is that branches are also known as heads. When you have a master branch, there's a file refs/heads/master which is a pointer to where the current branch is at that point.

Detached Heads

So, if a head is synonymous with a branch, what does that make a detached head? Well, it's simply a HEAD that points directly at a commit hash rather than at a branch. So, whenever you check out something other than a branch (a raw hash, a tag, or an expression like HEAD^), you end up with a detached head. Perhaps an example is called for:


$ git init example
Initialized empty Git repository in example/.git
(master) $ touch file
(master) $ git add file
(master) $ git commit -m "Initial"
[master (root-commit) 123be6a] Initial
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 file
(master) $ touch other
(master) $ git add other
(master) $ git commit -m "Second"
[master 5a11d1c] Second
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 other
(master) $ git log --oneline
5a11d1c Second
123be6a Initial
(master) $ git checkout HEAD^
Note: checking out 'HEAD^'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at 123be6a... Initial
((123be6a...)) $ git log --oneline
123be6a Initial

What we've done here is to create a repository with two commits, then back out one version. (The HEAD^ checks out HEAD's parent.) The master branch is still pointing to the second commit (5a11d1c) but we're looking at effectively an unnamed commit.

The Git checkout contains a .git/HEAD, which normally points (indirectly) to a ref. In the case of 'detached head' mode, the .git/HEAD file contains the commit hash itself:


((123be6a...)) $ cat .git/HEAD
123be6a76168aca712aea16076e971c23835f8ca
((123be6a...)) $ git checkout master
Previous HEAD position was 123be6a... Initial
Switched to branch 'master'
(master) $ cat .git/HEAD
ref: refs/heads/master
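The two shapes of .git/HEAD shown above are easy to tell apart programmatically. Here is a minimal Python sketch that classifies the file's contents; describe_head is a hypothetical helper, and it only handles the two forms seen in this post.

```python
import re

def describe_head(head_text):
    """Classify the contents of .git/HEAD as attached or detached."""
    text = head_text.strip()
    if text.startswith("ref: "):
        # Symbolic reference: HEAD follows a branch
        return "on branch ref " + text[len("ref: "):]
    if re.fullmatch(r"[0-9a-f]{40}", text):
        # Bare commit hash: HEAD is detached
        return "detached at " + text[:7]
    raise ValueError("unrecognised HEAD contents")

print(describe_head("ref: refs/heads/master\n"))
# on branch ref refs/heads/master
print(describe_head("123be6a76168aca712aea16076e971c23835f8ca\n"))
# detached at 123be6a
```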

Working with (or on) a detached head isn't a problem. This occurs when you are dealing with bisects, or if you want to simply check out a specific version of a previous commit. There's also nothing to stop you working on this unnamed branch, either; you can keep going and committing as long as you want.

Bear in mind, however, that finding a commit again is dependent on what you do with that hash. Typically, you will create a new branch (say, it's a bug fix for a previously released bit of code), or you end up tagging it with a hot fix identifier. There's always the reflog, which you can use to get back to your commit if you end up switching away from a detached head. Just remember that the periodic git gc will run, and will eventually clean up commits that aren't referenced (directly or indirectly) by a tag or a branch.


Come back next week for another instalment in the Git Tip of the Week series.

Friday, August 12, 2011

IntelliJ Plugin Development

My attention recently turned towards IntelliJ plugin development, and since information on the subject is incredibly scarce I thought I'd jot down a few notes here. I'll also try and relate it to plugin development with more familiar environments (Eclipse) for those that are interested in making the switch over.

Firstly, it's worth noting that I'm writing this with IDEA 10.5 as a base. Things can change over time, so if you find this on a Google Search ten years from now you can use common sense to see if it still applies.

IntelliJ IDEA has only (relatively) recently enabled plugin development at all, so things have changed over time. One of the most notable issues is that whilst other products (Eclipse, NetBeans) have focussed on a modular architecture from the beginning, IDEA has always been shipped as a Big Bag of Hurt Jars. So both the plugin development – and the available plugins – reflect the youthfulness of the application, often with contradictory results. For example, updating IDEA itself is handled in a completely different way from updating the plugins.

Installing plugins

When IDEA installs, it creates a plugins directory at the root of the application install, which is used to store the plugins themselves. Each plugin gets its own directory, which uses the id of the plugin (and if the id is not available, the name) by default. Underneath that is a mandated lib directory, and inside that are the plugin JARs themselves.


IDEA\
  plugins\
    MyPlugin\
      lib\
        MyPlugin.jar
        MyPluginExtra.jar
    AnotherPlugin\
      lib\
        AnotherPlugin.jar
  

One common way of installing plugins is to just unzip the folder into the plugins directory. The IDEA build process generates a zip file which contains the name of the project as part of the internal folder structure.

Each plugin has a single class loader, so whether it is shipped as a single JAR or as multiple JARs is a matter of convenience rather than anything else. If you consume downstream libraries then you can simply embed a copy of each in the plugin's lib directory.

The existence of lib is somewhat mystifying; it's as if JetBrains had other thoughts regarding resources which they could then put to use; or perhaps they were just following web app development structure. Either way, any other content appears to be ignored in the plugins directory.

Update sites

You can't have update sites in IntelliJ IDEA. Well, you can't have sites, but you can have a site – hardcoded into each IntelliJ install is a constant called DEFAULT_PLUGINS_HOST, which resolves to http://plugins.intellij.net. This is similar to Eclipse's Marketplace and is the de facto method of installing plugins into IntelliJ.

If you want to have your own update site, you're out of luck. There's no way to host an IntelliJ update site outside of this mechanism.

There is laughably something called “Enterprise Repository support” in which you can have a list of plugins, identified with a file called updatePlugins.xml. This is nothing of the sort.

Yes, you can start IDEA with a -Didea.host.url=http://path/to/updatePlugins.xml property, and yes, it will download that file. However, when you put any plugin in this file, it will show a dialog asking you to install all of the plugins in that file.

It doesn't integrate with the Plugin model (where people will be expecting it to be) and it forces you to install everything. Oh, and it doesn't work either.

Instead, the update site has two URLs which it uses to convey information back to clients who ask. The first is the list of plugins that are available, which it accesses from http://plugins.intellij.net/plugins/list/. This is an XML file (not even compressed!) which contains a list of every plugin known to man (or to JetBrains, at least). The format looks like:

<plugin-repository>
  <ff>"Build"</ff>
  <category name="Build">
    <idea-plugin downloads="5483" size="46833" date="1258402969000" url="http://handyedit.com/antdebugger.html">
      <name>Ant Debugger</name>
      <id>com.handyedit.AntDebugger</id>
      <description><![CDATA[...]]></description>
      <version>1.0</version>
      <vendor email="antdebugger@handyedit.com" url="http://handyedit.com/">Alexei Orischenko</vendor>
      <idea-version min="n/a" max="n/a" until-build="3999"/>
      <change-notes><![CDATA[...]]></change-notes>
    </idea-plugin>
    …
  </category>
  …
</plugin-repository>
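To show how a client might consume this listing, here is a short Python sketch that parses a trimmed copy of the sample above with the standard library. It is not what IDEA itself does; list_plugins is a name made up for the example, and the elided description and change-notes elements are simply left out of the sample.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<plugin-repository>
  <ff>"Build"</ff>
  <category name="Build">
    <idea-plugin downloads="5483" size="46833" date="1258402969000" url="http://handyedit.com/antdebugger.html">
      <name>Ant Debugger</name>
      <id>com.handyedit.AntDebugger</id>
      <version>1.0</version>
    </idea-plugin>
  </category>
</plugin-repository>"""

def list_plugins(xml_text):
    """Return one dict per <idea-plugin>, tagged with its category name."""
    root = ET.fromstring(xml_text)
    plugins = []
    for category in root.findall("category"):
        for plugin in category.findall("idea-plugin"):
            plugins.append({
                "category": category.get("name"),
                "id": plugin.findtext("id"),
                "name": plugin.findtext("name"),
                "version": plugin.findtext("version"),
            })
    return plugins

for p in list_plugins(SAMPLE):
    print(p["category"], p["id"], p["version"])
# Build com.handyedit.AntDebugger 1.0
```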

Rather than telling you where you could get all of these plugins from as part of a url attribute, JetBrains makes you go back and get them via another URL, which is http://plugins.intellij.net/pluginManager. This seems to act as a redirector for plugins based on their (plugin) id, and it redirects you to a URL where you can download the file from. For example, the first entry in the IDEA repository above is the Ant Debugger, whose id is com.handyedit.AntDebugger. Add this to the URL, and you get http://plugins.intellij.net/pluginManager?action=download&id=com.handyedit.AntDebugger&build=IC-107.322 , which redirects you to the JAR, and which subsequently installs the content. (The name of this file is derived from the plugin's ID and a unique number which corresponds to the download id in the plugins.intellij.net site – it's not related to the repository at all.) Note that without the build parameter, the redirection doesn't work – a primitive form of server-side detection of compatible version numbers.
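Assembling that download URL is mechanical, as this Python sketch shows. The query parameters are taken from the example URL in this post; download_url is a hypothetical helper, and whether the server accepts other parameters is not something this post establishes.

```python
from urllib.parse import urlencode

def download_url(plugin_id, build, host="http://plugins.intellij.net"):
    """Build the pluginManager redirector URL for one plugin and IDE build."""
    query = urlencode({"action": "download", "id": plugin_id, "build": build})
    return f"{host}/pluginManager?{query}"

print(download_url("com.handyedit.AntDebugger", "IC-107.322"))
# http://plugins.intellij.net/pluginManager?action=download&id=com.handyedit.AntDebugger&build=IC-107.322
```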

The one ray of sunshine is that there is a way of overriding this default host; if you run IDEA with -Didea.plugins.host=http://somewhere/else then you can use a different update site to the one compiled within IntelliJ. This can do something slightly more intelligent (like allowing you to use multiple update sites through careful redirections) but it's a significant piece of missing functionality in IntelliJ IDEA that you have to do this.

It's no wonder that IntelliJ users often just extract the contents of a plugin into their local install, or acquire it through the central update site. It's almost impossible to do otherwise.

Plugin Development

In order to do any kind of plugin development in IntelliJ, you have to configure an SDK. This is non-obvious and almost all of the Google searches turn up with nothing of any use whatsoever. Fresh out of the box, IDEA doesn't even have a Java SDK defined.

You need to set up the Java SDK first, followed by the IDEA SDK (which it can use as itself). Go into the Project Settings and find the Platform Settings, which includes SDKs. There's a small [+] icon at the top of the second column; click it and select Java SDK first; it should auto-detect which JDK you've launched it from, but you can select any JDK on your system. Once that's done, click on the [+] icon again and this time select IDEA SDK. It will prompt you for the JDK you've just defined and select itself as the host. Whilst it looks like it's set everything up, unless you hit 'OK' down at the bottom these changes won't be remembered.

Once you have your SDKs defined, you can create a Plugin Module project. This creates a file called META-INF/plugin.xml which defines the name, id, and version of the plugin. There are also several entries for hooking into IntelliJ:

  • Plugin metadata (name, version, id)
  • Application components – global content
  • Project components – things which are specific to a single project
  • Actions – things that show up in menus
  • Extensions – things that plug into extension points

The plugin also has a reference to something which can initialise the component (like a Plugin constructor, or an activator in OSGi/Eclipse). This appears to be called during IDEA startup, but it can block startup until it returns; so if there are long-running operations that you need to do, consider running them in a background thread.

Swing development

My eyes! My eyes! I had really forgotten how bad Swing development can be, until I was forced back into it. Still, it's not as bad as on some platforms but if you're used to IntelliJ you probably see past the UI widget mess in any case. (I'm sure that people will have similar diverging opinions of Eclipse and Xcode; at some level, there's an aspect of familiarity which means you look over warts in your own IDE of choice.)

One advantage of developing IDE plugins for IDEA over that of Eclipse is you get to use Swing. Whilst not a UI win, it is a programming win, since you don't have to worry about dispose() or leaking resources. It's also generally easier to find tutorials on the subject – Eclipse has always been somewhat cryptic with the way that menus and actions are contributed (not helped by the fact that the recommended way has changed every couple of releases).

In any case, it's relatively easy to create tool windows (similar to Eclipse Views, except that they are always present in the window as either minimised entries or showing in the screen somewhere). You get handed a Container in the createToolWindowContent() method (which turns out to be a Swing container) and you can throw what you like in there, wiring it up to the mouse events to trigger actions.

Deploying the plugin is a case of doing “Build → Prepare Plugin for Deployment”. This generates a ZIP in the same folder as your plugin (no option to install it elsewhere, it seems) which contains the above folder structure with your module in. Modularity in Java has actively been harmed by IntelliJ's awful project structure, and is the main cause for pain in leaky implementations for code primarily developed in the IDEA.

If you want to package dependent libraries as well, you can do so by going to the Project Settings and creating a new library, which you then put the JAR into (confusingly, with the 'classes' button). It will then get exported with your plugin when it gets built.

Hello World

With that out of the way, the process involved for a Hello World toolWindow is as follows:


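--- 8< --- META-INF/plugin.xml
<extensions defaultExtensionNs="com.intellij">
  <toolWindow id="HelloWorld" icon="/helloworld.png" anchor="right" factoryClass="com.example.HelloWorld"/>
</extensions>
--- 8< --- src/com/example/HelloWorld.java
package com.example;

import com.intellij.openapi.project.Project;
import com.intellij.openapi.wm.ToolWindow;
import com.intellij.openapi.wm.ToolWindowFactory;

import java.awt.Component;
import javax.swing.JLabel;

// Factory named in plugin.xml; IDEA calls it to populate the tool window
public class HelloWorld implements ToolWindowFactory {
  public void createToolWindowContent(Project project, ToolWindow toolWindow) {
    Component component = toolWindow.getComponent();
    component.getParent().add(new JLabel("Hello, World!"));
  }
}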

The only thing I found significantly painful was that the runtime on OSX immediately crashed with a lack of PermGen space. Fortunately, "Run → Edit Configurations" lets you add additional Java VM arguments (in this case, -XX:MaxPermSize=256m), which solved that problem. One other annoying feature: each time I pressed a key in the VM parameters field, I got a dialog box flash up asking whether I wanted to accept incoming connections or not. I think this is likely to be an issue with the recent Lion build (along with the error message "2011-08-12 08:34:56.286 java[10515:407] -[NSOpenPanel _setIncludeNewFolderButton:] is deprecated. Please stop calling it." when invoking the Open dialog).

Summary

Once you get past the ugliness that is Swing, and resign yourself to manual installation of plugins, developing for IntelliJ isn't that bad. Most of it uses vanilla Swing operations, though it does helpfully suggest some improved IntelliJ Swing classes in place of the standard Swing ones (though in my uses, it actually performed worse than the standard Swing ones did so I ignored that suggestion).

Once you have a displayable component, it's relatively easy to pick up mouse events and respond to actions. As yet, I have not integrated with the ADT or processed any source code, which is a challenge for another day.

Tuesday, August 09, 2011

Git Tip of the Week: Searching for Commits and Changes

This week's Git Tip of the Week is about grepping to find logs. You can subscribe to the feed if you want to receive new instalments automatically.


In last week's post, I talked about how to search a git repository's content with git grep. In this second part, we'll look at how you can search the commit logs alone.

Sometimes it's useful to be able to search through the commit log to find out when a change occurred. Some projects like to embed a bug identifier in the commit message; for example, 1ac0a2 in the EGit project contains a reference to Bug: 324736.

To find this quickly (and without having to refer to an external bug tracking system to get the answer), we can use git log --grep to find the commit message that corresponds to that particular bug.


[EGit] (master)$ git log --oneline --grep="Bug: 324736"
1ac0a29 [historyView] Reveal selected commit on filter change
20c9560 [historyView] Preserve commit selection on filter change

This is obviously useful where you have specific commit messages that include the associated bug tracker, but not so useful if you don't record that information with the commit itself.

Another use-case is when you want to find out when a particular change was introduced. For example, there might be a change in a file with the text table.reveal(c);. Normally, you could use git blame to find out the change; but if the change was added and then subsequently removed, it might not be in the current file to blame. It might also have been added elsewhere initially, then copied and pasted elsewhere.

git log has another feature which can be used to search for patterns in deltas. These effectively search the patches rather than the content of the files themselves. For example, if we wanted to search EGit for changes that introduced the above code change, then we could run:


[EGit] (master)$ git log --oneline -Gtable.reveal
1ac0a29 [historyView] Reveal selected commit on filter change
8635f79 Show commit corresponding to selection in commit graph table
390b6b1 Branches and Tags links in commit message viewer
dfbdc45 Initial EGit contribution to eclipse.org

Even though table.reveal no longer exists in the current codebase (and as such, the git grep no longer shows this phrase), using git log -G allows us to find when the change was introduced (and when it was removed).
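The idea behind -G (match a pattern against the added and removed lines of a change, rather than against the file as it stands) can be sketched in Python with difflib. This is only an approximation of the behaviour for a single pair of file versions, not how Git implements it; changed_lines and change_matches are names invented for the example.

```python
import difflib
import re

def changed_lines(old_text, new_text):
    """Yield every line that the change removed or added."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            yield from old_lines[i1:i2]  # removed lines
            yield from new_lines[j1:j2]  # added lines

def change_matches(old_text, new_text, pattern):
    """True if any added or removed line matches the regular expression."""
    regex = re.compile(pattern)
    return any(regex.search(line) for line in changed_lines(old_text, new_text))

before = "int a;\n"
after = "int a;\ntable.reveal(c);\n"
later = after + "int b;\n"

print(change_matches(before, after, r"table\.reveal"))  # True: the line was introduced
print(change_matches(after, before, r"table\.reveal"))  # True: the line was removed
print(change_matches(after, later, r"table\.reveal"))   # False: the line is present but untouched
```

The last case is the important one: the text exists in both versions, so searching the file finds it, but the change itself never touches it.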

Using git log -G in this way can be used to search history for potentially sensitive information which may have been accidentally committed into the repository. One can even set up periodic jobs to ensure that the history does not contain information that should not have been committed.


Come back next week for another instalment in the Git Tip of the Week series.

Tuesday, August 02, 2011

Git Tip of the Week: Searching for Patterns

This week's Git Tip of the Week is about grepping to find content. You can subscribe to the feed if you want to receive new instalments automatically.


Sometimes, when investigating the contents of a repository, it's not always obvious where to find the definition of a function (or method). Clearly, Unix tools such as grep allow you to find the content easily enough, but if there's a large amount of generated data (such as compiled code) simply looking through all files can be time-consuming.

To find files containing a specific piece of content, you could use something like find . -exec grep pattern '{}' ';'. This runs the grep command on every file in the working directory. However, there's a faster way of achieving this with git grep. Let's say we wanted to find occurrences of the variable newPushURI in the EGit repository. We could use find to achieve this:


EGit (master)$ find . -exec grep newPushURI '{}' ';'
		URIish newPushURI = uri;
			newPushURI = newPushURI.setPort(GERRIT_DEFAULT_SSH_PORT);
			newPushURI = newPushURI.setScheme(Protocol.SSH.getDefaultScheme());
			newPushURI = newPushURI.setPort(GERRIT_DEFAULT_SSH_PORT);
			newPushURI = prependGerritHttpPathPrefix(newPushURI);
		uriText.setText(newPushURI.toString());
		scheme.select(scheme.indexOf(newPushURI.getScheme()));

OK, we've found some occurrences, but it doesn't print the names of the files, which isn't too helpful. We could have find print each file name as well, but it would do so for every file, including those without a match. Or, you could write some kind of script or alias to handle the scan-and-test. It's also not particularly fast:


EGit (master)$ time find . -exec grep newPushURI '{}' ';' > /dev/null
real	0m1.605s
user	0m0.541s
sys	0m0.849s
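
For comparison, the scan-and-test can be done with find and grep alone. This is a small sketch in a scratch directory (the file names are invented) that uses grep -l to print only the names of matching files, and find's + terminator to pass many files to each grep invocation rather than starting one process per file:

```shell
#!/bin/sh
# Sketch only: the directory layout below is made up for the demo.
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir -p src bin
echo 'URIish newPushURI = uri;' > src/Page.java
echo 'unrelated content' > bin/Page.class

# -type f skips directories; grep -l lists matching file names only;
# '+' batches file names instead of running grep once per file.
found=$(find . -type f -exec grep -l newPushURI '{}' +)
echo "$found"
```

This still reads every file under the directory, including generated output and the contents of .git when run inside a repository.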

An alternative is to use git grep to scan the contents of the current working tree:


EGit (master)$ git grep newPushURI
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                URIish newPushURI = uri;
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                        newPushURI = newPushURI.setPort(GERRIT_DEFA
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                        newPushURI = newPushURI.setScheme(Protocol.
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                        newPushURI = newPushURI.setPort(GERRIT_DEFA
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                        newPushURI = prependGerritHttpPathPrefix(ne
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                uriText.setText(newPushURI.toString());
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                scheme.select(scheme.indexOf(newPushURI.getScheme()

Not only does it show which files the matches are located in, it's also more than an order of magnitude faster:


EGit (master)$ time git grep newPushURI > /dev/null
real	0m0.024s
user	0m0.014s
sys	0m0.033s

The arguments that git grep takes are much the same as grep itself; for example, -l lists files with matches (and -L is files without), -E allows an extended regexp, -i is ignore case and -w is word regexp.
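
These options can be tried out in a scratch repository. The sketch below uses two made-up C files; they only need to be added to the index, because git grep searches tracked files by default:

```shell
#!/bin/sh
# Sketch only: file names and contents are invented for the demo.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
printf 'int pushUri;\nint PUSHURI_MAX;\n' > a.c
printf 'int unrelated;\n' > b.c
git add a.c b.c

with=$(git grep -l pushUri)             # files with a match
without=$(git grep -L pushUri)          # files without a match
nocase=$(git grep -i pushuri)           # case-insensitive: both lines of a.c
word=$(git grep -w -i pushuri)          # whole words only: PUSHURI_MAX is excluded
ext=$(git grep -E 'push(Uri|URL)')      # extended regexp with alternation

echo "with: $with"
echo "without: $without"
echo "word: $word"
```

The -w case is the one worth noting: the underscore counts as a word character, so PUSHURI_MAX is not treated as an occurrence of the word pushuri.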

There are also some options which are specific to git. The --no-index option searches files in the current directory even if they aren't managed by Git, whilst --cached searches blobs in the index rather than files in the working tree.
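
The difference between the three modes shows up once the working tree, the index and the set of untracked files disagree. A minimal sketch, with made-up file names:

```shell
#!/bin/sh
# Sketch only: file names and contents are invented for the demo.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
echo 'staged text' > tracked.txt
git add tracked.txt
echo 'edited text' > tracked.txt         # working tree now differs from the index
echo 'edited text too' > untracked.txt   # never added to the index

worktree=$(git grep -l edited)                   # tracked files, working tree content
cached=$(git grep -l --cached staged)            # content as recorded in the index
cached_edited=$(git grep -l --cached edited || true)   # the edit is not staged yet
noindex=$(git grep -l --no-index edited)         # every file, tracked or not

echo "worktree: $worktree"
echo "cached: $cached"
echo "no-index: $noindex"
```

Here the default search finds the edit only in tracked.txt, --cached still sees the staged content, and --no-index also picks up untracked.txt.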

As well as searching the current working tree, git grep can also be given a treeish (tag, branch, commit) and subfolders within a repository. If we wanted to look for the regex extension.point in the stable-1.0 branch, restricted to matches in the org.eclipse.egit.core folder, we could do:


EGit (master)$ git grep extension.point stable-1.0 -- org.eclipse.egit.core
stable-1.0:org.eclipse.egit.core/plugin.xml:   <extension point="org.eclipse.core.runtime.preferences">
stable-1.0:org.eclipse.egit.core/plugin.xml:  <extension point="org.eclipse.team.core.repository">
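
The same shape of command can be tried on a small scratch repository. In this sketch the tag v1.0, the folder names and the extension point ids are all hypothetical:

```shell
#!/bin/sh
# Sketch only: tag, folders and file contents are invented for the demo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # placeholder identity for the demo
git config user.name "Demo"

mkdir core ui
echo '<extension point="old.id">' > core/plugin.xml
echo '<extension point="ui.id">' > ui/plugin.xml
git add .
git commit -q -m "First release"
git tag v1.0

echo '<extension point="new.id">' > core/plugin.xml
git commit -q -a -m "Rename extension point"

old=$(git grep extension.point v1.0 -- core)   # tagged content, core folder only
new=$(git grep extension.point -- core)        # current content, core folder only
echo "$old"
echo "$new"
```

Matches from a treeish are prefixed with its name, as in the stable-1.0 output above, and nothing from the ui folder appears because of the path after the double dash.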

Finally, it's possible to use -p to print out the name of a function in which a match occurs.


EGit (master)$ git grep -p newPushURI
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java=        private void setDefaults(RepositorySelection selection) {
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                URIish newPushURI = uri;
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                        newPushURI = newPushURI.setPort(GERRIT_DEFA
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                        newPushURI = newPushURI.setScheme(Protocol.
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                        newPushURI = newPushURI.setPort(GERRIT_DEFA
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                        newPushURI = prependGerritHttpPathPrefix(ne
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                uriText.setText(newPushURI.toString());
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                scheme.select(scheme.indexOf(newPushURI.getScheme()

Note that the line naming the function is marked with = instead of : after the file name. This can be used to quickly find which functions contain a reference to a given pattern:


EGit (master)$ git grep -p newPushURI | grep java=
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java=        private void setDefaults(RepositorySelection selection) {

Come back next week for another instalment in the Git Tip of the Week series.