Thursday, April 28, 2011

Adventures in Multi-Architecture Eclipse

I've been working on a multi-architecture single build for Eclipse, as described previously, and have now got it to a state where I can generate a build for multiple architectures into a single install. Unfortunately, it's not easy to script with Tycho (yet) but it can be done.

The problem is largely with P2. Although the plugin soup can handle having plugins installed for both its platform and others simultaneously, P2 needs to be told (through the use of a profile) which subset it is to look at. Instead of doing something sensible (say, assuming that the current set of installed plugins represents the 'starting point' and going from there), P2 insists that the profile be built up by a series of installation requests.

The problem is that this profile contains architecture-specific installation details. Even if you install the same RCP base product, the profiles are different as they reference different fragments (so, on an x86 build you have SWT-x86, and on an x86_64 build you have SWT-x86-64). Due to this three character difference, you have to build the product twice.

The first approach – use a shared bundle pool – doesn't work. That's because P2 checks to see if a bundle has already been installed, and if so, skips it. This works fine for bundles without architecture-specific fragments, but fails for those that have them. Run P2 on a shared install for x86 and x86-64, and when the second install happens, it says “Hey! SWT is already present; nothing to do”. Of course, this misses the fact that SWT needs the SWT-x86-64 fragment in order to work and so you end up with a failed product.

The second problem is the configuration folder. Although this is 99% platform agnostic, the simpleconfigurator has a list-of-bundles to install upon startup. This too contains the architecture-specific fragments, so although it is almost identical, it differs in a few key places.

Fortunately, all is not lost. We can fix this by doing three things:

  1. Have each architecture have its own profile name
  2. Have each architecture have its own configuration directory
  3. Rename the executables per platform (and corresponding ini files)

The first is relatively easy; in the P2 director, pass a different profile name. Unfortunately, Tycho doesn't currently let you have a different profile name per architecture so you either have to use two projects or a hacked version in order to achieve this goal.

The second is more challenging. There's no way of telling the director what the configuration directory should be at build time, although you can specify this at run time. The trick is to rename the configuration directory to something architecture-specific (say, configuration.x86-64) and then write a pointer to that from the ini file for the launcher; e.g. -config configuration.x86-64. That way, running Eclipse.x86-64.exe will load Eclipse.x86-64.ini, which points to configuration.x86-64.
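As a rough sketch of that renaming step, the shell function below takes an unpacked install directory and an architecture suffix. The function name split_arch and the starting file names eclipse.exe and eclipse.ini are assumptions for illustration. It spells the option as -configuration, the documented Eclipse runtime option, and puts the value on its own line, as eclipse.ini expects.

```shell
# Hypothetical helper: make an install tree architecture-specific in place.
# $1 = install directory, $2 = architecture suffix (e.g. x86_64)
split_arch() {
  dir=$1
  arch=$2

  # Give this architecture its own configuration directory and launcher
  mv "$dir/configuration" "$dir/configuration.$arch"
  mv "$dir/eclipse.exe" "$dir/Eclipse.$arch.exe"

  # Point the renamed launcher at the renamed configuration directory.
  # The option goes first so that it stays ahead of any -vmargs line.
  {
    printf '%s\n' '-configuration' "configuration.$arch"
    cat "$dir/eclipse.ini"
  } > "$dir/Eclipse.$arch.ini"
  rm "$dir/eclipse.ini"
}
```

Running it once per architecture build, before merging the trees, leaves each launcher paired with its own ini file and configuration directory.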

This does mean that an installation of a plugin for one architecture won't make it visible in the other architecture (they use different configuration and profile directories) but at least both can be launched from a single image. A post-process cleanup script can be run on a per-installation basis to remove unnecessary products if desired.

Being able to serve a single platform from a single build is important in minimising download and maintenance costs across an organisation, or disk usage across the internet as a whole. Adding launchers for other architectures doesn't add much to the installer size; though whether it makes sense to combine different platforms (Windows, Mac, Linux) may be up to the use cases and policies that require them. And whilst it's possible to merge the executables for those that support fat binaries, this isn't possible on all platforms.

However, there's really no reason why there shouldn't be a single build for Windows (all platforms), Linux (all platforms) and Mac (all platforms). It shouldn't be necessary to have to download everything again just to get a slightly different fragment, and maybe this approach will be a way of allowing a transition to occur to a more sensible build strategy.

Tuesday, April 26, 2011

Git Tip of the Week: Tags

This week's Git Tip of the Week is about working with tags. You can subscribe to the feed if you want to receive new instalments automatically.

Descriptive labels

Git provides a couple of mechanisms for identifying changes by labels instead of by unique hash values.

The first, we've already seen, is branches. When we switch between two branches, we're really using the descriptive label to identify a specific commit to switch to.

The second, which we'll introduce here, is tags. A tag is like a branch, in that it identifies a specific commit with a descriptive label.

Branches versus Tags

What's the difference between tags and branches? The workspace is (almost always) associated with a branch, called master by default. When it is, a commit will automatically update the master reference to point to that new commit; in other words, branches are mutable references.

A tag, on the other hand, is created to point to a specific commit and thereafter does not change, even if the branch moves on. In other words, tags are immutable references.

Annotated Tags

Git has two flavours of tags: annotated and non-annotated (also called lightweight). When using them, there is little difference between the two; both will allow you to refer to a specific commit in a repository.

An annotated tag creates an additional tag object in the Git repository, which allows you to store information associated with the tag itself. This may include release notes, the meta-information about the release, and optionally a signature to verify the authenticity of the commit to which it points.

Examples

We can create a simple tag, based on the current repository's version, with:

$ git tag example

This creates a lightweight tag as a reference in .git/refs/tags/example, which points to the current commit. If we want to make it an annotated tag, we need to supply -a, and a message with -m:

$ git tag -a v1 -m "Version 1 release"

This will create an (unsigned) annotated tag object, containing that message and a pointer to the commit object. Now the reference in .git/refs/tags/v1 will point to the tag object, which then points to the commit.
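To see that difference for yourself, here is a small sketch which builds a throwaway repository (the temporary directory and the user.name and user.email values are placeholders) and asks git cat-file what each tag name resolves to:

```shell
# Scratch repository so the example is self-contained
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "Initial commit"

git tag example                         # lightweight: ref points at the commit
git tag -a v1 -m "Version 1 release"    # annotated: ref points at a tag object

light_type=$(git cat-file -t example)
annot_type=$(git cat-file -t v1)
echo "example is a $light_type, v1 is a $annot_type"

# Peeling the annotated tag gets back to the same commit
git rev-parse 'v1^{commit}' example
```

The last command should print the same hash twice, since both tags ultimately identify the same commit.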

If we wanted to guarantee the authenticity of the tag, we could use -s on the git tag command. This uses gpg to sign, based on your email address – though you can use -u to specify a different gpg identity instead. (The v1s tag in the listings that follow is a signed tag created this way, with the message "Version 1 signed".) You can verify the signature of an existing tag with -v.

To list the local repository's tags, run git tag without any arguments; or, for a pattern, use -l with * as a wildcard (quoted, so that the shell doesn't expand it first):

$ git tag
example
v1
v1s
$ git tag -l '*s'
v1s

Finally, to get rid of tags, you can delete them with -d:

$ git tag -d v1
$ git tag
example
v1s

Deleting tags is OK if you never made them publicly available, but you really should avoid deleting tags once you've pushed them to a publicly readable location. Similarly, you shouldn't change a tag once it has been released to the wild either.

Contents and Describe

In order to see what the tag contains, you can use git show, as you can with other git objects:

$ git show v1s
tag v1s
Tagger: Al Blue <alblue@example.com>
Date:   Tue Apr 20 09:00:00 2011 +0100

Version 1 signed
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.14 (Darwin)

iF4EABEITke4AkyRUh8ACgkQWwXM3hQMKHZq5QD/esqKyinelXGM1TSzUqEzuBdI
Ah2Cq/5TS3j4kiP4+UUA/2nN2SVoWwYryN9234kgWUvZIrV1P0FGTG+lAEN5avj3
=JICf
-----END PGP SIGNATURE-----

commit 91a2b24....

If the tag is an annotated tag, you'll see the message and the tag object, followed by the commit. If the tag is a lightweight tag, then you'll see only the commit object.

A key difference between annotated and non-annotated tags is in the use of git describe. This gives an identifier of the repository, based on the nearest annotated tag. If we were to run it now, we'd see a reference to the v1s annotated tag:

$ git describe
v1s

If the current commit exactly matches that of a tag, then only the tag name is printed. If there are changes, then git describe will print out the tag name, a hyphen, the number of commits made, a hyphen, the letter 'g' and then the commit identifier. This allows anyone to use that explicit revision to identify the commit, through the hash at the end. As such, it is often useful to include that in file versions as a means of identifying it at a later stage.

# Add a commit
$ touch file
$ git add file
$ git commit -m "Adding file"
$ git describe
v1s-1-g24242c3

The letter g is added to denote a git managed version; so other repositories can use the same format but substitute that letter for a different one.

If no annotated tags are found then it will print fatal: No names found, cannot describe anything. To allow describe to use non-annotated tags, run with git describe --tags. It's also possible to get it to describe against a branch using git describe --all, although this only makes sense if the branch is known remotely.
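The behaviour described above can be tried out in a throwaway repository. This is only a sketch: the tag names light and v1 and the identity settings are made up for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "First"

# With only a lightweight tag present, plain describe has nothing to use
git tag light
git describe 2>/dev/null || echo "describe failed: no annotated tags"

# --tags lets describe consider lightweight tags as well
with_tags=$(git describe --tags)
echo "$with_tags"

# Add an annotated tag, then move one commit past it
git tag -a v1 -m "Version 1"
git commit -q --allow-empty -m "Second"
described=$(git describe)
echo "$described"
```

The final line has the form v1-1-g followed by an abbreviated hash, which will differ on every run.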

Pushing and Pulling

Since a tag (either annotated or lightweight) is just a reference on your local repository, it is not sent up by default to the remote repository during pushes. (This is one observable difference between Git and Hg.) Instead, you can git push the tag individually, or you can run git push --tags which will push all tags. For “release” tags (e.g. V1.0.0) it is conventional for these to be annotated tags; it is relatively rare that you will push a lightweight tag to a central repository.

For pulling, any tags associated with the commits on your current branch will be fetched when you fetch or pull it. This may result in not having all the tags in your local repository that the remote repository has. If you'd like to fetch them all, you can do git fetch --tags to pull them all in, or git fetch origin tag name to pull a single one.
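Here is a sketch of the push side, using a bare repository in a temporary directory to stand in for the remote. The paths, identity and tag names are placeholders; git ls-remote --tags is used to show what the remote actually has after each step.

```shell
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git config user.name "Example"
git config user.email "example@example.com"
git remote add origin "$work/remote.git"

git commit -q --allow-empty -m "Initial commit"
git tag example
git tag -a v1 -m "Version 1 release"

# Pushing the branch does not send the tags along
git push -q origin HEAD
after_branch_push=$(git ls-remote --tags origin)

# Push one tag by name
git push -q origin v1
after_single=$(git ls-remote --tags origin)

# Push every remaining tag
git push -q --tags origin
after_all=$(git ls-remote --tags origin)
echo "$after_all"
```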

Summary

Tags in git are lightweight references that point to an SHA hash of a commit. Unlike branches, they are not mutable and once created should not be deleted. Tags may be lightweight (in which case they refer to the commit directly) or annotated (in which case they point to a tag object which points to the commit). Tags used to denote versioned releases typically use annotated tags, and for many open source projects, the tags will also be signed.


Come back next week for another instalment in the Git Tip of the Week series.

Tuesday, April 19, 2011

Cleaning up Google results

If you search for information with Google, you might occasionally find that your results are infiltrated with useless copy-and-paste sites. Fortunately, you can block these from your results if they turn up, provided that you're signed in to a Google account when you're searching.

It's also possible to define a list of known bad sites even before they appear in a search, by going to google.com/reviews/t. Anything you search for will automatically exclude results from those sites mentioned. You also use the same page to unblock sites that you have accidentally blocked (or that you want to include back in, if it cleans up its act).

There's more information in the help page and the blog, which announced it last month. The manual site list is a useful way of populating against known bad sites; note that the block option only gets shown if you first navigate to a site, then return back again; since Google's algorithms expect you to have gone to the site to discover its uselessness.

Broadband speed

Tomorrow, I get fibre broadband installed. Well, fibre to the cabinet (rather than to the home). Got to be an improvement over where I am today:

SpeedTest result prior to broadband upgrade (1.21Mb/s down, 0.4Mb/s up)

Here's what it looks like after the install:

SpeedTest result after broadband upgrade (37Mb/s down, 8Mb/s up)

Consider me a happy Easter bunny :)

Git Tip of the Week: Merging

This week's Git Tip of the Week is about merging branches. You can subscribe to the feed if you want to receive new instalments automatically.

Merging

So far, we've just looked at committing content to the git repository. In other words, we've been building up linear histories, one commit at a time. Although there's nothing wrong with this – after all, most other version control systems you've been used to may work in exactly that way – there's a lot of power that Git bestows upon you with branches.

A lot of people avoid branching, if they've only been used to lesser version control systems like CVS or SVN. That's not because branching is especially different (although still an order of magnitude slower than Git) – rather, it's the lack of good merging that kills branching. As a result, branches are reserved for product releases, and then, begrudgingly.

Git, on the other hand, makes branching and merging free – so much that branching becomes a way of life in a Git workflow.

A merge brings together one or more commits onto the current branch. Typically the commits are branch names, but they can be any hash in the repository.

A common workflow is to branch off an item of development to implement a new feature. In non-distributed systems, this is often represented as the “check it out but don't commit it” phase; but in Git, creating a branch is free and committing as you go is encouraged – a bit like autosaving your source files. When you're ready, you can merge it back onto the main development branch:

$ git checkout -b feature
# implement feature
$ git commit -m "Feature part 1 implemented"
# implement feature
$ git commit -m "Feature part 2 implemented"
# test, check everything is OK
# Switch back to master:
$ git checkout master
$ git merge feature
# Optionally, delete the feature branch
$ git branch -d feature

What we've done here is create a new feature branch feature, worked on it for a while, then switched back to master and merged the contents in. It's fairly common for this pattern to occur frequently; sometimes, with features overlapping each other. Don't worry about it; Git always keeps track of your changes.

If there are no problems, then git merge will automatically create a commit for you. If not…

Conflicts

Nothing in life is free, and merging is no exception. If you've changed the same file in two or more divergent commits, then you might end up with a merge conflict. As in other version control systems, you get <<<<<<< ======= >>>>>>> markers in the code. However, you see these a lot less frequently in Git, because commits tend to be smaller, and in many cases, Git can replay re-orderings in your code.

To fix the merge conflict, edit the file, removing the merge conflict markers and resolving on a conflict-by-conflict basis, and git add the file. You can use git status to show you the list of files; those with merge conflicts will be shown separately, so you can step through these individually, fixing and git adding them as you go.

Once you're done, running git commit will commit all your merge changes. Unlike previous cases, where you have a single parent mentioned in the commit metadata at the top, this time you'll see two parents. These represent the previous version of this tree and the merged tree that you're bringing in.
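The whole conflict cycle can be walked through in a scratch repository. In this sketch the file name, branch name and contents are invented, and the "resolution" is just overwriting the file, where in practice you would edit around the markers.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "original" > file
git add file
git commit -q -m "Base"
base=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

# Change the same line on two divergent branches
git checkout -q -b feature
echo "feature change" > file
git commit -q -a -m "Feature edit"
git checkout -q "$base"
echo "mainline change" > file
git commit -q -a -m "Mainline edit"

# The merge stops with a conflict; list the unmerged files
git merge feature >/dev/null 2>&1 || echo "merge stopped with conflicts"
conflicted=$(git diff --name-only --diff-filter=U)
echo "conflicted: $conflicted"

# Resolve, add, and commit to conclude the merge
echo "resolved" > file
git add file
git commit -q -m "Merge feature"

# The merge commit records two parents
parents=$(git log -1 --pretty=%P | tr ' ' '\n' | grep -c .)
echo "parents: $parents"
```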

There are more interesting things we can do with merged trees and merges, but we'll cover those in more detail another time.


Come back next week for another instalment in the Git Tip of the Week series.

Monday, April 18, 2011

Associated Business Solutions, Inc

To anyone thinking of hiring the services of Associated Business Solutions, Inc:

Don't.

These guys have been spamming the MacFUSE forum repeatedly over the last few weeks advertising for positions that are utterly unrelated to OSX development or filesystems, and don't seem to take a hint in responses that their services are not required in this forum.

Anyone you hire from them is likely to have been picked up by scamming through forums and the quality of such candidates is likely to be unrelated to the position that they have been offering.

Specifically, they have been posting as “Faheem Ghouri” and “Abdul Aziz” from the domain name “absolutions.us” which, for obvious reasons, I'm not linking to directly here.

Please feel free to trawl the MacFUSE mailing list to verify.

Tuesday, April 12, 2011

Git Tip of the Week: Branches

This week's Git Tip of the Week is about working with branches. You can subscribe to the feed if you want to receive new instalments automatically.

Branches

So far I've only talked about code which is on a single branch, or in Git terms, the master branch. That's fine for getting started, but you will soon find that you want to create branches for parallel development.

First, it's worth briefly mentioning how Git handles branches, because it's a fairly fundamental part of how Git works. Every Git repository is a tree of commits, with the initial commit being the root of the tree. Every subsequent commit has one or more parents; so from any given commit, you can walk up its parent tree(s) to find the initial commit of a repository.

Since each of these commits is represented as a hash, like 9ab532… then any point on the Git repository's tree of commits can be uniquely identified by this number alone.

As a result, every reference mechanism in Git basically boils down to storing this hash value somewhere. For (local) branches, these are stored in the files refs/heads/*; you'll find a master file in there with a 40-character hash as its contents.

Working with branches

To display a list of branches in Git, run git branch. This shows the list of branches that are currently in your repository. The asterisk shows you which branch you're on at any time.

To create a new branch, based off the current branch, you use git branch name. If you want to base it off a different commit, you can use git branch name hash. To switch to a different branch, you use git checkout name. Here's what it looks like when put together:

$ git branch
* master
$ git branch new
$ git branch
* master
  new
$ git checkout new
$ git branch
  master
* new

There's a shorthand for creating a branch and checking it out, which is git checkout -b name [hash]; so in the above case, we could have done git checkout -b new:

$ git branch
* master
# Shorthand for: git checkout -b new master
$ git checkout -b new
$ git branch
  master
* new

You'll find that what's happened is there has been a file created, refs/heads/new, which contains the new pointer to the tree. Since we created one from the other, they've both got the same contents. However, if we were to commit a change to the new branch, then they would be different. The master branch would stay where it was, whilst the new branch would be one commit ahead.
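You can watch the reference files change in a scratch repository. This sketch assumes a freshly created repository using the default on-disk layout, where branch references are still loose files under .git/refs/heads; the identity values are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "Initial commit"
base=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

git checkout -q -b new

# Straight after branching, both files hold the same hash
before_base=$(cat ".git/refs/heads/$base")
before_new=$(cat .git/refs/heads/new)
echo "$base: $before_base"
echo "new: $before_new"

# A commit on 'new' moves only that reference
git commit -q --allow-empty -m "Work on new"
after_base=$(cat ".git/refs/heads/$base")
after_new=$(cat .git/refs/heads/new)
echo "$base: $after_base"
echo "new: $after_new"
```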

If you were to checkout the master branch instead, and create a commit, then the branches would start to diverge. Alternatively, we might want to bring in that change from new to master, which we'll do with a merge next time.


Come back next week for another instalment in the Git Tip of the Week series.

Wednesday, April 06, 2011

Multiple Architecture, Single Build

One of the things you take for granted on a Mac is that applications will Just Work, regardless of what the hardware you're running on is based on. OSX has always been able to create fat binaries – that is, binaries which have more than one processor variant type – and it selects the one appropriate for the system you're running on. In the days during the PPC to Intel switchover, this gave a seamless transition – but even now, we still use fat binaries which contain both 32-bit and 64-bit executables.

OSGi bundles can be packaged with native code for more than one processor type; either OS-specific fragments or the Bundle-NativeCode header will select the appropriate version of a given native library. Some libraries (like SWT) are highly coupled to their native library implementation, whilst others (filesystem resources) provide additional implementations but may fall back gracefully if not present.

Per-platform builds

Why, then, do we waste terabytes of the Internet's bandwidth by packaging up multiple types of Eclipse applications, each customised for precisely one combination of operating system, architecture and windowing system, and yet every one of them containing the same 98% of pure Java code? Even multiple windowing systems can co-exist peacefully in an Eclipse installation; if the bundle for the required software is not applicable to that platform, it doesn't get installed.

All of this is bundled into a minor “native launcher” that's dependent on the OS, the word size, and the windowing system (and the latter only because it has to ask for a default workspace and show a pretty splash screen). And the mechanisms which churn out these products (which largely have remained the same since before Eclipse was OSGi based) still spew out that same 98% of Java code for each build before writing out a platform-specific eclipse.exe. Oops, downloaded the 64 bit Eclipse instead? Back you go, wasting everyone's bandwidth with a 300M+ download just to get a new 70k eclipse.exe.

Fortunately, sanity can be restored. If you are in the position to create an Eclipse-based product, then do yourself a favour – when it has finished building its variants for all of the different systems, just merge the folder contents together. There's no reason why win32.win32.x86 and win32.win32.x86_64 can't sit together in the plugins directory; after all, only one of them is valid at any one time.

Bundle configurator

The only thing you need to be mindful of is the bill-of-materials for the launcher, stored in configuration/org.eclipse.equinox.simpleconfigurator/bundles.info. This is a text file, containing all the bundles that will be installed into the runtime. It doesn't matter if the plugins directory contains things; unless they're in bundles.info then Equinox doesn't load them. Think of it as the simplicity of Apache FileInstall coupled with the ls-lR of yesteryear's FTP sites.

The only reason I mention it is that when you run the build-everything-a-few-times step to get your almost-identical-build-folders, there are some subtle differences. You'll notice that the SWT for 64-bit Windows is different from the SWT for 32-bit Windows, for instance.

If you just do a blind copy over of everything, then what will happen is you'll end up with a list of plugins that contains the right contents, but the bundles.info from only one of them. When you try and launch on the other platform, P2 will helpfully tell you that SWT doesn't exist, despite the fact that you can plainly see it in the folder. P2 only believes what you tell it, not evidence of existence, so you have to tell it that there's an item in the plugins directory for it to work.

Fortunately, the bundles.info is a plain text file with one comma-separated entry per bundle on each line; so you can in fact just concatenate the two files together. This will leave duplicates; but since it's a line-oriented text file, you can simply do a sort and uniqueify operation to ensure that every bundle is mentioned once. (You need to be careful in general about different versions; that optimisation is left as an exercise for the reader.)
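The concatenate, sort and uniqueify step might look like the following. This is a sketch that treats the files purely as lines of text; the function name merge_bundles_info is made up, and the only assumption about the format is that header lines start with # and should stay at the top.

```shell
# Hypothetical helper: merge two bundles.info files, writing to stdout.
# $1 and $2 are the files from the two architecture builds.
merge_bundles_info() {
  # Header comment lines first, each kept once
  cat "$1" "$2" | grep '^#' | LC_ALL=C sort -u
  # Then every bundle line, sorted, with exact duplicates removed
  cat "$1" "$2" | grep -v '^#' | LC_ALL=C sort -u
}
```

You would redirect the output to a new file and then move it into place in the merged install, rather than writing over either input while it is being read. Lines that differ only in version are not duplicates as far as sort is concerned, so they both survive.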

The other thing you have to take into account is the native launcher itself. Whilst each native launcher has a corresponding library (which can co-exist), there can be only one eclipse.exe at the top level. The simplest way around this is to rename it Eclipse32.exe for the 32-bit version and Eclipse64.exe for the 64-bit version (and correspondingly, other versions such as the wpf windowing system).

Cut down launcher

With this, we have the ability to make a cut-down launcher, which can run on multiple systems, in a single download. Yes, we would add slightly to the total installed download size versus a single OS but the overall savings would be immense. All it really needs is to be based on org.eclipse.platform, along with org.eclipse.equinox.p2.ui and possibly the marketplace client, and you have an all-in-one download which will be identical for every Eclipse IDE based user.

You can then switch over to the P2 based installer for getting the latest and greatest; and when someone needs to upgrade to the latest version, even if they've got the multi-OS older version, they can still use that to bootstrap a P2 installer client with the right data in place. For example, the RCP binary is only 56MB.

Furthermore, we can lose the source bundles for this all-in-one download, whose pack'ing is utterly pointless (non-Java JARs don't benefit from packing versus just GZipping in essence) and just download executable code. For those that need it, source on demand is the way forward.

The same argument extends to other operating systems. Whilst a Windows and Linux build could cohabit (their launchers are eclipse.exe and eclipse), the Unix variants might need to have their own locations. Mac users would be fine; we could just have the Eclipse.app and ignore the *.exe cruft in the top level.

Bloody Large Zips

So, who's with me in trying to stop the proliferation of Bloody Large Zips on the Eclipse mirrors containing the same content repeated time after time after time? It's really no wonder that Eclipse mirrors are pulling out due to disk space requirements (or why Denis is campaigning for more disk storage) – the problem is the culture of Eclipse builds producing the same stuff over and over again.

All we really need is a cross-platform, all-in-one IDE solution with P2/Marketplace client, and a P2 repository to point it to. EPP would then become a feature selection choice on the Welcome page for you to install your own content. Look at the list of downloads for the Eclipse 3.6.2 platform. I count around 50 installs of the base platform; so that's 50 copies of org.eclipse.osgi all the way up to org.eclipse.ide at a minimum. And that's without the EPP builds doing exactly the same thing. Just look at the SDK:

Install                                       Size
eclipse-SDK-3.6.2-aix-gtk-ppc64.zip           170.9M
eclipse-SDK-3.6.2-aix-motif.zip               169.9M
eclipse-SDK-3.6.2-hpux-motif-ia64_32.zip      169.8M
eclipse-SDK-3.6.2-linux-gtk-ppc.tar.gz        170.5M
eclipse-SDK-3.6.2-linux-gtk-ppc64.tar.gz      170.2M
eclipse-SDK-3.6.2-linux-gtk-s390.tar.gz       170.1M
eclipse-SDK-3.6.2-linux-gtk-s390x.tar.gz      170.2M
eclipse-SDK-3.6.2-linux-gtk-x86_64.tar.gz     170.6M
eclipse-SDK-3.6.2-linux-gtk.tar.gz            170.5M
eclipse-SDK-3.6.2-linux-motif.tar.gz          172.8M
eclipse-SDK-3.6.2-macosx-carbon.tar.gz        169.9M
eclipse-SDK-3.6.2-macosx-cocoa-x86_64.tar.gz  170.0M
eclipse-SDK-3.6.2-macosx-cocoa.tar.gz         170.1M
eclipse-SDK-3.6.2-solaris-gtk-x86.zip         170.2M
eclipse-SDK-3.6.2-solaris-gtk.zip             170.3M
eclipse-SDK-3.6.2-win32-x86_64.zip            171.0M
eclipse-SDK-3.6.2-win32.zip                   171.0M

This is almost 3G of data. Even assuming 2M of platform-specific content per build, it could probably all be compressed into a single download of around 200M. (In comparison, the size of the Helios P2 mirror is around 5G; and that's including the 3.6.0, 3.6.1 and 3.6.2 releases.) Not only that, the downloads are likely to be faster: instead of having to download from a single server, downloads from P2 repositories have the option to create multiple connections to multiple mirrors to fetch the data.
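For anyone who wants to check the arithmetic, here is a quick sketch using the sizes listed for the SDK. The 170M figure for the common content is my rounding of the individual download sizes, and the 2M per platform is the same assumption as in the text.

```shell
# Download sizes in M, in the order listed for the 3.6.2 SDK
sizes="170.9 169.9 169.8 170.5 170.2 170.1 170.2 170.6 170.5 172.8 169.9 170.0 170.1 170.2 170.3 171.0 171.0"

# Total of what is on the mirrors today
total=$(printf '%s\n' $sizes | awk '{ sum += $1 } END { printf "%.1f", sum }')

# One shared copy (assumed 170M) plus an assumed 2M per platform
count=$(printf '%s\n' $sizes | grep -c .)
combined=$(( 170 + count * 2 ))

echo "separate: ${total}M across $count downloads"
echo "combined: about ${combined}M"
```

That works out at 2898.0M for the separate downloads against roughly 204M for a combined one.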

Conclusion

We need to stop the packaging proliferation that is damaging Eclipse's mirrors, and make it easier to have a bootstrapped environment install on demand the content that's needed. A suitable welcome/intro screen will probably be the fastest way to market; and though packaging has been a success in terms of advertising to different types of users, we should move this advertising page into the Eclipse welcome screen once the bootstrap process has run, and not as a set of pre-prepared zips on the server.

Tuesday, April 05, 2011

Git Tip of the Week: Aliases

This week's Git Tip of the Week is about creating shorthands for commonly used commands. You can subscribe to the feed if you want to receive new instalments automatically.

Aliases

There are frequently times when you want to be able to perform the same git command repeatedly, often with the same arguments. Whilst it's possible to write shell scripts or use shell aliases to save you having to remember these, they may sometimes not be sufficient.

The git command line can be configured to use its own aliases for commonly used tasks. Aliases are stored in git config files, which include ~/.gitconfig and path/to/project/.git/config. As a result, it's possible to store aliases in a per-project as well as a global state.

To set up an alias, you can either edit the git config file directly, or run git config --global or git config for the global or per-project settings respectively. An alias is configured as follows:

git config --global alias.ci commit
git config --global alias.cia 'commit -a'
git config alias.hub 'push github'

In the example above, running git ci has the same effect as git commit, and git cia has the same effect as git commit -a. Since these are --global properties, these commands will work on any repositories on the same machine.

The git hub command then becomes a push to github, but because of the lack of the --global flag, it only affects the repository that you're working in.
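If you'd like to see what these commands write without touching your real settings, git config can be pointed at any file with --file. The sketch below uses a temporary file as a stand-in for ~/.gitconfig and quotes the multi-word alias values so that each reaches git as a single argument.

```shell
# Scratch file standing in for ~/.gitconfig or .git/config
cfg=$(mktemp)

git config --file "$cfg" alias.ci commit
git config --file "$cfg" alias.cia 'commit -a'
git config --file "$cfg" alias.hub 'push github'

# Show the resulting [alias] section
cat "$cfg"

# Read a single alias back
cia=$(git config --file "$cfg" --get alias.cia)
echo "cia expands to: $cia"
```

The file ends up with an [alias] section containing one line per alias, which is also the form you would type if editing the config file by hand.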

Alias expansion

Sometimes it's desirable to execute multiple git commands, which the short alias form doesn't allow. Or you might want to do something with the result, like pipe it through an alternative utility. That's where shell-style aliases can help.

If a git alias is prefixed with an exclamation point, then the alias is expanded as if the entire command had been given on the command line. This allows you to do pipe or sequence operations which can be quite handy:

# Define 'git last' to show last commit message
git config alias.last 'show -s HEAD^{commit}'

# Now lastwc is the word count of that message
git config alias.lastwc '!git last | wc'

# We can also prune off headers and pipe to 'pbcopy'
# which puts it in the clipboard on OSX
git config alias.lastcopy '!git last | tail -n+5 | pbcopy'

These expansions also work on Windows, with the MinGWSys version of Git, and the Cygwin version as well, where standard shell aliases wouldn't work.
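To try the two kinds of alias side by side without affecting a real project, a scratch repository will do. This sketch reuses the last and lastwc aliases from above; the repository location and identity are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "Initial commit"

# Plain alias: expands to another git command
git config alias.last 'show -s HEAD^{commit}'

# '!' alias: handed to the shell, so pipes are allowed
git config alias.lastwc '!git last | wc'

last_out=$(git last)
wc_out=$(git lastwc)
echo "$last_out"
echo "$wc_out"
```

The second alias prints the usual three wc counts (lines, words, characters) for whatever the first one shows.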

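You can find out what you do most frequently in Git by setting up this alias. Note the leading exclamation point, which is needed because the alias is a shell pipeline rather than a git command. Since the history builtin has nothing to report in the non-interactive shell that Git starts, this version reads the saved history file instead (assumed here to be bash's default, ~/.bash_history):

git config alias.freq '!grep "^git" "$HOME/.bash_history" | sort | uniq -c | sort -n -r | head -n 5'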

Running git freq will show you your 5 most frequent git commands, which you might like to consider for creating aliases in the future.


Come back next week for another instalment in the Git Tip of the Week series.