
AlBlue’s Blog

Macs, Modularity and More

Towards an open-source Pack200 implementation

2006, eclipse, harmony, java, pack200

I've started work on an open-source implementation of the Pack200 specification for the Harmony project, and I've contributed the first batch of code to Harmony itself.

Pack200 is a fascinating specification. For those who don't know, it's a way of compressing your Jar files considerably; a space saving of 50% or more is not uncommon. It achieves this compression ratio by (a) reordering the contents of the class files so that similarly typed objects are adjacent, and (b) using variable-length encodings to squeeze smaller numbers into fewer bytes. A particularly interesting case is the handling of Strings (which are prevalent in class files). To compress two strings, the common prefix shared by successive values is taken and only the differences are stored. That means if you have long Strings in the class file (like java/lang/String and java/lang/Object), then the Pack200 archive will store something like java/lang/String,(10+Object): reuse the first 10 characters of the previous string, then append Object. There are also some nifty delta encodings that store each number in a sequence as the difference from the previous one; so in a sorted list, even if the numbers themselves are large, provided that the differences are small you can still save space.
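To make the string trick concrete, here's a minimal sketch of prefix sharing between successive strings. It is not the Pack200 wire format (the real thing stores prefix lengths and suffixes in separate bands); the class name PrefixSketch and its Entry type are made up purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative prefix sharing between successive strings; NOT the real Pack200 format. */
final class PrefixSketch {

    /** Hypothetical entry: chars reused from the previous string, plus the new tail. */
    static final class Entry {
        final int shared;
        final String suffix;

        Entry(int shared, String suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }

        @Override
        public String toString() {
            return shared + "+" + suffix;
        }
    }

    /** Replaces each string by (length of prefix shared with predecessor, remaining suffix). */
    static List<Entry> encode(List<String> strings) {
        List<Entry> out = new ArrayList<>();
        String prev = "";
        for (String s : strings) {
            int max = Math.min(prev.length(), s.length());
            int n = 0;
            while (n < max && prev.charAt(n) == s.charAt(n)) {
                n++;
            }
            out.add(new Entry(n, s.substring(n)));
            prev = s;
        }
        return out;
    }

    /** Rebuilds the original strings by reusing the prefix of the previously decoded one. */
    static List<String> decode(List<Entry> entries) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (Entry e : entries) {
            String s = prev.substring(0, e.shared) + e.suffix;
            out.add(s);
            prev = s;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("java/lang/String", "java/lang/Object");
        List<Entry> packed = encode(names);
        System.out.println(packed);                       // [0+java/lang/String, 10+Object]
        System.out.println(decode(packed).equals(names)); // true
    }
}
```

The second entry carries only six characters of new text instead of sixteen, which is where the saving comes from when a constant pool is full of names from the same package.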

Contrary to popular belief, the Pack200 spec doesn't actually compress Jars; it just mangles the .class files into a more efficient form. If you think of a series of class files as a 2-dimensional structure (each class has a constant string pool, a constant number pool, some methods) then a Pack200 archive is essentially a transposition of that class/entry matrix. It gets its space savings by performing sorting and delta compression (as well as variable-length encoding for numbers). However, this in turn makes the result much more amenable to compression by other techniques, and indeed the pack200 command line program automatically performs GZip compression on the resulting file (which is why the extension is normally .pack.gz).
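Here's a rough sketch of the delta plus variable-length idea. I'm assuming a generic zigzag/varint byte layout (7 bits per byte, high bit meaning "more to come") rather than the codings the Pack200 spec actually defines, so treat it as an illustration of the principle only; DeltaVarintSketch is a made-up name.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Delta + variable-length encoding sketch; a generic varint, NOT the Pack200 codings. */
final class DeltaVarintSketch {

    /** Stores each value as the zigzag-encoded difference from its predecessor. */
    static byte[] encode(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int v : values) {
            int delta = v - prev;
            prev = v;
            // Zigzag maps small negative and positive deltas to small unsigned numbers.
            int zz = (delta << 1) ^ (delta >> 31);
            // Emit 7 bits at a time; the high bit flags that another byte follows.
            while ((zz & ~0x7F) != 0) {
                out.write((zz & 0x7F) | 0x80);
                zz >>>= 7;
            }
            out.write(zz);
        }
        return out.toByteArray();
    }

    /** Reverses encode(); count is the number of values originally written. */
    static int[] decode(byte[] bytes, int count) {
        int[] values = new int[count];
        int pos = 0;
        int prev = 0;
        for (int i = 0; i < count; i++) {
            int zz = 0;
            int shift = 0;
            int b;
            do {
                b = bytes[pos++] & 0xFF;
                zz |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            int delta = (zz >>> 1) ^ -(zz & 1); // undo zigzag
            prev += delta;
            values[i] = prev;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] sorted = {100000, 100003, 100004, 100010};
        byte[] packed = encode(sorted);
        // Four ints take 16 bytes raw; here the first costs 3 bytes and each small delta costs 1.
        System.out.println(packed.length);                                        // 6
        System.out.println(Arrays.equals(decode(packed, sorted.length), sorted)); // true
    }
}
```

The values themselves are large, but because they're sorted the differences are tiny, and tiny numbers fit in a single byte each.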

One really neat feature of a Pack200 archive is that multiple archives can (while still uncompressed) be concatenated into a single file, and later split back into their original Jars. That makes it really handy if you're distributing a large system; simply pack each Jar individually, then concatenate them all together and GZip the lot.

Although Pack200 can handle non-class files in Jars, it doesn't really gain much on them. It's all about reorganising a set of classes for optimal transmission, so there's little point in packing Jars that mainly contain resources. Also, it's worth noting that the compression (and decompression) is pretty memory intensive, so it's not something that you'd want to do with your application's Jars once it's installed. It's really just a mechanism for smaller (and therefore faster) downloads.

Eclipse 3.2 supports Pack200-compressed plugins installed via the update manager; so if you're using the Callisto installer to obtain your files (with Java 5), you might be in for a pleasant surprise. You might even find it's faster than downloading the entire distribution, simply because the latter doesn't (at the moment) have Pack200-compressed Jars.