
AlBlue’s Blog

Macs, Modularity and More

More details on 5966702 and pack200

2010, java, pack200

Charles Nutter posted a detailed set of thoughts on the Google vs Oracle patents recently, and postulated that 5966702 was related to the Pack200 specification. He said:

This is basically the patent governing the "Pack200" compression format provided as part of the JDK and used to better-compress class file archives.

In fact, it's nothing nearly as complex as the Pack200 specification, though it might touch upon what Pack200 does.

By way of introduction, I began writing the initial implementation of the pack200 uncompressor in May 2006, based on the final JSR 200 spec. At no time did I see or use either the RI or TCK for JSR 200; the specification is complex enough on its own. (Here's a list of my pack200-related posts.) The implementation for unpacking was completed by others after me; it's currently in classlib/modules/pack200 if you want to look into it.

Having read through 5966702, it describes a way of uniquifying the constants across constant pools only, and combining them in a “mclass” archive. Roughly speaking, each .class file contains a number of Strings, which are used to represent method signatures, referenced classes, and so on. A quick Hello World class includes:

  • java/lang/Object
  • java/lang/System
  • Ljava/io/PrintStream;
  • java/io/PrintStream
  • println
  • (Ljava/lang/String;)V
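Strings like these live in the class file's constant pool as CONSTANT_Utf8 entries. As a rough sketch of how you might list them yourself, the following walks the constant pool and collects only the Utf8 entries. The class name `Utf8ConstantsSketch` and its `read` method are made up for this illustration, and the tag table covers the constant kinds I'm aware of; anything else is reported as an error rather than guessed at.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists the CONSTANT_Utf8 strings of one class file. */
public class Utf8ConstantsSketch {

    /** Returns the Utf8 constants in constant pool order. */
    public static List<String> read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        data.readUnsignedShort(); // minor version
        data.readUnsignedShort(); // major version
        int count = data.readUnsignedShort(); // pool entries are numbered 1..count-1
        List<String> strings = new ArrayList<>();
        for (int i = 1; i < count; i++) {
            int tag = data.readUnsignedByte();
            switch (tag) {
                case 1: // Utf8: u2 length followed by modified UTF-8 bytes
                    strings.add(data.readUTF());
                    break;
                case 3: case 4:            // Integer, Float
                case 9: case 10: case 11:  // Fieldref, Methodref, InterfaceMethodref
                case 12: case 17: case 18: // NameAndType, Dynamic, InvokeDynamic
                    skip(data, 4);
                    break;
                case 5: case 6:            // Long, Double occupy two pool slots
                    skip(data, 8);
                    i++;
                    break;
                case 7: case 8: case 16: case 19: case 20: // Class, String, MethodType, Module, Package
                    skip(data, 2);
                    break;
                case 15:                   // MethodHandle
                    skip(data, 3);
                    break;
                default:
                    throw new IOException("unknown constant pool tag " + tag);
            }
        }
        return strings;
    }

    private static void skip(DataInputStream data, int n) throws IOException {
        data.readFully(new byte[n]); // fails loudly on a truncated file
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Utf8ConstantsSketch HelloWorld.class
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            for (String s : read(in)) {
                System.out.println(s);
            }
        }
    }
}
```

Running it against a compiled Hello World should print entries like those listed above, along with the class's own name, method names and attribute names.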

If you download a lot of classes (whether individually, or as part of a JAR file), then this repetition can build up. The goal of 5966702 (which was granted in 1999, several years before the Pack200 spec was started) is purely to optimise the layout of these constants across multiple files, both to reduce the amount of downloaded data and to ensure that the constants are uniquified at runtime as well. (In essence, it's trying to build a compile-time interned collection of strings, and then use it globally across classes.)
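To make the idea concrete, here is a minimal sketch of uniquifying constants across classes. It is my own illustration of the general technique, not the patent's claimed method or its "mclass" layout: the class name `SharedPoolSketch` and its methods are hypothetical, and each class's constant pool is modelled as a plain list of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration: one shared string table for several classes. */
public class SharedPoolSketch {
    // Maps each distinct string to its index in the shared table,
    // in first-seen order.
    private final Map<String, Integer> indexOf = new LinkedHashMap<>();

    /** Adds one class's constants; returns their indices in the shared table. */
    public int[] addClass(List<String> constants) {
        int[] refs = new int[constants.size()];
        for (int i = 0; i < refs.length; i++) {
            String s = constants.get(i);
            Integer idx = indexOf.get(s);
            if (idx == null) {          // first time we see this string
                idx = indexOf.size();
                indexOf.put(s, idx);
            }
            refs[i] = idx;              // a repeated string reuses the old index
        }
        return refs;
    }

    /** The de-duplicated strings, in index order. */
    public List<String> sharedTable() {
        return new ArrayList<>(indexOf.keySet());
    }

    public static void main(String[] args) {
        SharedPoolSketch pool = new SharedPoolSketch();
        int[] a = pool.addClass(Arrays.asList(
                "java/lang/Object", "java/lang/System", "println"));
        int[] b = pool.addClass(Arrays.asList(
                "java/lang/Object", "println", "java/io/PrintStream"));
        System.out.println(pool.sharedTable());
        System.out.println(Arrays.toString(a) + " " + Arrays.toString(b));
    }
}
```

In this toy run the two classes hold six strings between them, but the shared table ends up with four entries, and each class keeps only small integer references into it.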

I can't comment on whether the Dex implementation infringes this patent; IANAL, and I don't know the Dex implementation in any case. But the patent describes a simple approach to uniquifying strings that never made it directly into a JVM. The Pack200 specification also uniquifies strings, but it employs plenty of other optimisation methods as well and (for example) doesn't treat all the Strings the same way. Whereas 5966702 literally describes a uniq-style operation on the set of strings, Pack200 is a much more complex mechanism: the Strings are also sorted, then represented as deltas from the previous item, as I described in more detail previously.
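The sort-then-delta idea can be sketched in a few lines. This is a deliberately simplified illustration, not the actual Pack200 encoding (which splits the data into separate bands and uses its own integer codings): each string is stored as the number of leading characters it shares with the previous sorted string, plus the remaining suffix. The names `PrefixDeltaSketch` and `Entry` are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical illustration of sorting strings and delta-encoding them. */
public class PrefixDeltaSketch {

    /** Leading characters reused from the previous string, plus the rest. */
    public static final class Entry {
        public final int prefixLength;
        public final String suffix;

        Entry(int prefixLength, String suffix) {
            this.prefixLength = prefixLength;
            this.suffix = suffix;
        }
    }

    /** Uniquifies and sorts the strings, then encodes each against its predecessor. */
    public static List<Entry> encode(Iterable<String> strings) {
        TreeSet<String> sorted = new TreeSet<>();
        for (String s : strings) {
            sorted.add(s);
        }
        List<Entry> out = new ArrayList<>();
        String prev = "";
        for (String s : sorted) {
            int max = Math.min(prev.length(), s.length());
            int p = 0;
            while (p < max && prev.charAt(p) == s.charAt(p)) {
                p++;
            }
            out.add(new Entry(p, s.substring(p)));
            prev = s;
        }
        return out;
    }

    /** Rebuilds the sorted strings from the encoded entries. */
    public static List<String> decode(List<Entry> entries) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (Entry e : entries) {
            String s = prev.substring(0, e.prefixLength) + e.suffix;
            out.add(s);
            prev = s;
        }
        return out;
    }
}
```

With names such as java/lang/Object and java/lang/System sorting next to each other, the second one needs only a prefix length and the suffix "System", which is where the extra saving over plain uniquification comes from.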

Finally, it's highly unlikely that Google would be using Pack200 for their runtime requirements. Even the spec notes that there are other compression mechanisms which are better suited for loading dynamically compressed classes. From the technotes on pack200 in Java 6:

However, Pack200 archives are not designed for direct execution by any virtual machine. The greater compression level of Pack200 is won at a cost of complexity in the unpacker, a complexity incompatible with any requirement of direct loading or execution.

Any argument based on memory size calculations or on-disk layout is therefore ruled out by the Pack200 spec itself. A packed JAR file must be reconstituted in its entirety, often by building the whole JAR in memory before any of it can be written out.

Lastly, it's worth noting that 5966702 refers specifically to classes in the Java class file format. (It even goes to the lengths of including the Java class file specification in the patent application.)

Embodiments of the invention can be better understood with reference to aspects of the class file format. Description is provided below of the Java class file format. Also, enclosed as Section A of this specification are Chapter 4, "The class File Format," and Chapter 5, "Constant Pool Resolution," of The Java Virtual Machine Specification, by Tim Lindholm and Frank Yellin, published by Addison-Wesley in September 1996, © Sun Microsystems, Inc.


The Class File Format

This chapter describes the Java Virtual Machine class file format. ...

So, in conclusion: this patent isn't really to do with Pack200 in any significant way. It's entirely to do with how Java classes can be combined to have a pre-uniquified String constant pool, such that the constant pool can be loaded straight into memory in one gulp. How this might affect Dex classes is unknown, especially since these aren't Java classes, don't use the Java class file format, and aren't combined into a single 'mclass' entry.

The Dex format does merge multiple Dex classes together and have a shared constant pool across all instances. In that regard, it's more advanced than 5966702 is, because it is also doing the referencing for variables, methods and others. (Some information about the Dex format is available at retrodev.) In some regard, it is processing Java class files as well; however, the output is not a set of Java classes – not only is the constant pool an aspect, but the instruction set is changed and translated into something else as well. In a nutshell, it's not so much a processor of Java classes as a translator into a different format. If the 5966702 patent is applicable to this as a tool for pre-processing a set of Java classes, then so too would any tool like a static checker or annotation processor which reads in sets of Java classes and generates an output.

It will be interesting to see what comes of this tussle.