The Pack200 stuff is coming along; I can now handle interfaces that consist of (primitive and String) constants and abstract/native methods with exceptions, as well as classes that implement interfaces. I've still got some major work to do, so it's nowhere near usable for real tasks yet, but I'm approaching the halfway point. The next step is to decode bytecode, which isn't an easy task, for several reasons.
Firstly, the constants in the Pack200 format are strongly typed, whereas in Java, some of the bytecode instructions (like ldc #18) are weakly typed (they can load any kind of constant from the pool). As a result, the Pack200 format translates some bytecodes into strongly-typed variants so that they reference the correct constant in the global constant pool of a Pack200 segment. It's not difficult, but it's an extra step of work rather than just decoding the numbers from a sequence.
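As a rough illustration of that extra step (a sketch, not the actual decoder code), the idea is that each typed variant of ldc tells the decoder which typed part of the global pool its index refers to. The enum names `INT_LDC`, `FLOAT_LDC` and `STRING_LDC`, and the `GlobalPool` class, are made up for this example; the real pseudo-opcode names and pool layout are defined by the Pack200 spec.

```java
import java.util.Arrays;
import java.util.List;

final class TypedLdcSketch {
    // Hypothetical typed variants of ldc; the spec defines its own names and opcode values.
    enum TypedLdc { INT_LDC, FLOAT_LDC, STRING_LDC }

    // Hypothetical segment-wide constant pool with one list per constant type.
    static final class GlobalPool {
        final List<Integer> ints;
        final List<Float> floats;
        final List<String> strings;

        GlobalPool(List<Integer> ints, List<Float> floats, List<String> strings) {
            this.ints = ints;
            this.floats = floats;
            this.strings = strings;
        }
    }

    // The typed variant selects the pool; the index is only meaningful within that pool.
    static Object resolve(TypedLdc op, int index, GlobalPool pool) {
        switch (op) {
            case INT_LDC:
                return pool.ints.get(index);
            case FLOAT_LDC:
                return pool.floats.get(index);
            case STRING_LDC:
                return pool.strings.get(index);
            default:
                throw new IllegalArgumentException("Unknown typed ldc: " + op);
        }
    }

    public static void main(String[] args) {
        GlobalPool pool = new GlobalPool(
                Arrays.asList(42), Arrays.asList(1.5f), Arrays.asList("hello"));
        // Index 0 means different constants depending on the typed variant.
        System.out.println(resolve(TypedLdc.INT_LDC, 0, pool));
        System.out.println(resolve(TypedLdc.STRING_LDC, 0, pool));
    }
}
```

Once the constant has been resolved, the decoder can add it to the class's own constant pool and emit a plain ldc pointing at it.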
Secondly, there's a lot more to Code attributes than just a simple constant or string value (as is the case with the attributes I've dealt with so far). As well as the bytecodes themselves, each Code attribute has a set of metadata, which includes the maximum stack size, the number of local variables, the number of exception handlers and so forth, and each of these pieces of information is stored in a separate band in the Pack200 archive.
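To make the "separate band" idea concrete, here is a minimal sketch that assumes each band has already been decoded into an int array with one entry per method. The names `CodeHeader`, `maxStackBand`, `maxLocalsBand` and `handlerCountBand` are invented for the example and are not the spec's band names.

```java
final class CodeBandsSketch {
    /** Per-method Code metadata, reassembled from the bands. */
    static final class CodeHeader {
        final int maxStack;
        final int maxLocals;
        final int handlerCount;

        CodeHeader(int maxStack, int maxLocals, int handlerCount) {
            this.maxStack = maxStack;
            this.maxLocals = maxLocals;
            this.handlerCount = handlerCount;
        }
    }

    // Entry i of every band belongs to method i, so the bands are read in lockstep.
    static CodeHeader[] assemble(int[] maxStackBand, int[] maxLocalsBand, int[] handlerCountBand) {
        int count = maxStackBand.length;
        if (maxLocalsBand.length != count || handlerCountBand.length != count) {
            throw new IllegalArgumentException("Bands must all describe the same number of methods");
        }
        CodeHeader[] headers = new CodeHeader[count];
        for (int i = 0; i < count; i++) {
            headers[i] = new CodeHeader(maxStackBand[i], maxLocalsBand[i], handlerCountBand[i]);
        }
        return headers;
    }

    public static void main(String[] args) {
        CodeHeader[] headers = assemble(new int[] {2, 4}, new int[] {1, 3}, new int[] {0, 1});
        System.out.println(headers[1].maxStack + " " + headers[1].maxLocals + " " + headers[1].handlerCount);
    }
}
```

The point is that no single band describes a whole method; the metadata for one method is spread across the same position in several bands.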
Lastly, there's a bunch of renumbering that goes on with bytecode jumps. Although jumps (such as jsr #168) refer to specific bytecode offsets, only certain places in a sequence of bytecodes are valid to jump to. For example, with an instruction that has a 2-byte operand (such as checkcast #192), a jump can only legally land on the opcode, and not on opcode+1 or opcode+2. Thus, if you know the offsets of the valid jump locations in a sequence of bytecodes (e.g. 1,5,7,9,12) then you can renumber these as (0,1,2,3,4). So each of the jumps in bytecode is renumbered (called BCI renumbering in the spec), which adds (marginally) to the complexity.
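The renumbering can be sketched as a lookup in a sorted array of valid jump locations. This is only an illustration using the numbers from the example above; the class and method names are made up, and the details of how the spec encodes jump targets are not covered here.

```java
import java.util.Arrays;

final class BciRenumberingSketch {
    // Encoder direction: byte offset -> position in the sorted list of valid jump locations.
    static int toRenumbered(int[] validOffsets, int byteOffset) {
        int index = Arrays.binarySearch(validOffsets, byteOffset);
        if (index < 0) {
            throw new IllegalArgumentException(
                    "Offset " + byteOffset + " is not a valid jump location");
        }
        return index;
    }

    // Decoder direction: renumbered value -> original byte offset.
    static int toByteOffset(int[] validOffsets, int renumbered) {
        return validOffsets[renumbered];
    }

    public static void main(String[] args) {
        int[] validOffsets = {1, 5, 7, 9, 12};
        System.out.println(toRenumbered(validOffsets, 7));  // prints 2
        System.out.println(toByteOffset(validOffsets, 4));  // prints 12
    }
}
```

A decoder only needs the second direction, but it can't build the array of valid offsets until it knows the length of every instruction that precedes the jump target.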
Of course, there are also huge swathes of missing implementation, which means it's not compliant yet by a long shot. For example, inner classes are handled specially (and I don't deal with them yet) and Java 1.5 annotations have a life of their own. Unfortunately, the spec is somewhat all-or-nothing; if the decoder doesn't understand even 1 class in 100, that breaks the decoding of the whole Pack200 segment, and so none of the classes will be decodable.
In any case, I've had several successful decompressions of purely abstract code (at least, for interfaces without any static initialiser code), and once I'm over the next hurdle of the bytecode it will start to be useful for some specific, restricted input archives.
Oh, and then there's encoding Pack200 archives ... maybe next year. :-)