As I mentioned a couple of weeks ago, last week I gave a keynote at the OSGi Community Event. During the event I unveiled the Omega problem, which is the condition whereby projects are given inexplicable and unmemorable Greek letters, which unnecessarily interferes with getting started with OSGi.
However, that wasn’t the only focus of the event. I also highlighted p2’s inefficient repository mechanism. Using figures derived from the Eclipse 3.7.0 platform update site, I presented a breakdown of what the repository metadata actually contains.
These figures are calculated from the p2 content JAR located at http://download.eclipse.org/eclipse/updates/3.7/R-3.7-201106131736/content.jar. It’s worth noting that this file alone is 354818 bytes (347K). Now that 3.7.1 has been released, there is an additional file at http://download.eclipse.org/eclipse/updates/3.7/R-3.7.1-201109091335/content.jar which, at 361789 bytes (353K), is extra content that has to be downloaded each time you update Eclipse.
Of those files, the majority of the data is worthless. Approximately 3/5 of the content (or 60%) is appendix: present but entirely useless. Only 2/5 of the data contains useful information.
The problems can be boiled down to:
- Multiple redundant copies of the full text of the EPL license; in the 3.7.0 release, 101 copies alone (and thus, the 3.7.1 release adds another 101 copies)
- Pretty printing of the XML file, when it’s supposed to be parsed programmatically
- Unnecessary data describing how many child nodes a tree element has, when the tree already has that property
None of these are new problems; they were first reported back in September 2010 (and again in June 2011). So when you’re updating to Eclipse 3.7.1, the reason you have to download 700K is because of the way the update mechanism was designed.
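To make the pretty-printing and child-count points concrete, here is a minimal Python sketch. The sample document is invented for illustration and only loosely resembles a p2 content file; in particular, the `size` attribute name is an assumption about how the child count is recorded. It drops whitespace-only text nodes and the `size` attributes, then reports the raw and GZip-compressed sizes before and after.

```python
import gzip
import xml.etree.ElementTree as ET

# Hypothetical, pretty-printed sample; not taken from a real content.jar.
SAMPLE = """<repository name='demo'>
  <units size='2'>
    <unit id='a.bundle' version='1.0.0'>
      <properties size='1'>
        <property name='org.eclipse.equinox.p2.name' value='A'/>
      </properties>
    </unit>
    <unit id='b.bundle' version='1.0.0'>
      <properties size='1'>
        <property name='org.eclipse.equinox.p2.name' value='B'/>
      </properties>
    </unit>
  </units>
</repository>
"""


def compact(xml_text: str) -> str:
    """Return the XML without indentation or 'size' attributes."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Whitespace-only text and tails are just pretty-printing.
        if element.text is not None and not element.text.strip():
            element.text = None
        if element.tail is not None and not element.tail.strip():
            element.tail = None
        # Assumed attribute: the child count is recoverable from the tree.
        element.attrib.pop("size", None)
    return ET.tostring(root, encoding="unicode")


compacted = compact(SAMPLE)
for label, text in (("original", SAMPLE), ("compacted", compacted)):
    raw = text.encode("utf-8")
    print(f"{label}: {len(raw)} bytes raw, {len(gzip.compress(raw))} bytes gzipped")
```

A parser can still recover the child count with `len(element)`, which is why the stored value adds nothing.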
The data was calculated by stripping the file of unnecessary whitespace (and likewise the license/copyright text and the size data) and recompressing the JAR file. The differences between the compressed (JAR) file sizes are reported in bytes. For comparison, the same content files were also compressed with GZip, to set against the corresponding JAR file waste; the GZip figures were not part of the chart I presented.
- Bundle data: 148666 bytes (JAR) – 143672 (GZip)
- License/Copyright text: 127388 bytes (JAR) – 127799 (GZip)
- Whitespace: 81303 bytes (JAR) – 59611 (GZip)
- Size: 4432 bytes (JAR) – 4146 (GZip)
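The fractions quoted earlier can be checked against these numbers. The sketch below uses only the JAR figures from the list above; the dictionary keys and the `shares` helper are names chosen for illustration. The four figures sum to 361789 bytes, which is the size quoted for the 3.7.1 content.jar.

```python
# JAR figures in bytes, copied from the list above.
JAR_BYTES = {
    "bundle data": 148666,
    "license/copyright text": 127388,
    "whitespace": 81303,
    "size": 4432,
}


def shares(figures: dict) -> dict:
    """Return each category's share of the total, as a fraction."""
    total = sum(figures.values())
    return {name: count / total for name, count in figures.items()}


total_bytes = sum(JAR_BYTES.values())
useful_share = shares(JAR_BYTES)["bundle data"]
overhead_share = 1 - useful_share

print(f"total: {total_bytes} bytes")
for name, share in shares(JAR_BYTES).items():
    print(f"{name}: {share:.1%}")
print(f"overhead: {overhead_share:.1%}")
```

Bundle data comes to roughly 41% of the total and everything else to roughly 59%, which is where the 2/5 and 3/5 come from.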
Help me OBR, you’re my only hope
The main reason for mentioning these limitations in p2 is the fact that OBR is due out as part of next year’s Enterprise OSGi release. It is my firm hope that these issues get fixed, and that the spec mandates decent compression (i.e. using GZip instead of JAR format) combined with mandating the generation of XML files without excessive whitespace.
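As a rough illustration of the container overhead, the following Python sketch compresses the same payload once as a single-entry ZIP archive (a JAR is a ZIP file) and once with GZip, both at the same deflate level. The payload and the entry name `content.xml` are made up for the example, and the archive has no manifest, so this is a sketch of the format difference rather than a reproduction of how p2 builds its files.

```python
import gzip
import io
import zipfile


def jar_size(payload: bytes, entry_name: str = "content.xml") -> int:
    """Size of a single-entry ZIP archive holding the payload."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(
        buffer, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9
    ) as archive:
        archive.writestr(entry_name, payload)
    return len(buffer.getvalue())


def gzip_size(payload: bytes) -> int:
    """Size of the payload compressed as a GZip stream."""
    return len(gzip.compress(payload, compresslevel=9))


# Synthetic payload: a repeated, made-up XML line.
payload = ("<unit id='example.bundle' version='1.0.0'/>\n" * 2000).encode("utf-8")

print(f"uncompressed: {len(payload)} bytes")
print(f"JAR (ZIP):    {jar_size(payload)} bytes")
print(f"GZip:         {gzip_size(payload)} bytes")
```

Both formats use deflate underneath, so the difference here is the ZIP headers and central directory rather than the compression itself; the saving is a fixed per-file cost, not a percentage.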
Anyway, you can read all about it yourself; the slides are available on SlideShare.net, or if you want, you can watch a video of the slides and me talking over them on YouTube. All of the other presentations are available, and you can find my write-up on InfoQ if you want to read more.