InfoQ has published my piece on H.264 to remain free for Internet Video, in response to the MPEG LA's Press Release yesterday. However, whilst this means that blogs (such as this one) can continue to distribute free H.264 encoded video, the license doesn't apply to browser software, so the positions of Opera and Firefox are unlikely to change.
Friday, August 27, 2010
Monday, August 23, 2010
Automatic Resource Management in Java
InfoQ has published my latest piece on Automatic Resource Management in Java, together with what that means for the Java language. From the abstract:
Part of Project Coin is the ability to deal with Automatic Resource Management, or simply ARM. The purpose is to make it easier to work with external resources which need to be disposed or closed in case of errors or successful completion of a code block. An initial implementation is now available in OpenJDK.
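To make the idea concrete, here is a minimal sketch of what ARM-style code looks like. It assumes the try-with-resources syntax and the AutoCloseable interface as they eventually shipped in Java 7; the early OpenJDK prototype may have differed in detail. The class ArmSketch and the TrackedResource helper are hypothetical names used purely for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ArmSketch {
    // Hypothetical resource that simply records whether close() was called.
    static class TrackedResource implements AutoCloseable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    // Reads the first line of the text. The reader is closed automatically,
    // whether the block completes normally or throws.
    static String firstLine(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if the resource was closed even though the block failed.
    static boolean closedAfterFailure() {
        TrackedResource resource = new TrackedResource();
        try (TrackedResource r = resource) {
            throw new IllegalStateException("simulated failure");
        } catch (IllegalStateException expected) {
            // close() has already run by the time this catch block executes.
        }
        return resource.closed;
    }

    public static void main(String[] args) {
        System.out.println(firstLine("hello\nworld"));
        System.out.println(closedAfterFailure());
    }
}
```

The point of the construct is that no explicit finally block is needed: the resource declared in the parentheses is closed on both the success and the error path.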
Monday, August 16, 2010
More details on 5966702 and pack200
Charles Nutter posted a detailed set of thoughts on the Google vs Oracle patents recently, and postulated that 5966702 was related to the Pack200 specification. He said:
This is basically the patent governing the "Pack200" compression format provided as part of the JDK and used to better-compress class file archives.
In fact, it's nothing nearly as complex as the Pack200 specification, though it might touch upon what Pack200 does.
By way of introduction, I began writing the initial implementation of the pack200 uncompressor in May 2006, based on the final JSR 200 spec. At no time did I see or use either the RI or TCK for JSR 200; the specification is complex enough on its own. (Here's a list of my pack200 related posts.) The implementation for unpacking was completed by others after me; it's currently in classlib/modules/pack200 if you want to look into it.
Having read through 5966702, it describes a way of uniquifying the constants across constant pools only, and combining them in a “mclass” archive. Roughly speaking, each .class file contains a number of Strings, which are used to represent method signatures, referenced classes, and so on. A quick Hello World class includes:
- java/lang/Object
- java/lang/System
- Ljava/io/PrintStream;
- java/io/PrintStream
- println
- (Ljava/lang/String;)V
If you download a lot of classes (whether individually, or as part of a JAR file), then this repetition can build up. The goal of 5966702 (which was filed in 1999, several years before the Pack200 spec was started) is purely to optimise the layout of these constants across multiple files, both to reduce the amount of downloaded data and to ensure that the constants are uniquified at runtime as well. (In essence, it's trying to build up a compile-time interned collection of strings, and then use those globally across classes.)
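As a rough illustration of that uniquifying step (and not the patent's actual method or file layout), the sketch below merges the string constants of several classes into one shared pool and hands each class back a table mapping its local indices to shared ones. The class SharedPoolSketch and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SharedPoolSketch {
    // Shared pool: each distinct constant is stored once, in first-seen order.
    private final Map<String, Integer> indexByConstant = new LinkedHashMap<>();

    // Adds one class's constants to the shared pool and returns a table
    // mapping each local constant index to its index in the shared pool.
    int[] addClassConstants(List<String> constants) {
        int[] remap = new int[constants.size()];
        for (int i = 0; i < constants.size(); i++) {
            String constant = constants.get(i);
            Integer shared = indexByConstant.get(constant);
            if (shared == null) {
                shared = indexByConstant.size();
                indexByConstant.put(constant, shared);
            }
            remap[i] = shared;
        }
        return remap;
    }

    // The deduplicated constants, in shared-index order.
    List<String> sharedPool() {
        return new ArrayList<>(indexByConstant.keySet());
    }
}
```

Two classes that both mention java/lang/Object and println end up pointing at the same two shared entries, so the repeated strings are stored (and downloaded) only once.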
I can't comment on whether the Dex implementation infringes on this patent or not; IANAL and I don't know the Dex implementation in any case. But this is just a simple approach to uniquifying strings that never made it directly into a JVM. The Pack200 specification, whilst including a uniquification of the strings, also employs plenty of other optimisation methods and (for example) doesn't treat the Strings as the same in any case. Whereas 5966702 is literally explaining a uniq type operation on the set of strings, Pack200 represents a much more complex mechanism, whereby the Strings are also sorted, then represented as delta differences from the previous item, as I described in more detail previously.
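To show the difference in spirit, here is a simplified sketch of the sort-then-delta idea: strings are sorted, and each one is stored as the number of leading characters it shares with the previous string plus the remaining suffix. This is only an illustration of the principle; it is not the real Pack200 band encoding, and the class PrefixDeltaSketch and its Entry type are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class PrefixDeltaSketch {
    // One encoded entry: how many leading characters to reuse from the
    // previous string, followed by the characters that differ.
    static final class Entry {
        final int sharedPrefix;
        final String suffix;

        Entry(int sharedPrefix, String suffix) {
            this.sharedPrefix = sharedPrefix;
            this.suffix = suffix;
        }
    }

    // Sorts and uniquifies the strings, then encodes each relative to its predecessor.
    static List<Entry> encode(Iterable<String> strings) {
        TreeSet<String> sorted = new TreeSet<>();
        for (String s : strings) {
            sorted.add(s);
        }
        List<Entry> entries = new ArrayList<>();
        String previous = "";
        for (String current : sorted) {
            int shared = 0;
            int limit = Math.min(previous.length(), current.length());
            while (shared < limit && previous.charAt(shared) == current.charAt(shared)) {
                shared++;
            }
            entries.add(new Entry(shared, current.substring(shared)));
            previous = current;
        }
        return entries;
    }

    // Rebuilds the sorted string list from the delta entries.
    static List<String> decode(List<Entry> entries) {
        List<String> strings = new ArrayList<>();
        String previous = "";
        for (Entry entry : entries) {
            String current = previous.substring(0, entry.sharedPrefix) + entry.suffix;
            strings.add(current);
            previous = current;
        }
        return strings;
    }
}
```

Because sorted class and package names share long prefixes, most entries carry only a short suffix, which is where the extra compression over plain uniquification comes from.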
Finally, it's highly unlikely that Google would be using Pack200 for their runtime requirements. Even the spec notes that there are other compression mechanisms which are better suited for loading dynamically compressed classes. From the technotes on pack200 in Java 6:
However, Pack200 archives are not designed for direct execution by any virtual machine. The greater compression level of Pack200 is won at a cost of complexity in the unpacker, a complexity incompatible with any requirement of direct loading or execution.
Any memory size calculations or layout on disk are therefore nullified by the pack200 spec itself. A packed JAR file must be reconstituted in its entirety, often by creating the JAR in memory first before any of it can be written out.
Lastly, it's worth noting that 5966702 refers specifically to classes in the class file format. (It even goes to lengths to include the Java Class File specification in the patent application.)
Embodiments of the invention can be better understood with reference to aspects of the class file format. Description is provided below of the Java class file format. Also, enclosed as Section A of this specification are Chapter 4, "The class File Format," and Chapter 5, "Constant Pool Resolution," of The Java Virtual Machine Specification, by Tim Lindholm and Frank Yellin, published by Addison-Wesley in September 1996, © Sun Microsystems, Inc.
CHAPTER 4
The Class File Format
This chapter describes the Java Virtual Machine class file format. ...
So, in conclusion: this patent isn't really to do with Pack200 in any significant way. It's entirely to do with how Java classes can be combined to have a pre-uniquified String constant pool, such that the constant pool is capable of being loaded straight into memory in one gulp. How this might affect Dex classes is unknown, especially since these aren't Java classes, don't use the Java class file format, and aren't combined into a single 'mclass' entry.
The Dex format does merge multiple Dex classes together and have a shared constant pool across all instances. In that regard, it's more advanced than 5966702 is, because it is also doing the referencing for variables, methods and others. (Some information about the Dex format is available at retrodev.) In some regard, it is processing Java class files as well; however, the output is not a set of Java classes – not only is the constant pool an aspect, but the instruction set is changed and translated into something else as well. In a nutshell, it's not so much a processor of Java classes as a translator into a different format. If the 5966702 patent is applicable to this as a tool for pre-processing a set of Java classes, then so too would any tool like a static checker or annotation processor which reads in sets of Java classes and generates an output.
It will be interesting to see what comes of this tussle.
Java, Oracle, Google, Dalvik and Harmony
What a week to go away on vacation. Oracle exercise their patent muscles on Java (or more specifically, the JVM) and then confirm what's been known for months; that OpenSolaris is now NoLongerOpenSolaris. This doesn't bode well for MacZFS ...
First though, there seems to be a lot of confusion on the Google vs Oracle litigation. A lot of people are claiming that this impacts the Java language – but that's not strictly true. The community is already affected, so there's been some collateral damage, but what Oracle has done hasn't affected the language itself; rather, the issue is about items relating to the JVM.
The second thing is that there appears to be confusion between the Apache Harmony project, and Google's use of their class libraries, versus the VM that underpins it. Apache Harmony is a set of clean-room implemented class libraries that implement features like java.lang.String and java.util.HashMap. In fact, the Harmony libraries can run on a number of different VMs, including IBM's experimental ones. The JVM that actually gets bundled with Harmony downloads is DRLVM – so when people refer to the Harmony JVM, they're really talking about DRLVM. It doesn't prevent others using the Harmony class libraries for their own VMs (e.g. VMKit) if they wanted to.
In fact, this is what Google did with Dalvik and their Dalvik VM. They wrote a new VM engine, and used the Harmony libraries to provide a similar language and runtime environment, all on a completely new VM. It doesn't even run bytecode; the Dalvik Executable is translated from the bytecode that's generated by compilers and the like.
The important thing is this has nothing to do with OpenJDK, or GPL. Google hasn't been near them, and to the best of my knowledge, hasn't contributed back any source upstream to Apache Harmony yet either. They're clients of an open-source library; and the patents don't cover that.
What Oracle has done, however, is attempted to exercise some patents that are used by the Dalvik VM. This should be a concern for others (e.g. Apache Harmony DRLVM, VMKit) because there's nothing stopping Oracle moving forward with those later. But the real money is in the mobile space, which is where the open dispute about Java came from.
I suspect at least some of the patents in question will be invalidated in the coming years, though I suspect enough will remain to result in an out-of-court settlement in the future. But the key problem here is that whether Java is open-source or not, Oracle will rattle its patent sabre again in the future.
As for communities, Oracle can't put a dollar price on them, so equates them to zero. The JCP was pretty much dead in the water even before the takeover was complete; and it's only really the JCP that put the Java name on the compiled bits. I suspect Oracle will disband the JCP and simply call the next release Java 7, whatever anyone else thinks.
Similarly, the OpenSolaris community has been shot in the foot with the long-expected, but now confirmed, announcement that there is no more OpenSolaris. What this means for ZFS is less clear; whilst the bits will be available, the number of people using it outside of a supported contract will vaporise and so all of the future testing on the ZFS bits will disappear, other than what is expected by the internal testing teams. That doesn't give me confidence that ZFS will continue to be rock solid, particularly as lead engineers are disappearing from Oracle at a higher than usual rate for the valley. The last OpenSolaris build that was available was 134, and I'm silently expecting that ZFS will become less and less useful after that point. Already, deduplication was a work in progress that people had raised many concerns about; and now Oracle have confirmed that dedup will be in the commercial Oracle Solaris 11. I predict larger problems for Oracle as time goes on.
As for my work on MacZFS, there are certainly a few things to get done; but the evaporating upstream community, combined with the menacing patent troll outlook, means that I don't see ZFS as the future of filing systems any more. Unfortunately, there's nothing to replace it yet; though some are cheering Btrfs, that's a project that is going from stillborn to stillborn status, with a "btrfs is broken by design" critique.
Still, innovations don't tend to happen when all is going well; they tend to occur when prompted by stress (in the environmental sense). I predict that a new ZFS replacement will emerge in the next decade, outside of Oracle (and perhaps by the disaffected engineers who originally designed it and learnt the issues associated with it), and a new Java-like language and runtime. Whether that's Google Go (which has finally grown up and got exceptions, although they call them panic and recover rather than try and catch, presumably to savour some of that eggy goodness all over their face) remains to be seen.
The reality is that the JVM got left behind some time ago by Microsoft's CLR, particularly with its multiple language support (well, except for IronRuby which has rusted). The lambda-dev project was always going to be a case of a little too late, and the dynamic method handles are going to be great; but the re-rolling Jigsaw was always going to be contentious and not useful to boot.
Could this herald the emergence of a new language and runtime to surpass Java? Only time will tell, but Oracle's actions this week may well be the point where computer historians identify the beginning of the end of Java.
Friday, August 06, 2010
Eclipse 4.0
InfoQ has just published my piece on the release of Eclipse 4.0, and what it means for the future of the Eclipse platform.
Thursday, August 05, 2010
Using DTrace to find out callstack
Mac OS X has supported DTrace since 10.5, and it's pretty powerful stuff (as long as you're not running iTunes or QuickTime). You need to be root in order to be able to run it, but there are plenty of sample scripts in /usr/bin which end in .d that you can look at.
As part of trying to add Spotlight support to ZFS, I needed to check whether add_fsevent was being called for ZFS mounts. Fortunately, it's part of the kernel and exported by the fbt provider, so we can find out when and where it's running. This gives us the opportunity to use the stack() primitive to generate the stack of what's being called:
$ sudo dtrace -n fbt::add_fsevent:entry'{stack()}'
CPU ID FUNCTION:NAME
0 6985 add_fsevent:entry
mach_kernel`vn_open_auth+0x256
mach_kernel`link+0x638
mach_kernel`open_nocancel+0xf3
mach_kernel`unix_syscall64+0x269
mach_kernel`lo64_unix_scall+0x4d
0 6985 add_fsevent:entry
mach_kernel`vn_close+0x5c
mach_kernel`vn_close+0x185
mach_kernel`closef_locked+0x149
mach_kernel`fdrelse+0x13d
mach_kernel`close_nocancel+0x8d
mach_kernel`unix_syscall64+0x269
mach_kernel`lo64_unix_scall+0x4d
1 6985 add_fsevent:entry
mach_kernel`vn_open_auth+0x256
mach_kernel`link+0x638
mach_kernel`open_nocancel+0xf3
mach_kernel`unix_syscall64+0x269
mach_kernel`lo64_unix_scall+0x4d
1 6985 add_fsevent:entry
mach_kernel`vn_close+0x5c
mach_kernel`vn_close+0x185
mach_kernel`closef_locked+0x149
mach_kernel`fdrelse+0x13d
mach_kernel`close_nocancel+0x8d
mach_kernel`unix_syscall64+0x269
mach_kernel`lo64_unix_scall+0x4d
So add_fsevent is definitely being called, and the reports are coming through. Now, to try and hook up DTrace to the MacZFS implementation ...