Monday, November 29, 2010

Moving data from one ZFS pool to another on OSX

If you've got a bunch of data on one ZFS system, and you want to move it to another system, then there are a few ways this can be accomplished. The easiest is probably to use a mirrored ZFS drive, since you can add a new mirror, take a snapshot of the data, then move it off elsewhere.

This doesn't work if (a) you don't have a mirror to start with, or (b) the destination you're moving to isn't the same size or bigger. Even if you've used less space than is available, ZFS mirrors operate on the size of the smallest disk in the set, and can't shrink afterwards.

Fortunately, zfs send and zfs receive are really handy for moving the contents of one (or more) snapshots between systems. Unfortunately, it takes a bit more work on OSX because (a) we've not got to the point where recursive snapshot sending works, and (b) the zfs command can't deal with streams.

You can work around this with a set of steps, however. Let's say we have source and target pools on different systems. We can do:

  • target# mkfifo /var/tmp/in
  • target# nc -l 1234 > /var/tmp/in
  • target# zfs receive -F TargetPool/Filesystem@SomeSnapshot < /var/tmp/in
  • source# mkfifo /var/tmp/out
  • source# cat /var/tmp/out | nc target 1234
  • source# zfs send SourcePool/Filesystem@SomeSnapshot > /var/tmp/out

This should send a single snapshot from the source system to the target system. If you have multiple snapshots, you can send them incrementally:

  • source# zfs send -i SourcePool/Filesystem@SomeSnapshot SourcePool/Filesystem@NextSnapshot > /var/tmp/out

The only constraint is that the target system has to have the @SomeSnapshot and that @NextSnapshot is ahead of @SomeSnapshot.
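If you have a longer chain of snapshots, the sequence of sends follows a simple pattern: one full send for the oldest snapshot, then one incremental send for each consecutive pair. Here's a minimal Python sketch that only builds the command lines (it doesn't run zfs); the function name send_commands is my own invention for illustration, and it assumes the snapshot names are passed oldest first.

```python
def send_commands(filesystem, snapshots):
    """Return the zfs send command lines for a chain of snapshots.

    Assumes `snapshots` is ordered oldest first. The first snapshot is
    sent in full; each later one is sent as an increment (-i) relative
    to the snapshot immediately before it.
    """
    if not snapshots:
        return []
    commands = [f"zfs send {filesystem}@{snapshots[0]}"]
    for previous, current in zip(snapshots, snapshots[1:]):
        commands.append(
            f"zfs send -i {filesystem}@{previous} {filesystem}@{current}"
        )
    return commands


if __name__ == "__main__":
    for line in send_commands("SourcePool/Filesystem",
                              ["SomeSnapshot", "NextSnapshot"]):
        print(line)
```

Each generated line would still need its output redirected into the FIFO, and a matching zfs receive on the target, as in the steps above.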

Note that this can be used to perform incremental backups as well as one-off transfers of data, so it's good to know how to use these commands. Also note that netcat (nc) doesn't encrypt any data, so there's an assumption that this is over a trusted network (if not, wrap the transfer in SSH). Finally, this assumes running as root, since that will be needed for the send/receive parts to work.

Thursday, November 25, 2010

Less than 100 days left of IPv4 addresses

I've just written up a piece on InfoQ about the fact that we now have less than 100 days left of IPv4 address availability.

If you haven't used IPv6 yet, it's worth familiarising yourself with the concepts and the terms. Tools like ping6 and host handle IPv6; other tools like ssh support both protocols natively (and a specific one can be selected with the -4 or -6 switches).

$ host ipv6.google.com
ipv6.google.com is an alias for ipv6.l.google.com.
ipv6.l.google.com has IPv6 address 2a00:1450:8006::93
$ ping6 ipv6.google.com
PING6(56=40+8+8 bytes) 2axx:xxxx::xxxx:xxff:fexx:xxxx --> 2a00:1450:8006::93
16 bytes from 2a00:1450:8006::93, icmp_seq=0 hlim=56 time=36.776 ms
16 bytes from 2a00:1450:8006::93, icmp_seq=1 hlim=56 time=36.916 ms
$ telnet -6 ipv6.google.com 80
Trying 2a00:1450:8006::93...
Connected to ipv6.l.google.com.
Escape character is '^]'.
GET / HTTP/1.0

HTTP/1.0 302 Found
Location: http://www.google.co.uk/
...

Obviously in this example I've obfuscated my IPv6 address; but typically you'll see the ff:fe in the middle. That's because the lower part of the address is made up of the six bytes of the computer's MAC address, with 'fffe' inserted in the middle. So if your MAC address is a1:b2:c3:d4:e5:f6, then the auto-configured address will likely be something ending in :a3b2:c3ff:fed4:e5f6. (Note that the second least significant bit of the first byte is flipped, so a1 becomes a3.) If that computer was on Google's network, it would get 2a00:1450:8006::a3b2:c3ff:fed4:e5f6 as an address (the 2a00:1450:8006 is their routing prefix, and they probably own everything underneath it, giving Google a much wider address space on its own than the entire IPv4 internet).
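To make the derivation concrete, here's a small Python sketch of the MAC-to-interface-identifier rule (the "modified EUI-64" format): flip bit 0x02 of the first byte, then insert ff:fe between the third and fourth bytes. The function name eui64_interface_id is just something I've made up for illustration.

```python
def eui64_interface_id(mac):
    """Derive the lower 64 bits of an auto-configured IPv6 address from a MAC.

    Flips the universal/local bit (0x02) of the first byte and inserts
    ff:fe between the third and fourth bytes of the MAC address.
    """
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError("expected a MAC address with six bytes")
    octets[0] ^= 0x02                      # flip the second least significant bit
    full = octets[:3] + [0xFF, 0xFE] + octets[3:]
    # Join the eight bytes into four 16-bit groups, written in hex.
    groups = [f"{(full[i] << 8) | full[i + 1]:x}" for i in range(0, 8, 2)]
    return ":".join(groups)


if __name__ == "__main__":
    print(eui64_interface_id("a1:b2:c3:d4:e5:f6"))  # a3b2:c3ff:fed4:e5f6
```

Prefix the result with fe80:: for the link-local address, or with your network's routing prefix for a global one.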

This may not work for you – if you don't have an IPv6 address, you won't be able to ping or telnet into Google's webserver (on port 80!). However, if your OS supports IPv6 then you will be able to ping6 ::1, which is the loopback address, as well as any fe80:: (link-local) address, which is also derived from the MAC address (e.g. fe80::a3b2:c3ff:fed4:e5f6).

Security

Whilst IPv6 makes IPSec a mandatory supported part of the protocol (it's optional in IPv4), there are also some security considerations. For one, the firewalls may well be configured to prevent access to a given port; but there's an entirely different firewall for IPv6 which may be wide open. Also, unlike local IPv4 addresses behind a NAT, which can't be directly accessed, the IPv6 address can be globally addressed from anywhere.

Fortunately, the address space is so much wider in IPv6 that scans for open ports are likely to be impractical for the next decade (though this is really security through obscurity). In addition, the automatically generated addresses may eschew the MAC address and use a random number each time; though this will cause a client to potentially change its address between restarts. (This may be useful for privacy in an ISP who would otherwise be able to track source usage; the detail is in RFC3041.)
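To put a rough number on "impractical", here's a back-of-the-envelope calculation in Python. The figures are assumptions for illustration only: a single /64 subnet, and a scanner managing one million probes per second.

```python
def years_to_scan(prefix_bits=64, probes_per_second=1_000_000):
    """Estimate the years needed to probe every address in one IPv6 subnet.

    Both defaults are illustrative assumptions: a /64 subnet and a
    scanner that sends one million probes per second.
    """
    addresses = 2 ** (128 - prefix_bits)
    seconds = addresses / probes_per_second
    return seconds / (365.25 * 24 * 3600)


if __name__ == "__main__":
    print(f"{years_to_scan():,.0f} years")
```

Under those assumptions a single subnet takes several hundred thousand years to sweep, compared with a little over an hour for the whole IPv4 space at the same rate.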

Conclusion

The IPv4 address space is almost exhausted, and IPv6 is the only real solution for the future. However, the full IPv6 transition is still some time away; and time is running out.

ObjectivEClipse (briefly) reappears

I was informed last night that Sony have a new Objective-C based development platform on Linux that uses CDT and ObjectivEClipse. Looking at the developer programme guide, it seems that the same names and similar versions are used (org.eclipse.cdt.objc.core and org.eclipse.cdt.objc.ui, both at 0.21 when the last public release was 0.20). The icon is also one I put together and includes the non-capitalised “project” (all other Eclipse projects use Project).

However, I've not been able to verify this, since when you go to http://snap.sonydeveloper.com/ you're presented with a message “SNAP development is currently on hold”. Ooops. Perhaps they read Doug's tweet suggesting that they read the “no longer active development” flag on ObjectivEClipse's project page.

You can find more information about it by searching Google's caches, for example, http://snap.sonydeveloper.com/develop/platform/ will tell you what the front page used to tell you yesterday; but since going viral, the project has been put on hold. Maybe it was announced too early; maybe someone wanted to downplay the GNU associations. Still, you can see the effort put into documentation for Sony SNAP as well.

Either way, I'm thrilled that there is increasing interest in Objective-C generally, and specifically on the Eclipse platform. Whilst ObjectivEClipse failed to take off on its own as an external project, hope springs eternal that one day we can make Objective-C a fully supported language directly inside CDT itself.

Wednesday, November 24, 2010

Delving into Spring Roo

I had to pick up JPA recently and since it's been a while since I did anything in this space, thought I would take the opportunity to find out more about Spring Roo. I first heard about it at QCon London 2010, and at the time recalled thinking how easy it was to put together an application from simple components.

One of the things that encouraged me was that the generated application was also a valid OSGi bundle; great. So when Roo 1.1 was released recently, I gave it a spin. (Sadly, and somewhat ironically, whilst Roo 1.1 itself now runs in an OSGi container, the generated products are no longer OSGi bundles. Roo-1052 has more information about this slide backwards.)

Before going into how to use Spring Roo, it's worth mentioning what it does and why. The goal of Roo is to make it easy to put together web-based applications that have persistent entities; in other words, your fairly typical web/servlet/jsp/hibernate kind of setup. However, the way it achieves this is somewhat different to other tools. Instead of modifying source code that you write, it creates additional files which it generates. It then uses – effectively – a #include at compile time to bring those fragments into your class. This is achieved with the AspectJ compiler, and in a unique and fairly useful way, using AspectJ as more than just wrapping-a-method-with-log-statements.

The net result is you write a minimal set of code in your class, and Roo automatically generates the necessary AspectJ fragments to support what you've written. (Those who have seen Project Lombok may already have seen this kind of approach in an IDE.)

Using Spring Roo is fairly easy. There's a roo.sh executable (roo.exe on Windows) which brings up a roo> prompt, from which you can hit a number of commands. Unlike a standard OSGi shell, there's TAB completion; also a bonus is the fact that commands which aren't enabled are hidden from the list. Finally, there's a hint system that you can call (with hint) to bring up a list of what-you-can-do.

The steps are as follows:

  • Create a project
  • Setup the persistence mechanism (database, mapping provider)
  • Create one (or more) entities
  • (Optionally) Create a web container

Project

Creating a project is easy enough – you need to run project --topLevelPackage com.example. It generates a number of source directories, a spring folder and a log4j properties file, along with a Maven project to build them all. Think of it as a one-liner for Maven archetypes and you'll be close to what's required.

roo> project --topLevelPackage com.example
Created /tmp/example/pom.xml
Created SRC_MAIN_JAVA
Created SRC_MAIN_RESOURCES
Created SRC_TEST_JAVA
Created SRC_TEST_RESOURCES
Created SRC_MAIN_WEBAPP
Created SRC_MAIN_RESOURCES/META-INF/spring
Created SRC_MAIN_RESOURCES/META-INF/spring/applicationContext.xml
Created SRC_MAIN_RESOURCES/log4j.properties

Persistence

Setting up persistence is just as easy. persistence setup --provider p --database db will do all the legwork for you in terms of setting up a persistence.xml file, the driver classes to use and the per-provider magic settings that you need to know. Providers such as EclipseLink, OpenJPA and Hibernate are available, as are database drivers like DB2, Derby, Oracle and so on. (The setup notes that if you use a commercial driver you have to install it manually into your Maven repository in order for it to be found.) Use the TAB key to get a list, or to complete what you're typing, while doing this.

com.example roo> persistence setup --provider ECLIPSELINK  --database DERBY 
Managed SRC_MAIN_RESOURCES/META-INF/spring/applicationContext.xml
Created SRC_MAIN_RESOURCES/META-INF/persistence.xml
Created SRC_MAIN_RESOURCES/META-INF/spring/database.properties
Managed ROOT/pom.xml [Added dependency org.apache.derby:derby:10.6.1.0]
Managed ROOT/pom.xml [Added dependency org.eclipse.persistence:eclipselink:2.1.0]
Managed ROOT/pom.xml [Added dependency org.eclipse.persistence:javax.persistence:2.0.1]
Managed ROOT/pom.xml [Added dependency org.hibernate:hibernate-validator:4.1.0.Final]
Managed ROOT/pom.xml [Added dependency javax.validation:validation-api:1.0.0.GA]
Managed ROOT/pom.xml [Added dependency cglib:cglib-nodep:2.2]
Managed ROOT/pom.xml [Added dependency javax.transaction:jta:1.1]
Managed ROOT/pom.xml [Added dependency org.springframework:spring-jdbc:${spring.version}]
Managed ROOT/pom.xml [Added dependency org.springframework:spring-orm:${spring.version}]
Managed ROOT/pom.xml [Added dependency commons-pool:commons-pool:1.5.4]
Managed ROOT/pom.xml [Added dependency commons-dbcp:commons-dbcp:1.3]
Managed ROOT/pom.xml

Entities

Now we're able to create entities. The entity --class c kicks off the entity generation process; you can type in a fully qualified name or use ~ to represent the project package.

com.example roo> entity --class ~.Employee 
Created SRC_MAIN_JAVA/com/example
Created SRC_MAIN_JAVA/com/example/Employee.java
Created SRC_MAIN_JAVA/com/example/Employee_Roo_Configurable.aj
Created SRC_MAIN_JAVA/com/example/Employee_Roo_Entity.aj
Created SRC_MAIN_JAVA/com/example/Employee_Roo_ToString.aj

At this point, we have a project that's capable of being used as a pure JPA provider. Before going further, it's worth looking at what's been generated:

  • pom.xml - the project details, source directories and so on
  • persistence.xml - which defines the database, configuration mappings and so on
  • *.properties - for supplying mutable data, like userid, password and logging levels
  • *.aj - the aspect fragments
  • Employee.java - a single Java file

The Employee is fairly simple; at the moment, we've not added anything to it. In fact, other than a few annotations (@RooJavaBean, @RooToString, @RooEntity) it has no contents. We can add fields using the console to give an employee a name and a manager.

~.Employee roo> field string --fieldName name 
Managed SRC_MAIN_JAVA/com/example/Employee.java
Created SRC_MAIN_JAVA/com/example/Employee_Roo_JavaBean.aj
Managed SRC_MAIN_JAVA/com/example/Employee_Roo_ToString.aj
~.Employee roo> field reference --fieldName manager --type ~.Employee
Managed SRC_MAIN_JAVA/com/example/Employee.java
Managed SRC_MAIN_JAVA/com/example/Employee_Roo_JavaBean.aj
Managed SRC_MAIN_JAVA/com/example/Employee_Roo_ToString.aj

All this has done is added a couple of fields to the class, one with a single JPA annotation:

@RooJavaBean
@RooToString
@RooEntity
public class Employee {
    private String name;
    @ManyToOne
    private Employee manager;
}

The interesting thing is what you don't see, which is generated in the *.aj files. For example, take a look at the Employee_Roo_ToString.aj file:

privileged aspect Employee_Roo_ToString {
    public String Employee.toString() {
        StringBuilder sb = new StringBuilder();
        sb.append("Id: ").append(getId()).append(", ");
        sb.append("Version: ").append(getVersion()).append(", ");
        sb.append("Name: ").append(getName()).append(", ");
        sb.append("Manager: ").append(getManager());
        return sb.toString();
    }
}

This aspect, when compiled with AspectJ, effectively inserts a toString() into your class automatically. It uses the name and manager that we've just added; and in fact, if you were to go into the Employee.java file and edit it with your favourite text editor, when you save it, Roo will notice and update the toString() automatically.

The id and version are defined in the Employee_Roo_Entity.aj file, so all Roo entities have these. IDs are represented with a Long by default (though you can change that when the entity is created) as well as an Integer version (to support verification of recency when two updates occur on the same row in the database).
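The idea behind the version field (usually called optimistic locking) can be illustrated outside of JPA. The following Python sketch is not what Roo or any JPA provider generates; the Row class, update_name function and StaleUpdateError exception are hypothetical names, and the "database row" is just an in-memory object. It only shows why a version counter lets the second of two conflicting updates be detected.

```python
class StaleUpdateError(Exception):
    """Raised when an update is based on an out-of-date copy of the row."""


class Row:
    """A stand-in for a database row with a version column."""

    def __init__(self, name):
        self.name = name
        self.version = 0


def update_name(row, new_name, expected_version):
    """Apply the update only if nobody else changed the row in the meantime."""
    if row.version != expected_version:
        raise StaleUpdateError(
            f"row is at version {row.version}, update expected {expected_version}"
        )
    row.name = new_name
    row.version += 1  # every successful update bumps the version


if __name__ == "__main__":
    row = Row("Alice")
    seen = row.version                # two clients both read version 0
    update_name(row, "Alicia", seen)  # the first update succeeds
    try:
        update_name(row, "Alison", seen)  # the second is based on stale data
    except StaleUpdateError as error:
        print("rejected:", error)
```

In a real database the check and the increment happen in a single UPDATE statement, so the comparison is atomic.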

Finally, we've also got Employee_Roo_JavaBean.aj which generates the getName() and setName() methods based on the fields you add. Of course, most IDEs can do this for you but Roo takes it out of your IDE and into a standalone monitoring process which can update the aspects automatically.

If you have a need to do a specific implementation – say, to pre-validate a field – then you can simply write the setName() in your Employee.java file. Roo notices that it's already there and doesn't bother regenerating it.

There's a web controller as well, though I'm not going to go into it here; doing controller all will generate a huge amount of code for you, and perform command --mavenCommand jetty:run will bootstrap a Jetty environment to test it.

Since it's scriptable, you can easily replay this by copying and pasting the below into a Roo shell prompt:

project --topLevelPackage com.example
persistence setup --provider ECLIPSELINK --database DERBY
entity --class ~.Employee
field string --fieldName name
field reference --fieldName manager --type ~.Employee
controller all --package ~.web
perform command --mavenCommand jetty:run

Once you've done that, point your web browser at http://localhost:8080/example/ and see the app in action.

What's not to like?

Simple project setup, integration with a variety of different providers and databases, single command entry – what could be better?

Well, although it's an excellent idea in principle, it does have its drawbacks. One is that, however well aspects are used here, some people are still wary of aspects in general, which may put them off. The second is that sometimes it's not clear what is happening behind the covers, which takes a bit of getting used to.

It's a real shame that Roo has stopped generating OSGi bundles by default. Part of the reason is that the enterprise spec runtimes aren't there yet, so generating a Meta-Persistence header may not be immediately usable; the bigger reason was the complaints that Roo was using the SpringSource EBR instead of Maven central. Although EBR has been great for popularising OSGi, many upstream projects are migrating over to use OSGi metadata, so it's not clear that this is as much of an issue as it was when Roo originally came out.

The license is also a potential sticking point. Although the Roo annotations are licensed under an Apache License, Roo itself (which reads and generates the *.aj files) is GPL. Some are concerned that this might introduce viral dependencies on the GPL – though whether from some kind of indirect linkage with the annotations or whether it's a bigger concern about the code that is automatically generated by Roo differs depending on who you ask. Either way, it's something to consider when basing an application off Roo.

Conclusion

Spring Roo is a great environment for creating persistent entities and web applications to drive them. Obviously not all applications fit into this shape so it has a limited audience. Being a Spring project, it uses Spring heavily in the generation of the runtime entities; much more so than you need if you're generating an OSGi JPA bundle. It's also a great application of aspect-oriented-programming that is more than the log-it-and-see example that you come across.

However, concerns about the license and the fact that it no longer generates OSGi bundles mean that it's probably better suited to prototyping or learning about how to generate JPA entities. The approach is promising though and with a different license or different style of generation it could be a real winner.

Monday, November 22, 2010

Automatically tagging builds with Git describe

As part of looking after MacZFS, I had to come up with a standard way of representing version numbers of builds. The prior Apple builds, like most Apple products, were sequentially increasing numbers based on some arbitrary reference point in the past. As a result, there was the “102” build (the first initial drop) and the “119” build, the last publicly released version. (There was a “286” build but this was never publicly released as source or binary.)

Since that doesn't tie in well with the state of a distributed version control system (where there is no concept of a single anything), I had to invent a new versioning scheme. I chose to base it on the revisions of OpenSolaris (as was), under the onnv tags in the Mercurial repository. These, too, are centrally numbered releases like onnv_72, onnv_73 etc., but at least they're upstream and are what we need to build upon.

As a result, I tag merged copies in GitHub with maczfs_72, maczfs_73 etc. Sometimes this works first off (or I can use a git commit --amend) and sometimes it needs to go through a few subsequent revisions.

Fortunately, I can create a unique point to reference this with git describe. If you run this on any git project, it gives you a tag, plus the number of commits since that tag, and finally the hash (commit id) so you can uniquely pinpoint where you are.

$ git describe
maczfs_74-5-g169d02a

So I'm 5 commits ahead of the maczfs_74 tag, at 169d02a. (The g refers to the fact that it's a git repository, which may help disambiguate for those using other DVCS.)

If I need to re-release something (e.g. based off the maczfs_72 merge point), I can do so by releasing a maczfs_72-1 tag instead. Now the git describe will look like maczfs_72-1-g72c0e09.

From this information, I can derive what version of onnv I'm sync'd against, and how far off the tag I am. This allows me to produce binaries like MacZFS-72.1.7 and MacZFS-74.0.0 and still know where they came from.

I also use this information to post-process the version tag that gets compiled into the zfs executable. First, I get git to tell me where we are, then I use a convoluted AWK script to generate the version numbering text. This is then finally wrapped up in a shell invocation in project.pbxproj which uses PlistBuddy to write the entries into the Info.plist of the kernel extension. Here's what it looks like:

DESCRIPTION=`git describe --tags --long --match 'maczfs*' 2> /dev/null`
# From the AWK script; max() is a helper defined elsewhere in the script (not shown)
/maczfs_/ {
        ONNV = $2
        if (NF<5) {
                REL = 0
                COMMIT = max($3,99)
        } else {
                REL = max($3,99)
                COMMIT = max($4,99)
        }
        print ONNV "." REL "." COMMIT
}
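For anyone who finds the AWK hard to follow, here's roughly the same transformation sketched in Python. This is my reading of the fragment above rather than a translation of the full script, and it rests on two assumptions: that the fields are split on "_" and "-", and that the max(…,99) helper is there to cap each component at 99. The name version_from_describe is made up for the example.

```python
import re


def version_from_describe(description, cap=99):
    """Turn `git describe --tags --long` output into ONNV.REL.COMMIT.

    Assumption: fields are separated by "_" or "-", giving
    maczfs / onnv / [rel] / commits / g<hash>.
    Assumption: each numeric component is capped at `cap`.
    """
    fields = re.split(r"[_-]", description.strip())
    if len(fields) < 4 or fields[0] != "maczfs":
        raise ValueError(f"not a maczfs description: {description!r}")
    onnv = fields[1]
    if len(fields) < 5:
        # maczfs_74-5-g169d02a: no re-release number, so REL is 0
        rel, commit = 0, int(fields[2])
    else:
        # maczfs_72-1-7-g72c0e09: re-release 1, 7 commits past the tag
        rel, commit = int(fields[2]), int(fields[3])
    return f"{onnv}.{min(rel, cap)}.{min(commit, cap)}"


if __name__ == "__main__":
    print(version_from_describe("maczfs_74-5-g169d02a"))    # 74.0.5
    print(version_from_describe("maczfs_72-1-7-g72c0e09"))  # 72.1.7
```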

The result is that you can find out where you are by version number using kextstat:

$ kextstat | grep zfs
   98    0 0x1a90000  0x79000    0x78000    com.bandlem.mac.zfs.fs (74.0.0) <7 5 4 1>

From this, we know that the corresponding source that created it is maczfs_74. More information about the arguments is available in the git-describe man page.

Thursday, November 11, 2010

Restlet

Restlet is a massively under-used library for serving REST based requests over the web. It's fully OSGi compatible (unlike Jersey, which uses a bunch of com.sun stuff under the covers) and has recently released version 2.0.

I recently posted a cryptic tweet in response to @Vogella's article on REST with Jersey, since Jersey has repeatedly failed in an OSGi runtime like environment. Using Restlet is a much smarter move if you want to serve REST based requests.

Here's the Hello World of Restlet:

import org.restlet.*;
import org.restlet.data.*;
import org.restlet.resource.*;
public class Test {  
  public static void main(String[] args) throws Exception { 
    int port = 8080; 
    new Server(Protocol.HTTP, port, HelloResource.class).start();  
  }
  public static class HelloResource extends ServerResource {
    @Get 
    public String helloWorld() { 
      return "Hello World";
    }
  }
}

Run that, with restlet-2.0.1.jar on your classpath, and then point your browser to http://localhost:8080.

There's a lot more documentation in the Restlet tutorial if you want to know more.

Friday, November 05, 2010

Using EMF for OSGi service creation

I have never really understood the benefits of model-driven development, and although I have looked at EMF and friends (briefly) in the past, I've never really found it useful. Like other styles of programming, model-driven development is useful for solving certain types of needs; but you can get away without it quite easily. The same can also be said of dependency injection, modularisation, test-driven development and so forth; each of those provides advantages which aren't immediately apparent to the novice user.

So I figured I'd give EMF another spin to see if I could use EMF for OSGi service creation.

Models, diagrams, genmodels - oh my!

The first hurdle to cross is understanding the different kinds of files in an EMF project, and to reason why they're all needed.

  • ecore - stores the model representation itself
  • ecorediag - used if you want to create a graphical representation of the models (far easier than the point-and-click editor)
  • genmodel - configuration for determining how the ECore file is translated into Java source code

Why do you need so many individual files for representing and generating the models? Well, one argument is that they're each specifically suited to one type of job; the ECore could be used to drive other types of code generation (into C++, say) and the diagram makes it easier to see what's happening (but not strictly necessary). The GenModel contains configuration options that are used by the generation process to customise the output.

This won't be a full tutorial on EMF; for that, have a look at Lars Vogel's excellent tutorials on EMF and others.

Out of the box, EMF generates classes that use a separate interface and implementation classes. That's good, but it uses a few oddities which may not be to everyone's tastes.

  • The interface extends EObject
  • The implementation class is called ...Impl
  • The implementation class is put in the impl package

None of these are show-stoppers, but they are different from the standard Eclipse mechanisms of using an I prefix for interfaces, and using an unadorned name for the class.

Not like that, like this...

Fortunately, some of these can be addressed with customisations made to the genmodel file. For example, the names of the interface and class, as well as the package, can be adjusted by making the following modifications to the genmodel source:

<genmodel:GenModel interfaceNamePattern="I{0}" classNamePattern="{0}" 
rootExtendsInterface=""...>
  <genPackages classPackageSuffix="internal" ...>

It's possible to change this in the drop-down properties list as well, by selecting the Interface Name Pattern and Class Name Pattern in the “Model” section, and the Implementation in the “Package names” section of the package(s) on the genmodel.

Having customised this, it's now necessary to remove traces of EMF from the public interface. By default, it will set up the interface with an EObject parent. In a number of cases, this isn't desirable, since you may not want to expose the fact that EMF is behind the implementation. Fortunately, you can change the Root Extends Interface in the “Model Class Defaults” of the properties. Changing it to empty (which gives you rootExtendsInterface="" in the genmodel) gives you plain interfaces for the objects in question.

The next problem is the factory. This is used for creation of the concrete nodes, much like the DocumentBuilderFactory works for XML documents in Java. By default, it extends EFactory, an EMF-specific interface, which again leaks implementation details out. This can be removed by setting the “Suppress EMF Metadata” option, or by setting suppressEMFMetaData="true" in the genmodel.

Hitting the brick wall

We're now at a state where the generated interfaces are almost completely EMF free. We have a factory and a type which are both specified with “pure” interfaces.

However, any abstract factory needs a way of acquiring the factory in the first place. Such factory factories exist (hello DocumentBuilderFactory), but there are many ways of acquiring this. A property set on the JVM, using injection in Spring, or even an OSGi service lookup (coupled with a declarative services registration).

EMF, on the other hand, takes an in-your-face approach to providing the factory factory by virtue of a public static final field to an internal class. Although this doesn't use EMF visibly from the outside, the class is still in the internal package (which shouldn't be exported) and leaks the EMF inheritance via many of the methods on that type.

The documentation notes that it's possible to generate API with no EMF dependencies but the existence of this field directly contradicts that. Worse, this field is on the interface of the factory, not even a class. Now whilst it's possible to have multiple factories (or additional ones to the first), the 'default' one is used in many places and is not going to change (WONTFIX).

As if that wasn't enough, the generated bundle that EMF spits out has Bundle-SymbolicName: ..singleton:=true which prevents multiple instances being installed at the same time. There's a lot of software that can live with that constraint – the Eclipse UI is mostly littered with singletons to prevent multiple options being available – but for generic OSGi services, that's not acceptable. On top of that, it leaks out a dependency on EMF even when it doesn't need to:

Bundle-SymbolicName: ..singleton:=true
Require-Bundle: org.eclipse.core.runtime,
 org.eclipse.emf.ecore;visibility:=reexport

Ultimately, if there were configurable ways of providing the factory – like registering an OSGi service, or providing a class method instead of a static variable which could do service lookups, then the approach might be more useful. But as it is, there's one too many OSGi anti-patterns in the generated code to make using EMF for OSGi services practical for general use.

But what about E4?

Doesn't E4 suffer from these problems? It uses EMF all over the place. Well, yes, it does. There are a few options you can take if you're wanting to use EMF in this way:

  • Live with the EMF exports and leaks, and turn everything into models
  • Manually remove the leaks each time you regenerate the source
  • Don't use EMF for representing a separate interface/model dichotomy and write your own interfaces

As it happens, most of the Eclipse workbench is driven by singletons (there's only one workbench, for example – something that's caused problems for RAP style solutions). So the fact that now, in the OSGi runtime, there's only one implementation available isn't a significant downside for those bundles wishing to use it. In any case, most of the UI is the same.

This is of course one reason why the dreaded “Restart Eclipse now” message pops up whenever you install something into Eclipse – you just can't update singletons without taking them and everything they depend on down and back up again. (There are also some issues with native code dependencies, so hybrid bundles like SWT may not be updatable in an OSGi runtime in any case.)

But for developing well-behaved OSGi services, you really don't want to be caught by the same kind of self-imposed restrictions that led to the state of the world in Eclipse. Yes, it might work for them – but even the E4 world, with its new service-based instead of singleton-based APIs, still suffers from the same singleton aspect.

So, is MDD or EMF bad?

Well, neither really. Model driven design is a way of amplifying small models (text or otherwise) into vast quantities of generated code. I'll bet you use this all the time, in fact. Hibernate is an example – given a model (hibernate configuration file), it generates a set of files (DDL, SQL and the like) based on options you specify. For Oracle, it'll generate one set of files; for DB2, another. With other DB access tools, you can even use it to generate the DDL which you can save to files.

Model driven design is far more popular as a runtime aspect than a compile-time aspect, though. And whilst it's possible to use EMF as a run-time model, most of the uses and examples you'll find are at the compile-time level. A key distinction is that it's usually possible to introduce models at runtime without having to change the build process, whereas for models at compile time you either need generated code checked in or a modification to the build process to run the model generation.

And let's not forget that EMF is pretty powerful at generating EMF models. Many don't like the verbosity of XML that EMF uses to store that information; in fact, XML permeates the entire EMF infrastructure (which is why models need to have a namespace and namespace URI – the EMF model editor doesn't complain if they're not there, but it will cause problems for code generation if they're missing). There's also a lot of projects using EMF (though the old adage that 90% of them are automatically generated usually comes up), so that many people can't be wrong.

The real problem is that there's a difference between “Generate Java code” and “Generate EMF models”. Many get frustrated at EMF, not because EMF isn't powerful, but because by default it generates EMF models rather than models you want it to generate. And that keeps putting people off using EMF.

It's possible to change EMF – after all, the templates are available in source form (see the templates/model directory in org.eclipse.emf.codegen.ecore_*.jar for more). And there's a project called Acceleo which allows arbitrary model-to-something translations. Sadly, there's very little documentation available on the Eclipse Wiki. You'd be hard pushed to think it wasn't a vapourware product – except that it's been around for a few years at their old website, which explains it in a bit more detail. Update: There is documentation in the help pages at http://help.eclipse.org.

(Incidentally, this does seem to be a trend for modelling projects at Eclipse; there's about as much documentation as there are model definitions. Pity there's no way of automatically generating documentation from models :-)

Summary

EMF is seductively simple in terms of being able to generate a model and then use that to generate a number of classes. But the generated output is really just another EMF model; don't try and pretend it's something else. Using EMF as an OSGi service is not a practical solution for general cases; Eclipse's workbench of existing singletons is such a special case.

However, if you can find the right model generation – which after all, exists today in the form of New Project wizards, maven archetype generation and the like – then having a model-driven design becomes more practical. Additionally, a model-based approach allows you to regenerate after the initial generation (something you can't do easily for archetypes or new projects).

Thursday, November 04, 2010

Google mod_pagespeed for Eclipse.org?

Google have just released mod_pagespeed as an optimising caching module for Apache. It works by automatically minifying the content, much like existing minifiers work today but on any generated output rather than the source documents. As such, it's applicable for speeding up other types of site, like wikis, mailing list or newsgroup pages, generated CVS pages and the like.

I wrote up an overview over on InfoQ on the optimisations that it does, including:

  • Pre-shrinking images to the sizes specified in the HTML (if any)
  • Embedding images as data URLs for frequently used images
  • Pulling CSS into the HEAD of HTML pages
  • Merging multiple CSS files into one
  • And of course, standard minification like removal of comments, whitespace and the like

I wonder if Eclipse.org could benefit from installing and using this module? There's certainly a lot of extraneous HTML whitespace on even the main page (and a number of others that are automatically generated from PHP and the like); stripping it would result in smaller files, and thus faster delivery. There are also 29 images on the main page, some of which could be replaced with data URLs, meaning fewer round trips between client and server.
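For those who haven't come across data URLs, the idea is that the image bytes are base64-encoded and placed directly in the src attribute, so the browser needs no extra request to fetch them. Here's a minimal Python sketch of the encoding step; it illustrates the data URL format only and says nothing about how mod_pagespeed implements it, and the function name to_data_url is my own.

```python
import base64


def to_data_url(image_bytes, mime_type):
    """Encode raw image bytes as a data: URL suitable for an <img src=...>."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


if __name__ == "__main__":
    # Just the six-byte GIF signature, to keep the example short.
    print(to_data_url(b"GIF89a", "image/gif"))
```

The trade-off is that base64 inflates the data by about a third, which is why this only makes sense for small, frequently used images.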

However, it's released as binary-only (but not packaged in a particular format) for Linux x86/x86_64. (Update: source is available for compilation.) That means it may not be suitable, as there's a variety of hardware that's used (as Denis keeps showing off :). It's also the kind of thing that you might not want to enable site-wide, at least initially.

Now, if only it could work on P2 repositories ...