Following on from my recent post about Scala in the Enterprise, it seems that one of the points – modularity – raised a few eyebrows. It's not as if this hasn't been discussed before; but back then, that resulted in the scalax fork. Of course, modularity isn't necessary for Enterprise adoption per se; but Java is in a quandary at the moment, having grown over the years with more and more bloat in one über class library, and it's been difficult working out how to break that library apart (though Apache Harmony is showing how it can be done).
So, what do I mean by modularity for Scala and how can we achieve it? Why, in a nutshell, is it necessary?
As Daniel said, the current scala-library.jar is less than 3MB, and it's only 4MB in Scala 2.8. That doesn't sound like much, and it's not. But think back – the Archive Java products page lists Java 1.1 at 2.6MB. Small things have a habit of growing into big things, and it's much more difficult to change when you're big than when you're small. So changing now isn't going to give an immediate benefit; but it will benefit the future of Scala. Perhaps by Scala 3.0, we could have a fully modular language, maybe even with its own cut-down JRE that could bootstrap it.
So what do I mean by modularity? Well, in essence, splitting apart the scala-library.jar into multiple JARs. It used to be the case that the Actors package was a self-contained JAR in its own right; but it got folded into the main library in Scala 2.5. The plan is to undo this, and to split out other logically independent units in the same way.
There are a number of things which we could immediately do in the Scala library to modularise it further. For example:
scala.actors – used to be its own JAR; let's move it back out there
scala.testing – no real reason why one would need testing to be visible/available in a run-time system; why should it be part of core?
scala.xml – initially a key part of the language, has seen its popularity slide over recent years. Many systems may not need this, so why not a separate module?
scala.concurrent – probably used by the actors package, this should really live in its own module anyway
It's worth noting that even in the current version of Scala, we already have some modules in the form of scala-dbc.jar. So there's already an existing precedent for having these in separate units.
One of the benefits of modularising Scala at this stage is that it will make it easy to upgrade or evolve things on a module-by-module basis. One could gain improvements to the Actors code by upgrading the Actors module alone, for example; or could introduce performance improvements by updating the XML module.
One of the potential downsides of this approach is in compilation in IDEs. Not all IDEs have the ability to deal with multiple modules easily; and some Java developers (or existing Scala developers) might find it frustrating to add many modules to their classpath when building. Perhaps to counter this, we should distribute Scala as multiple interdependent modules, and then provide a utility to merge together all modules into a single monolithic scala-library.jar for those accustomed to adding just one dependency in their classpaths. So, scala-xml.jar and the other module JARs could all be merged to form a customised scala-library.jar for the benefit of existing users.
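A merge utility of this kind could be quite small. The following is a minimal sketch, not an existing Scala tool: the class name JarMerger and its merge method are hypothetical, and it assumes that a first-entry-wins policy and dropping each module's META-INF/ entries are acceptable (which would lose things like service registrations).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: merges several module JARs into one monolithic JAR.
final class JarMerger {

    /** Copies the entries of each module JAR into the output JAR; the first occurrence of a name wins. */
    static void merge(List<Path> moduleJars, Path output) throws IOException {
        // The merged JAR gets a fresh, minimal manifest of its own.
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");

        Set<String> seen = new HashSet<>();
        try (OutputStream fileOut = Files.newOutputStream(output);
             JarOutputStream jarOut = new JarOutputStream(fileOut, manifest)) {
            for (Path module : moduleJars) {
                try (InputStream in = Files.newInputStream(module);
                     ZipInputStream zipIn = new ZipInputStream(in)) {
                    ZipEntry entry;
                    while ((entry = zipIn.getNextEntry()) != null) {
                        String name = entry.getName();
                        // Assumption: per-module metadata is dropped, duplicates are skipped.
                        if (name.startsWith("META-INF/") || !seen.add(name)) {
                            continue;
                        }
                        jarOut.putNextEntry(new JarEntry(name));
                        zipIn.transferTo(jarOut); // copies the current entry only (Java 9+)
                        jarOut.closeEntry();
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: JarMerger <output.jar> <module.jar>...");
            return;
        }
        List<Path> modules = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            modules.add(Paths.get(args[i]));
        }
        merge(modules, Paths.get(args[0]));
    }
}
```

Run with the output name first and the module JARs after it, this would produce one JAR that existing users could drop onto their classpath as before.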
Note that throughout this, I've not said what the format of these modules should be. I don't think it's Scala's duty to necessarily pick the winner in the module race; the SimpleModuleSystem subset will be enough to be compatible with whatever module system ends up becoming the de facto standard for Java. What is essential is that these modules are versioned – so one can express dependencies on a particular version of the library (or version range), and that the modules' dependencies form a directed acyclic graph or DAG. In other words, it's OK for scala-actors.jar to depend on scala-concurrent.jar, but not the other way around at the same time. Ultimately, it may be possible to have a Scala concept of Module to obtain this programmatically, in much the same way that Scala 2.8 provides a package object. However, this shouldn't be necessary in order to split up the modules initially.
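The DAG requirement is easy to check mechanically. As a rough sketch (the ModuleGraph class is hypothetical, and it covers only the acyclicity half of the requirement, not versions or version ranges), a depth-first walk can either produce a valid load order or report a loop:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of module dependencies; not part of any Scala distribution.
final class ModuleGraph {
    // module name -> names of the modules it depends on (sorted keys for a stable order)
    private final Map<String, List<String>> dependencies = new TreeMap<>();

    /** Records that `module` requires each of `required`; returns this for chaining. */
    ModuleGraph dependsOn(String module, String... required) {
        dependencies.computeIfAbsent(module, k -> new ArrayList<>()).addAll(List.of(required));
        for (String r : required) {
            dependencies.computeIfAbsent(r, k -> new ArrayList<>());
        }
        return this;
    }

    /** Returns the modules so that every dependency precedes its dependents; throws on a cycle. */
    List<String> loadOrder() {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String module : dependencies.keySet()) {
            visit(module, done, inProgress, order);
        }
        return order;
    }

    private void visit(String module, Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(module)) {
            return;
        }
        // Meeting a module that is still being visited means we followed a loop.
        if (!inProgress.add(module)) {
            throw new IllegalStateException("Dependency cycle involving " + module);
        }
        for (String dep : dependencies.get(module)) {
            visit(dep, done, inProgress, order);
        }
        inProgress.remove(module);
        done.add(module);
        order.add(module);
    }

    public static void main(String[] args) {
        ModuleGraph graph = new ModuleGraph()
            .dependsOn("scala-actors", "scala-core", "scala-concurrent")
            .dependsOn("scala-concurrent", "scala-core");
        System.out.println(graph.loadOrder());
    }
}
```

Adding a dependency from scala-concurrent back to scala-actors would make loadOrder throw, which is exactly the situation the split needs to avoid.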
We can already express dependencies in JAR files in a simple way; the Class-Path manifest header allows a JAR to declare the other JARs it depends on. So, the scala-actors.jar could define Class-Path: scala-core.jar scala-concurrent.jar, for example. The key is to ensure we don't end up with loops, where scala-concurrent.jar ends up with a reference to scala-actors.jar, directly or indirectly. We could even ship a largely empty scala-library.jar with Class-Path: scala-actors.jar scala-concurrent.jar ... which would then enable (run-time) equivalence from before. (Whether IDEs are smart enough to recognise that they'll need to add those dependencies based on the Class-Path entry is debatable.)
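To make the Class-Path idea concrete, here is a small sketch using the standard java.util.jar.Manifest API to build and read back such a header. The ClassPathManifest class and its method names are made up for illustration; only the Manifest and Attributes calls are real JDK API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper for building and inspecting Class-Path headers.
final class ClassPathManifest {

    /** Builds a manifest whose Class-Path header lists the given JARs, space separated. */
    static Manifest withClassPath(List<String> dependencyJars) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest.write requires Manifest-Version to be set in the main attributes.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(Attributes.Name.CLASS_PATH, String.join(" ", dependencyJars));
        return manifest;
    }

    /** Reads the Class-Path header back as a list of JAR names (empty if the header is absent). */
    static List<String> classPathOf(Manifest manifest) {
        String value = manifest.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
        if (value == null || value.trim().isEmpty()) {
            return List.of();
        }
        return Arrays.asList(value.trim().split("\\s+"));
    }

    public static void main(String[] args) throws IOException {
        // The manifest that scala-actors.jar could carry, per the example in the text.
        Manifest actors = withClassPath(List.of("scala-core.jar", "scala-concurrent.jar"));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        actors.write(out);
        System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));

        // Round-trip through the serialised form to show the header survives.
        Manifest reread = new Manifest(new ByteArrayInputStream(out.toByteArray()));
        System.out.println(classPathOf(reread));
    }
}
```

Reading the header back from each module like this is also one way a build could collect the edges needed to check for loops.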
If Maven is being used to build Scala, then even better; these just become separate top-level Maven modules. Even the SBax build process lists different modules, although the scala-library is a bit monolithic. But however Scala is built, the ability to run Scala from the command line (i.e. outside of a runtime module system) is a key component which should be easy to achieve; after all, the current scala executable already merges in all libraries in the lib/ directory as part of the boot process.
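The effect of that boot step can be illustrated in a few lines. This is only a sketch of the idea, not the launcher's actual code: the LibClassPath class is hypothetical, and it simply assumes that every file ending in .jar directly inside the directory belongs on the classpath.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration of turning a lib/ directory into a classpath string.
final class LibClassPath {

    /** Joins every *.jar directly inside libDir into one classpath string, sorted by path. */
    static String fromDirectory(Path libDir) throws IOException {
        try (Stream<Path> files = Files.list(libDir)) {
            return files
                .filter(p -> p.getFileName().toString().endsWith(".jar"))
                .sorted()
                .map(Path::toString)
                .collect(Collectors.joining(File.pathSeparator));
        }
    }

    public static void main(String[] args) throws IOException {
        Path libDir = Paths.get(args.length > 0 ? args[0] : "lib");
        if (!Files.isDirectory(libDir)) {
            System.err.println("not a directory: " + libDir);
            return;
        }
        System.out.println(fromDirectory(libDir));
    }
}
```

With the library split into several JARs in lib/, a launcher working along these lines would pick all of them up without the user listing each module.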
Martin has said that there is interest, but that the core EPFL team does not have the resources to make it happen. Therefore, this is a call for those who would be interested in making Scala more modular to join in. As Martin said:
I realize there are pros and cons to modularize Scala's standard library. I am all for exploring different strategies. But I also have to accept my own limitations, both in available time and expertise in this domain. If a group of people wants to take the lead in this and make a concrete proposal, please do! It's an ideal topic for a SIP, really.
So, those who are interested in helping out with this effort (Peter Kriens has already expressed an interest), please get in contact with me so that we can put a proposal together.