Friday, August 28, 2009

Modularity for Scala?

Following on from my recent post about Scala in the Enterprise, it seems that one of the points – modularity – raised a few eyebrows. It's not as if this hasn't been discussed before; but back then, that resulted in the scalax fork. Of course, modularity isn't necessary for Enterprise adoption per se; but Java is in a quandary at the moment, having grown over the years with more and more bloat in one über class library, and it has proved difficult to work out how to split that library up (though Apache Harmony is showing how it can be done).

So, what do I mean by modularity for Scala and how can we achieve it? Why, in a nutshell, is it necessary?

As Daniel said, the current scala-library.jar is less than 3Mb, and it's only 4Mb in Scala 2.8. That doesn't sound like much, and it's not. But think back – the Archive Java products page lists Java 1.1 at 2.6Mb. Small things have a habit of growing into big things, and it's much more difficult to change when you're big than when you're small. So changing now isn't going to give an immediate benefit; but it will benefit the future of Scala. Perhaps by Scala 3.0, we could have a fully modular language, maybe even with its own cut-down JRE that could bootstrap it.

So what do I mean by modularity? Well, in essence, splitting apart the scala-library.jar into multiple JARs. It used to be the case that the Actors package was a self-contained JAR in its own right; but it got folded into the main library in Scala 2.5. The plan is to undo this, and to split out other logically independent units in the same way.

There are a number of things which we could immediately do in the Scala library to modularise it further. For example:

  • scala.actors – used to be its own JAR; let's move it back out there
  • scala.testing – no real reason why one would need testing to be visible/available in a run-time system; why should it be part of core?
  • scala.xml – initially a key part of the language, has seen its popularity slide over recent years. Many systems may not need this, so why not a separate module?
  • scala.concurrent – probably used by the actors package, this should really live in its own module anyway

It's worth noting that even in the current version of Scala, we already have some modules in the form of scala-swing.jar, scala-compiler.jar and scala-dbc.jar. So there's already an existing precedent for having these in separate units.

One of the benefits of modularising Scala at this stage is that it will make it easy to upgrade or evolve things on a module-by-module basis. One could gain improvements to the Actors code by upgrading the Actors module alone, for example; or could introduce performance improvements by updating the XML module.

One of the potential downsides of this approach is in compilation in IDEs. Not all IDEs have the ability to deal with multiple modules easily; and some Java developers (or existing Scala developers) might find it frustrating to add many modules to their classpath when building. Perhaps to counter this, we should distribute Scala as multiple interdependent modules, and then provide a utility to merge together all modules into a single monolithic scala-library.jar for those accustomed to adding just one dependency in their classpaths. So, scala-core.jar, scala-actors.jar and scala-xml.jar could all be merged to form a customised scala-library.jar for the benefit of existing users.
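As a rough sketch of what such a merge utility might look like (written here in Python purely for brevity; the function name merge_jars and the "first copy wins" rule for duplicate entries are my own assumptions, not an existing Scala tool), a JAR is just a ZIP file, so merging is mostly a matter of copying entries:

```python
import zipfile

MANIFEST = "META-INF/MANIFEST.MF"

def merge_jars(module_paths, output_path):
    """Merge several module JARs into one JAR; returns the sorted entry names.

    Hypothetical helper: the first copy of a duplicated entry wins, and the
    per-module manifests are replaced by one fresh, minimal manifest.
    """
    seen = {MANIFEST}
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as merged:
        # Write the manifest first, as JAR tools conventionally expect.
        merged.writestr(MANIFEST, "Manifest-Version: 1.0\r\n\r\n")
        for path in module_paths:
            with zipfile.ZipFile(path) as module:
                for info in module.infolist():
                    # Skip directory entries and anything already copied.
                    if info.is_dir() or info.filename in seen:
                        continue
                    seen.add(info.filename)
                    merged.writestr(info, module.read(info.filename))
    return sorted(seen)
```

Calling merge_jars(["scala-core.jar", "scala-actors.jar", "scala-xml.jar"], "scala-library.jar") would then produce the single monolithic JAR for those who want just one classpath entry.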

Note that throughout this, I've not said what the format of these modules should be. I don't think it's Scala's duty to necessarily pick the winner in the module race; the SimpleModuleSystem subset will be enough to be compatible with whatever module system ends up becoming the de facto standard for Java. What is essential is that these modules are versioned – so one can express dependencies on a particular version of the library (or version range), and that the modules' dependencies form a directed acyclic graph or DAG. In other words, it's OK for scala-actors.jar to depend on scala-concurrent.jar, but not the other way around at the same time. Ultimately, it may be possible to have a Scala concept of Module to obtain this programmatically, in much the same way that Scala 2.8 provides a package object. However, this shouldn't be necessary in order to split up the modules initially.
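The acyclic requirement is easy to check mechanically. Here is a minimal sketch in Python, assuming the module dependencies have been written down as a simple dictionary (the function name find_cycle and the dependency lists are illustrative, not taken from any real Scala build):

```python
def find_cycle(dependencies):
    """Return a list of modules forming a dependency cycle, or None for a DAG.

    `dependencies` maps each module name to the modules it depends on.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    colour = {}
    path = []

    def visit(module):
        colour[module] = GREY
        path.append(module)
        for dep in dependencies.get(module, ()):
            state = colour.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, so we have looped back.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        colour[module] = BLACK
        return None

    for module in dependencies:
        if colour.get(module, WHITE) == WHITE:
            cycle = visit(module)
            if cycle:
                return cycle
    return None


# Hypothetical module layout: actors may depend on concurrent, not vice versa.
modules = {
    "scala-actors": ["scala-core", "scala-concurrent"],
    "scala-concurrent": ["scala-core"],
    "scala-core": [],
}
```

With this layout find_cycle(modules) finds nothing; add "scala-actors" to the dependencies of scala-concurrent and it reports the loop between the two.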

We can already express dependencies in JAR files in a simple way; the Class-Path manifest header lets a JAR list the other JARs it depends on. So, the scala-actors.jar could define Class-Path: scala-core.jar scala-concurrent.jar, for example. The key is to ensure we don't end up with loops, where scala-concurrent.jar ends up with a reference to scala-actors.jar, directly or indirectly. We could even ship a largely empty scala-library.jar containing Class-Path: scala-actors.jar scala-concurrent.jar ... which would then enable (run-time) equivalence from before. (Whether IDEs are smart enough to recognise that they'll need to add those dependencies based on the Class-Path entry is debatable.)
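To make the manifest idea concrete, here is a small Python sketch that renders such a manifest. The helper name manifest_with_class_path is hypothetical. It assumes ASCII JAR names, and wraps lines at 70 characters to stay safely inside the 72-byte line limit in the JAR specification, with continuation lines starting with a single space:

```python
def manifest_with_class_path(jar_names, width=70):
    """Render a minimal manifest whose Class-Path header lists `jar_names`.

    Assumes ASCII names. Long headers are wrapped: each continuation line
    begins with one space, which a manifest reader strips when re-joining.
    """
    header = "Class-Path: " + " ".join(jar_names)
    lines = [header[:width]]
    rest = header[width:]
    while rest:
        lines.append(" " + rest[:width - 1])
        rest = rest[width - 1:]
    # The manifest's main section ends with a blank line.
    return "Manifest-Version: 1.0\r\n" + "\r\n".join(lines) + "\r\n\r\n"
```

The result would be stored as META-INF/MANIFEST.MF inside scala-actors.jar (or inside the largely empty scala-library.jar stub).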

If Maven is being used to build Scala, then even better; these just become separate top-level Maven modules. Even the SBax build process lists different modules, although the scala-library is a bit monolithic. But however Scala is built, the ability to run Scala from the command line (i.e. outside of a runtime module system) is a key component which should be easy to achieve; after all, the current scala executable already merges in all libraries in the lib/ directory as part of the boot process.

Martin has said that there is interest, but that the core EPFL team does not have the resources to make it happen. Therefore, this is a call for those who would be interested in making Scala more modular to join in. As Martin said:

I realize there are pros and cons to modularize Scala's standard library. I am all for exploring different strategies. But I also have to accept my own limitations, both in available time and expertise in this domain. If a group of people wants to take the lead in this and make a concrete proposal, please do! It's an ideal topic for a SIP, really.

So, those that are interested in helping out with this effort (Peter Kriens has already expressed an interest), please get in contact with me so that we can put a proposal together.

Tuesday, August 25, 2009

Is Scala ready for the enterprise?

Short answer, no. But before you jump to the bottom of the comments section, citing sites or organisations that use it, it's necessary to understand what enterprise means in this context. It's not “nobody uses it” – the Lift web framework is used by a number of high-profile companies, and Twitter uses Scala for some of its back end services. But Twitter isn't really an enterprise user; it's more of a popular start-up. Enterprise organisations are those with tens of thousands of employees, with an average of 1.5 computers for every employee. In short, these are organisations which have many internal applications, often developed by hundreds of developers over a multi-year period, and which then stay around in the organisation for many years to come. For this reason, amongst others, rates for COBOL developers can exceed those for Java developers; after all, Universities are spewing out Java developers with “Hello World” experience by the truckload, but COBOL developers are growing fewer each day ...

But I digress; this isn't about COBOL or Java, but Scala, and whether it's suitable for enterprise organisations. The answer is still a firm no, and unfortunately, it doesn't look like it's going to change any time soon, either. But before we go into the whys and wherefores, it's important to understand that the key thing about both COBOL and Java is that they are stable. They don't change that much, and if they do, it's usually a gradual evolution rather than a big bang change.

Java, of course, hasn't changed much since the late 90s in terms of the underlying language. Yes, we had the hassle of Generics (Done Badly), but that was more of a source-level change; unfortunately, it was done badly in a way that necessitated bytecode level changes as well. That's a real shame, because whilst the bytecode format actually has a number of extensible avenues, the generics implementation didn't really support it well; which meant 1.5 apps didn't run on 1.4 VMs. This has, until recently, still been an issue; in fact, as recently as last year, applications were being moved off 1.3 VMs. Indeed, only recently there have been discussions about back-porting the Eclipse XSLT support for Java systems as old as 1.4 – although since 1.4 is now EOL, we may finally see the end of that particular headache in the Java world. (Java 1.5 EOL is only a couple of months away ...)

If you are still having difficulty in understanding why everyone is not on the latest and greatest 1.6u17p1z3x5 JVM from Sun, then you do not work at an enterprise company. It's as simple as that. Anyway, back to Scala ... With Scala 2.8 around the corner, you'd think that it would have reached a level of stability suitable for use in large-scale organisations, or at least, that it wouldn't have been significantly different from the previous version, 2.7.

As an aside OSGi goes to great lengths to define the purpose of version numbers, so that you know what you're getting:

  • major – incompatible changes
  • minor – new features but which are backwardly compatible if you don't use them
  • micro – bug fixes but with equivalent APIs

By this system, you generally know that a 1.0 system is compatible with a 1.1 system, but that it probably isn't with a 2.0 system. Java, the mother of all screwed up version numbering systems, has pretty much defined the version number as 1.major.minor.bugfix. This is mostly useless, and even though the versions of Java have evolved over time, arguably the difference between Java 1.4 and Java 1.5 should have triggered a change to Java 2.0 (except the Sun marketeers kind of shot that idea in the head with the Java2 days of 1.2, which is why they're always stuck with a lonely 1 at the front). Sadly, Oracle is the bastard love-child of the version number madness; the current JDBC driver for Oracle 10g is 11.1.0.7.0 ... so Java is only likely to get worse, rather than better.
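The major/minor/micro rule can be written down in a few lines. This is only a sketch in Python of the convention as described above, not OSGi's own API; the names Version and is_compatible are mine, and OSGi's optional fourth "qualifier" segment is ignored:

```python
from typing import NamedTuple

class Version(NamedTuple):
    """A major.minor.micro version; missing segments default to 0."""
    major: int
    minor: int = 0
    micro: int = 0

    @classmethod
    def parse(cls, text):
        # "1.1" -> Version(1, 1, 0); assumes one to three numeric segments.
        return cls(*(int(part) for part in text.split(".")))

def is_compatible(required, available):
    """True if code written against `required` should work with `available`.

    Same major version (no incompatible changes) and at least as new
    (so any features the code relies on are present).
    """
    return available.major == required.major and available >= required
```

So is_compatible(Version.parse("1.0"), Version.parse("1.1")) holds, whereas the same check against Version.parse("2.0") fails because the major version has changed.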

Backward compatibility

Scala has had a number of changes over the years. Actors, Java-Generics support, and most recently continuations have all come into the language. And each time, they've resulted in a backwards compatibility break. From the 2.8 preview notes:

The 2.8 version will also drop most operations that were deprecated in earlier versions, and will no longer support the old version 1.4 of the Java Virtual Machine. The class file format and compilation internals will be changed to some degree, so that programs compiled under 2.8 will not be compatible with binaries compiled earlier. Source compatibility should be by and large assured, however.

This isn't the first time it's happened, either. Back in 2.7, there were minor language changes, including the support for Java Generics, that meant binary compatibility with previous versions was lost. And previously, actor-based code compiled with 2.5 had issues running on 2.6. So each of these changes, whilst broadly similar from a source perspective, can have different runtime requirements that mean recompilation is necessary with each new Scala version.

Arguably, one of the reasons for Java's success is the ability to take any binary, even one written as far back as Java 1.0, and be able to run that on a new version of the JVM. Indeed, it's a testament to the quality of library producers, like those behind XML parsers and other such generic tools, that they invest in backward compatibility by ensuring that they don't rely on any new features (such as eschewing java.lang.String#split, which is only available on Java 1.4 onwards). This is one of the main reasons why Java libraries – and by extension, Java systems that use them – are available and so widely used.

Now, consider Scala. Given that each binary is basically only compatible with the version that compiled it, where do we see Scala libraries going? Well, Scala has already missed out on the concept of modularity; like Java, it has its own rt.jar (except it's called scala-library.jar) and basically that is the One True System that can never be changed. We've even gone the wrong way; the Actors library (arguably one of the success stories of the Scala ecosystem) used to be distributed in its own module (scala-actors.jar), but that got folded into the main Über library back in version 2.5.

IDE support

These days, a language without corresponding IDE support is unlikely to hit the big time, let alone reach enterprise users. Tools like Eclipse, NetBeans and IntelliJ have spoiled simple text editors for most, not least because of the wealth of context-sensitive tools (refactoring, autocompletion etc.) that they bring. Yes, it's possible to use TextEdit, EditPlus, SubEthaEdit, <insert-favourite-here> and have some kind of keyword formatting and CamelCaseColouring, but for most developers this isn't enough. Furthermore, the argument goes that it should be easier to build IDE support for a strongly typed language than for an untyped one; which is why refactoring tools for Python (say) aren't as advanced as those for Java.

However, investing in an IDE takes a lot of time and effort. Even with platforms like Eclipse and NetBeans to act as jumping off points, there's still a lot of work involved in making an IDE happen. Furthermore, since most languages don't export their own parsers or ADTs, the IDE tool developers often end up writing their own parser. This is great for languages like Java, which change at a glacial pace, because it means that one tool fits all. However, when that language syntax changes (properties in Objective-C 2.0, generics in Java, etc.) then the IDE has to re-work the support for parsing the system in the first place.

The obvious step would be to take a leaf out of Smalltalk's book, and have the parser, ADT and IDE all available in one all-inclusive system, so that they share the same language support. Unfortunately, in Scala's case, the syntax changes from one version to another and so the Scala IDE support has hard-coded/baked in support for a particular version of the language, often by delegating to Scala's parser directly. So when you download a cut of the Scala IDE, you're tightly bound to a particular version of the language.

Java doesn't suffer this problem as much because the language spec is orthogonal to the VM and libraries that support it; so any tool can write a Java parser – or compiler, for that matter – and then the choice of Java version is left up to the libraries on the classpath. Scala, on the other hand, changes syntax frequently and ends up emitting different (and potentially incompatible) bytecode with every release; so if you want to compile with an IDE you need to synchronise not only the libraries you wish to use, but also the IDE support that you want to get.

That said, Scala's IDE support has moved on massively from where it was, but sadly it's still possible to be more productive in a more verbose language (like Java) than a more terse language (like Scala) simply because the IDE takes much of the verboseness away from you. Programmers are paid by the hour, not by the number of lines of code that they (don't) generate; even assuming a 2-1 factor of size difference, the Scala source (at 1.4Mb) would take up an additional 1.4Mb of storage space if it were written in Java. Even with Apple's 128Gb SSD at $300, you're looking at $1 buying about 400Mb of space. So that extra storage space costs a fraction of a cent at best.
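For anyone who wants to check the sums, here they are as a few lines of Python. The figures are the ones quoted above, with "Gb" and "Mb" read as gigabytes and megabytes and 1Gb taken as 1024Mb (that reading is my assumption):

```python
ssd_price_dollars = 300
ssd_capacity_mb = 128 * 1024       # 128Gb SSD, taking 1Gb as 1024Mb
extra_source_mb = 1.4              # extra source size at a 2-1 ratio

mb_per_dollar = ssd_capacity_mb / ssd_price_dollars        # roughly 437Mb
extra_cost_cents = extra_source_mb / mb_per_dollar * 100   # roughly 0.32 cents

print(f"$1 buys about {mb_per_dollar:.0f}Mb")
print(f"extra storage costs about {extra_cost_cents:.2f} cents")
```

Either way, the extra storage comes to well under a cent, which is rather the point: the size of the source is not where the money goes.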

Modularity

So, what's a budding Scala developer to do? Well, you could take another bad leaf out of the Java cookbook and create The Other Über Library, scalax, whose name bears a striking similarity to Java's blunder of the same kind. For those who haven't been working with Java since it was first released back in 1995, the whole javax thing started out as a way to have standard-but-not-quite-standard Java classes be downloadable remotely (Java's VM prevents, by rule, the loading of java.* classes from outside of the rt.jar).

What this really boils down to is a stunning lack of decent modularity at the time; and in fact, what Java has always needed (but never had delivered) is a micro kernel of a VM, and just enough libraries (Object, String, HashSomething) to get out of bed in the morning and stretch its electronic legs. Everything else should be an external library. We'd then be able to have decent updates to the VM (say, to introduce concurrent garbage collection) whilst still using an older version of libraries, or even not at all. The whole Java Kernel project isn't solving this problem – it just changes the order of the bits downloaded – and it's probably too late for Java to achieve true modularity in this day and age in any case (especially with the division over version numbers as big as ever).

Update One of the commenters claimed that this isn't a big deal, because the scala-library.jar is only 3.5Mb in 2.7 and 4.0Mb in 2.8. Well, guess what – Java started off small as well. The archived downloads on Sun's Java site list the Java 1.1 installer at a tiny 2.5Mb; and that includes the VM as well. Small things have a habit of growing into big things when you don't pay attention; and, like the Java libraries, you start getting bleed-through between modules that really shouldn't depend on each other (such as the BeansContext depending on AppletInitializer, which means that the Beans API actually depends on the Applet and thus AWT packages).

That's not to say that Scala needs to go OSGi necessarily – the SimpleModuleSystem may be enough – but it's important to start delineating the lines between modules sooner rather than later.

There's also no reason why the library couldn't be split apart such that the Actors library lived in its own Jar, like before – after all, the whole scala process is brought up by a batch/shell script, which already adds all lib/* entries to the classpath.

Finally, this isn't on its own a reason to not use Scala in the enterprise. It is mentioned here because things that start small tend to grow uncontrollably big, and by the time they are big, they are much more difficult to unpack. But it's also looking ahead – some form of modularity is essential in the JVM apps of the future, and Scala appears to be facing in the wrong direction at the moment.

Language features

Scala is a powerful language, there's no doubt about that. But with great power comes great responsibility; and Scala can be made to do twisted things.

  • Symbolic names As an example, it's perfectly possible to have symbolic method names, which is why /: and :\ are not just quizzical emoticons, but are also defined functions in Scala (for foldLeft and foldRight respectively). We're only a hop, skip, and longjmp away from Haskell Arrows. There's no point in an International Obfuscated Scala Contest – most of the source is already pre-obfuscated by the syntax.

    And you can define your own if you want to. Chances are that this is both powerful and also dangerous, and only time will tell if projects survive. (One advantage that this does give is a standard Ordered trait which can be reused by multiple classes.)

  • Implicits If that wasn't enough to put people off, there is a bit of pixie magic called implicits. An implicit allows you to define a function which is magically applied wherever it is needed. For example, if you have an object of type Foo, and your function expects an object of type Bar, then if you have an implicit function which does a Foo to Bar conversion, it gets injected in by the compiler. In simplistic cases, this is easy to understand; but as a system grows, such magic insertions can become hidden as well as introducing subtle bugs.

  • Documentation, lack thereof In order for any language, tool or system to become successful, there needs to be a well thought out and well structured set of documents that explain how it fits together. Alas, Scala is produced as an academic exercise first, which means that all of the documentation is available in PDF, which the standard Scala install bundles up as part of the download. PDFs are good for some kinds of reference documentation, and are fine in addition to HTML, but should never be a replacement for HTML based documentation. It's not as if markup-based languages like LaTeX and DocBook can't be used to generate both PDF and HTML documentation (and the Scala documentation is all in LaTeX format).

    Compare that to competing languages like Java, Python and Ruby. Oh well, at least it's better than Groovy.

    But there are some real gotchas in the way that the language works. For example, methods ending with a colon (like /:) are right-associative, whereas those which don't (like :\) are left-associative. There's no big highlighted section on the website that explains this small but crucial fact; it's buried in the middle of a PDF document.

    Granted, documentation (in the form of books) is coming on-line now, but www.scala-lang.org still has a dearth of useful information, mixing news, blog items, and tutorials in one incomprehensible website (which, thanks to its redesign, just has “memorable” URLs of the form /node/123).

  • Lack of skilled developers As with any new technology, experience spreads from those who are the early adopters and then grows out. The early adopters are generally driven to find out more and learn about a technology, but in a large organisation, particularly one that has outsourced development, there isn't the drive (or the time to invest) in learning new technologies. Furthermore, cost pressure drives the skills base down to the minimum, which is why the Universities mentioned above churn out multitudinous Java developers.

  • Maintenance Finally, the maintenance aspect. Any enterprise system is likely to be around for a number of years; in some cases, exceeding its planned lifespan by a factor of two or three. It's essential that there are sufficient skills and talent in an organisation to be able to support horizontal transfers between departments; and that means that a suitably skilled developer needs to be able to pick up the work of another and continue on with it.

    The big learning curve is normally the system under maintenance; but when you have a new language like Scala, often with new paradigms (many developers don't appreciate the subtleties of functional programming, for example), the learning curve of understanding the system can be outweighed by the learning curve of the language it's written in.

Summary

Now, all is not totally lost in the Scala world; you can build a passable OSGi-based Scala application with the Scala Modules port (which fixes up some incorrect OSGi data, hopefully to be fixed as part of Scala 2.8) and the language is introducing people to ideas that they wouldn't have been exposed to had they stayed in their nice, warm, but slightly decaying Java-land. It also encourages innovation; the Lift framework and Actors are just two of the kinds of examples that it can provide.

But ultimately, building a Scala application is like building a sandcastle; it looks good at first, then starts to crumble, and then when the next tide comes in you pick up the pieces and rebuild again. That might work for Web 2.0 startups, but it doesn't cut the mustard at enterprise organisations.

Friday, August 07, 2009

Mail.app message URLs and iCal

Mail.app responds to a message: URL protocol; if you specify a Message-Id (as seen in the message's headers; View->Message->Long Headers), then it will open up the message in the Mail application. For example, here are some headers from an (old) message:

X-Apple-Mail-Remote-Attachments: YES
X-Apple-Mail-Signature: SKIP_SIGNATURE
Content-Type: text/html;charset=US-ASCII
Message-Id: <2929BE19-D00D-BEEF-ABDC-AB0CF776910B@gmail.com>
Content-Transfer-Encoding: 7bit
X-Apple-Windows-Friendly: 1
From: Alex Blewitt <...@gmail.com>
Subject: Test

For the purposes of this discussion, the Message-Id header (<2929BE19-D00D-BEEF-ABDC-AB0CF776910B@gmail.com>) is the important part. To open this message up in Mail.app, regardless of which folder this mail message happens to be in, we can invoke an open command from Terminal (or any other app capable of launching a document based on a URL) that will invoke it, with message:<2929BE19-D00D-BEEF-ABDC-AB0CF776910B@gmail.com>. Mail will then display this message, which can be handy for replying to specific messages, or attaching to reminders. (If you run this from Terminal, due to Message IDs often including < and > characters, then you might need to wrap it in 'quotes' to avoid shell interpretation.)
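As a small sketch of the quoting issue, the following Python builds the command line for you. The helper names message_url and open_command are mine, and adding the angle brackets when they are missing is an assumption on my part; shlex.quote is the standard library function that takes care of the shell quoting:

```python
import shlex

def message_url(message_id):
    """Build a message: URL from a Message-Id header value.

    Assumption: the angle brackets are part of the URL, so they are added
    if the caller passed the bare ID.
    """
    if not (message_id.startswith("<") and message_id.endswith(">")):
        message_id = "<" + message_id + ">"
    return "message:" + message_id

def open_command(message_id):
    """Return a shell command line that asks Mail.app to open the message."""
    # The < and > would otherwise be treated as redirections by the shell.
    return "open " + shlex.quote(message_url(message_id))

print(open_command("<2929BE19-D00D-BEEF-ABDC-AB0CF776910B@gmail.com>"))
```

If you launch it from a program rather than typing it into Terminal, passing the URL as a separate argument (for instance subprocess.run(["open", url])) sidesteps the shell, and the quoting problem, altogether.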

This can also be hooked into iCal, although not in an obvious way. If you edit an entry (or create one), you'll see a url field. This is typically used to refer to web-based resources associated with an event. You can also use it to fire up a mail message, either by constructing the message ID yourself, or by dragging-and-dropping the message onto the URL field. For whatever reason, you can't drag-n-drop it onto the event directly; you need to open up the event info window first.

Wednesday, August 05, 2009

WebGL JavaScript binding for OpenGL ES

This should be interesting. OpenGL (ES), the open 3D graphics rendering toolkit currently used by all iPhones and Mac platforms, is to get a binding specifically for JavaScript, WebGL. At the moment it's only a work in progress, but it seems that this will unlock the power of the Canvas element of HTML5 to be able to drive OpenGL applications and take advantage of straight-to-hardware rendering.

Of course, JavaScript isn't necessarily the best language to be doing this in – not least because of its raw computational speed (or lack thereof) – but this will give Web-based developers a way of exercising OpenGL content without having to drop into an ActiveX or similar native plugin. One can only hope that Apple will put their weight behind this as well, although with the QuickTime architecture already present in the Mac it may be the case that it's a slam dunk already.

It should be noted that the binding will be based on OpenGL ES 2.0, which is the graphics library behind the iPhone 3GS. It's also worth observing that this is the cut-down specification (ES) for mobile devices, which misses out on library toolkits like GLUT, but has a much more streamlined/simplified API than the wider OpenGL specification.

It will also provide a lower barrier to entry for developers interested in playing around with OpenGL. At the moment, most* of the languages that are used with OpenGL are compiled, with a compile-test-execute cycle that can slow down investigative or experimental work. With a dynamic language and built-in support for OpenGL in the browser, it will be much easier to experiment with the APIs for developing 3D content. It should also be observed that this will likely transfer into skills for OpenGL games on other platforms (including the iPhone). Lastly, it may be possible to define WebGL games that will run on the iPhone natively, but delivered via the web as JavaScript, and get everyone out of the black hole that is the Apple CrAppStore.

Knowing the working group, it will take a while to refine, but once it goes live, I expect that both Firefox and Safari will end up supporting the standard and that Google will sponsor an ActiveX plugin that provides the necessary support for IE, on the grounds that Microsoft is likely to fight against it in their war against standards.

Lastly, this is likely to be a blow for Flash, since one of the main purposes of Flash is to provide crappy graphics on websites. Flash won't die any time soon – the development tools for Flash-based apps are probably going to carry it forward for some time to come – but WebGL might allow those applications to step up a notch for rendering games.

* Yes, there's a couple of bindings for Python and a couple for SWT/Java which is experimental through Groovy/Scala's interpreter ... but you have to go out of your way to install them over and above the web browser that you currently have installed and are reading this with

Saturday, August 01, 2009

Google Voice? There's an app rejection for that. Only on iPhone.

It shouldn't really come to anyone as a surprise that the iPhone AppStore process is fundamentally broken; after all, Apple are acting as a (biased) floodgate to what goes in, and what goes out, of the iPhone AppStore.

On one hand, you have rabid Mac fanatics arguing that it's Apple's ball, and they can take it home any time they want to. Others point out that it's a commercial organisation, and they can choose their terms and you can like it or lump it.

But you also have the anti-competitive angle. iTunes explicitly locks out non-Apple hardware for no other reason than enforcing anti-competitive behaviour, and apps that Apple doesn't like (or perhaps AT&T) get booted without any recourse of action. That's not the goal of a non-monopolistic company; even Microsoft lets any number of applications run on its Windows Mobile devices; Google's Android is similarly unencumbered.

For some reason, the fact that the iPhone is considered a phone, rather than a portable computer, means that Apple can get away with claiming doom and gloom and the end of civilization as we know it (PDF) as well as trying to restrict free speech. However, neither of these acknowledges the obvious point that the iPod Touch is a similar portable computer with no such baseband capability; so none of the arguments about cellphone-specific damage are relevant for iPod Touches.

The matter has come to the attention of the FCC, who have sent letters to Apple, Google and AT&T to ask them about the rationale behind the rejection. It's possible that something will come from this, but it's more likely that there will be some nefarious argument like 'didn't meet AppStore guidelines' (or one of the many AppStore rejections). However, if the FCC believe that AT&T had specific communication with Apple regarding the application, they may not buy such crap.

The build-up of arguments seems to favour the creation of additional AppStores. There's no reason why Apple necessarily needs to be the sole provider of applications; and it's right that they choose what to host in their store. However, the fact that they are also the hardware provider (and enforce a lockdown on the installation of other software) is the key objection. If other AppStores could thrive, then applications not blessed by the AppStore may be available elsewhere.

This has already happened. Cydia, the de-facto standard installer for jailbroken phones, has its own Cydia AppStore giving open source and commercial developers alike the ability to sell their wares, albeit to a smaller audience. You can get Google Voice for iPhone via the Cydia AppStore, and it's just one reason why the (Apple) AppStore in its current form is broken. Even recent changes (such as providing keywords for applications) won't help fix the underlying problem, which is that the iPhone AppStore has a horribly broken model for finding what you want to buy. Almost all sales come from the staff picks and top 10 lists, which means that the AppStore is effectively limited to maybe 20 or 30 different apps. And when a significant proportion of the catalogue is fart applications, I think the idea of Apple acting as quality control to protect the quality of the platform has demonstrably failed.