AlBlue’s Blog

Macs, Modularity and More

Is Scala ready for the enterprise?

Rant Java 2009 Scala

Short answer, no. But before you jump to the bottom of the comments section, citing sites or organisations that use it, it's necessary to understand what enterprise means in this context. It's not “nobody uses it” – the Lift web framework is used by a number of high-profile companies, and Twitter uses Scala for some of its back end services. But Twitter isn't really an enterprise user; it's more of a popular start-up. Enterprise organisations are those with tens of thousands of employees, with an average of 1.5 computers for every employee. In short, these are organisations which have many internal applications, often developed by hundreds of developers over a multi-year period, and which then stay around in the organisation for many years to come. For this reason, amongst others, rates for COBOL developers can exceed those for Java developers; after all, universities are spewing out Java developers with “Hello World” experience by the truckload, but COBOL developers are growing fewer each day ...

But I digress; this isn't about COBOL or Java, but Scala, and whether it's suitable for enterprise organisations. The answer is still a firm no, and unfortunately, it doesn't look like it's going to change any time soon, either. But before we go into the whys and wherefores, it's important to understand that the key thing about both COBOL and Java is that they are stable. They don't change that much, and if they do, it's usually a gradual evolution rather than a big bang change.

Java, of course, hasn't changed much since the late 90s in terms of the underlying language. Yes, we had the hassle of Generics (Done Badly), but that was more of a source-level change; unfortunately, it was done badly in a way that necessitated bytecode level changes as well. That's a real shame, because whilst the bytecode format actually has a number of extensible avenues, the generics implementation didn't really support it well; which meant 1.5 apps didn't run on 1.4 VMs. This has, until recently, still been an issue; in fact, as recently as last year, applications were being moved off 1.3 VMs. In fact, only recently there have been discussions about back-porting the Eclipse XSLT support for Java systems as old as 1.4 – although since 1.4 is now EOL, we may finally see the end of that particular headache in the Java world. (Java 1.5 EOL is only a couple of months away ...)

If you are still having difficulty in understanding why everyone is not on the latest and greatest 1.6u17p1z3x5 JVM from Sun, then you do not work at an enterprise company. It's as simple as that. Anyway, back to Scala ... With Scala 2.8 around the corner, you'd think that it would have reached a level of stability suitable for use in large-scale organisations, or at least, that it wouldn't have been significantly different from the previous version, 2.7.

As an aside, OSGi goes to great lengths to define the purpose of version numbers, so that you know what you're getting:

  • major – incompatible changes
  • minor – new features but which are backwardly compatible if you don't use them
  • micro – bug fixes but with equivalent APIs

By this system, you generally know that a 1.0 system is compatible with a 1.1 system, but that it probably isn't with a 2.0 system. Java, the mother of all screwed up version numbering systems, has pretty much defined the version number as 1.major.minor.bugfix. This is mostly useless, and even though the versions of Java have evolved over time, arguably the difference between Java 1.4 and Java 1.5 should have triggered a change to Java 2.0 (except the Sun marketeers kind of shot that idea in the head with the Java2 days of 1.2, which is why they're always stuck with a lonely 1 at the front). Sadly, Oracle is the bastard love-child of the version number madness; the current JDBC driver for Oracle 10g is 11.1.0.7.0 ... so Java is only likely to get worse, rather than better.
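The major/minor/micro rule above is simple enough to sketch in a few lines of Java. This is only an illustration of the rule as stated in the list, not the OSGi framework's own version class; the names SimpleVersion, parse and satisfies are made up for this example, and treating missing parts as zero is an assumption.

```java
/** Hypothetical three-part version in the major.minor.micro style described above. */
final class SimpleVersion {
    final int major;
    final int minor;
    final int micro;

    SimpleVersion(int major, int minor, int micro) {
        this.major = major;
        this.minor = minor;
        this.micro = micro;
    }

    /** Parses "major[.minor[.micro]]"; missing parts default to 0 (an assumption of this sketch). */
    static SimpleVersion parse(String text) {
        String[] parts = text.trim().split("\\.");
        if (parts.length > 3) {
            throw new IllegalArgumentException("Too many version parts: " + text);
        }
        int[] numbers = new int[3];
        for (int i = 0; i < parts.length; i++) {
            // Integer.parseInt rejects empty or non-numeric parts with a NumberFormatException
            numbers[i] = Integer.parseInt(parts[i]);
            if (numbers[i] < 0) {
                throw new IllegalArgumentException("Negative version part: " + text);
            }
        }
        return new SimpleVersion(numbers[0], numbers[1], numbers[2]);
    }

    /**
     * True if this (provided) version can stand in for the version a client was built against:
     * the major must match (a major bump means incompatible changes) and the minor must be at
     * least as new (a client may rely on features added in its minor). The micro is ignored
     * because bug fixes keep the API equivalent.
     */
    boolean satisfies(SimpleVersion required) {
        return major == required.major && minor >= required.minor;
    }

    public static void main(String[] args) {
        SimpleVersion builtAgainst = SimpleVersion.parse("1.0");
        System.out.println(SimpleVersion.parse("1.1.3").satisfies(builtAgainst)); // same major, newer minor
        System.out.println(SimpleVersion.parse("2.0.0").satisfies(builtAgainst)); // different major
    }
}
```

The point of the sketch is that a client can decide compatibility from the number alone, which is exactly what a 1.major.minor.bugfix scheme does not let you do.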

Backward compatibility

Scala has had a number of changes over the years. Actors, Java-Generics support, and most recently continuations have all come into the language. And each time, they've resulted in a backwards compatibility break. From the 2.8 preview notes:

The 2.8 version will also drop most operations that were deprecated in earlier versions, and will no longer support the old version 1.4 of the Java Virtual Machine. The class file format and compilation internals will be changed to some degree, so that programs compiled under 2.8 will not be compatible with binaries compiled earlier. Source compatibility should be by and large assured, however.

This isn't the first time it's happened, either. Back in 2.7, there were minor language changes, including the support for Java Generics, that meant binary compatibility with previous versions was lost. And previously, actor-based code compiled with 2.5 had issues running on 2.6. So each of these changes, whilst broadly similar from a source perspective, can have different runtime requirements that mean recompilation is necessary with each new Scala version.

Arguably, one of the reasons for Java's success is the ability to take any binary, even one written as far back as Java 1.0, and run it on a new version of the JVM. Indeed, it's a testament to the quality of the producers of libraries like XML parsers and other such generic tools that they invest in such backward compatibility, ensuring that they don't rely on any new features (such as eschewing java.lang.String#split, which is only available on Java 1.4 onwards). This is one of the main reasons why Java libraries – and by extension, Java systems that use them – are available and so widely used.

Now, consider Scala. Given that each binary is basically only compatible with the version that compiled it, where do we see Scala libraries going? Well, Scala has already missed out on the concept of modularity; like Java, it has its own rt.jar (except it's called scala-library.jar) and basically that is the One True System that can never be changed. We've even gone the wrong way; the Actors library (arguably one of the success stories of the Scala ecosystem) used to be distributed in its own module (scala-actors.jar), but that got folded into the main Über library back in version 2.5.

IDE support

These days, a language without corresponding IDE support is unlikely to hit the big time, let alone enterprise users. Tools like Eclipse, NetBeans and IntelliJ have spoiled simple text editors for most, not least because of the wealth of context-sensitive tools (refactoring, autocompletion etc.) that these bring. Yes, it's possible to use TextEdit, EditPlus, SubEthaEdit, <insert-favourite-here> and have some kind of keyword formatting and CamelCaseColouring, but for most developers this isn't enough. Furthermore, the argument goes, it should be easier to build IDE support for a statically typed language than for a dynamically typed one; so refactoring tools for Python (say) aren't as advanced as those for Java.

However, investing in an IDE takes a lot of time and effort. Even with platforms like Eclipse and NetBeans to act as jumping off points, there's still a lot of work involved in making an IDE happen. Furthermore, since most languages don't export their own parsers or ADTs, the IDE tooling often ends up with its own parser. This is great for languages like Java, which change at a glacial pace, because it means that one tool fits all. However, when the language syntax changes (properties in Objective-C 2.0, generics in Java, etc.) the IDE has to rework its support for parsing the language in the first place.

The obvious step would be to take a leaf out of Smalltalk's book, and have the parser, ADT and IDE all share the same language support in one all-inclusive system. Unfortunately, in Scala's case, the syntax changes from one version to another and so the Scala IDE has support for a particular version of the language baked in, often by delegating to Scala's parser directly. So when you download a cut of the Scala IDE, you're tightly bound to a particular version of the language.

Java doesn't suffer this problem as much because the language spec is orthogonal to the VM and libraries that support it; so any tool can write a Java parser – or compiler, for that matter – and then the choice of Java version is left up to the libraries on the classpath. Scala, on the other hand, changes syntax frequently and ends up emitting different (and potentially incompatible) bytecode with every release; so if you want to compile with an IDE you need to synchronise not only the libraries you wish to use, but also the IDE support that you want to get.

That said, Scala's IDE support has moved on massively from where it was, but sadly it's still possible to be more productive in a more verbose language (like Java) than in a terser language (like Scala) simply because the IDE takes much of the verbosity away from you. Programmers are paid by the hour, not by the number of lines of code that they (don't) generate; even assuming a 2:1 difference in size, the Scala source (at 1.4MB) would need an additional 1.4MB of storage space if it were written in Java. Even with Apple's 128GB SSD at $300, you're looking at $1 buying about 400MB of space. So that extra storage space costs a fraction of a cent at most.

Modularity

So, what's a budding Scala developer to do? Well, you could take another bad leaf out of the Java cookbook and create The Other Über Library, scalax, a name which bears a striking similarity to Java's blunder, javax. For those who haven't been working with Java since it was first released back in 1995, the whole javax thing started out as a way to have standard-but-not-quite-standard Java classes be downloadable remotely (Java's VM prevents, by rule, the loading of java.* classes from outside of the rt.jar).

What this really boils down to is a stunning lack of decent modularity at the time; and in fact, what Java has always needed (but never had delivered) is a micro kernel of a VM, and just enough libraries (Object, String, HashSomething) to get out of bed in the morning and stretch its electronic legs. Everything else should be an external library. We'd then be able to have decent updates to the VM (say, to introduce concurrent garbage collection) whilst still using an older version of the libraries, or even none at all. The whole Java Kernel project isn't solving this problem – it just changes the order of the bits downloaded – and it's probably too late for Java to achieve true modularity in this day and age in any case (especially with the division over version numbers as big as ever).

Update: One of the commenters claimed that this isn't a big deal, because the scala-library.jar is only 3.5MB in 2.7 and 4.0MB in 2.8. Well, guess what – Java started off small as well. The archived downloads on Sun's Java site list the Java 1.1 installer at a tiny 2.5MB; and that includes the VM as well. Small things have a habit of growing into big things when you don't pay attention; and, like the Java libraries, you start getting bleed-through between modules that really shouldn't depend on each other (such as the BeansContext depending on AppletInitializer, which means that the Beans API actually depends on the Applet and thus AWT packages).

That's not to say that Scala needs to go OSGi necessarily – the SimpleModuleSystem may be enough – but it's important to start delineating the lines between modules sooner rather than later.

There's also no reason why the library couldn't be split apart such that the Actors library lived in its own Jar, like before – after all, the whole scala process is brought up by a batch/shell script, which already adds all lib/* entries to the classpath.

Finally, this isn't on its own a reason to not use Scala in the enterprise. It is mentioned here because things that start small tend to grow uncontrollably big, and by the time they are big, they tend to be more difficult to unpack later rather than earlier. But it's also looking ahead – some form of modularity is essential in the JVM apps of the future, and Scala appears to be facing in the wrong direction at the moment.

Language features

Scala is a powerful language, there's no doubt about that. But with great power comes great responsibility; and Scala can be made to do twisted things.

  • Symbolic names As an example, it's perfectly possible to have symbolic method names, which is why /: and :\ are not just quizzical emoticons, but are also defined functions in Scala (for foldLeft and foldRight respectively). We're only a hop, skip, and longjmp away from Haskell Arrows. There's no point in an International Obfuscated Scala Contest – most of the source is already pre-obfuscated by the syntax.

    And you can define your own if you want to. Chances are that this is both powerful and also dangerous, and only time will tell if projects survive. (One advantage that this does give is a standard Ordered trait which can be reused by multiple classes.)

  • Implicits If that wasn't enough to put people off, there is a bit of pixie magic called implicits. An implicit allows you to define a function which is magically applied wherever it is needed. For example, if you have an object of type Foo, and your function expects an object of type Bar, then if you have an implicit function in scope which does a Foo to Bar conversion, it gets injected by the compiler. In simple cases, this is easy to understand; but as a system grows, such magic insertions can become hidden as well as introduce subtle bugs.

  • Documentation, lack thereof In order for any language, tool or system to become successful, there needs to be a well thought out and well structured set of documents that explain how it fits together. Alas, Scala is produced as an academic exercise first, which means that all of the documentation is available in PDF, which the standard Scala install bundles up as part of the download. PDFs are good for some kinds of reference documentation, and even as an addition to HTML based documentation, but should never be a replacement for it. It's not as if markup-based languages like LaTeX and DocBook can't be used to generate both PDF and HTML documentation (and the Scala documentation is all in LaTeX format).

    Compare that to competing languages like Java, Python and Ruby. Oh well, at least it's better than Groovy.

    But there are some real gotchas in the way that the language works. For example, methods ending with a colon (like /:) are right-associative, whereas those which don't (like :\) are left-associative. There's no big highlighted section on the website that explains this small but crucial fact; it's buried in the middle of a PDF document.

    Granted, documentation (in the form of books) is coming on-line now, but www.scala-lang.org still suffers from a dearth of useful information (and, thanks to its redesign, just has “memorable” URLs of the form /node/123), mixing news, blog items, and tutorials in one incomprehensible website.

  • Lack of skilled developers As with any new technology, experience starts with the early adopters and then spreads outwards. The early adopters are generally driven to find out more and learn about a technology, but in a large organisation, particularly one that has outsourced development, there isn't the drive (or the time to invest) in learning new technologies. Furthermore, cost pressure drives the skills base down to the minimum, which is why the aforementioned universities churn out multitudinous Java developers.

  • Maintenance Finally, the maintenance aspect. Any enterprise system is likely to be around for a number of years; in some cases, exceeding its planned lifespan by a factor of two or three. It's essential that there are sufficient skills and talent in an organisation to be able to support horizontal transfers between departments; and that means that a suitably skilled developer needs to be able to pick up the work of another and continue with it.

    The big learning curve is normally the system under maintenance; but when you have a new language like Scala, often with new paradigms (many developers don't appreciate the subtleties of functional programming, for example), the learning curve of understanding the system can be outweighed by the learning curve of the language it's written in.
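To make the symbolic names and associativity points in the list above a little more concrete, here is a rough sketch in plain Java of what the two fold operators do. The class and method names (Folds, foldLeft, foldRight) are made up for this illustration; the Scala spellings appear only in the comments, and the subtraction example is chosen purely because it gives different answers in each direction.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

/** Hypothetical helper spelling out the two folds with names instead of symbols. */
final class Folds {

    /** Left fold: ((zero op a) op b) op c. In Scala: (zero /: xs)(op) */
    static <T> T foldLeft(List<T> xs, T zero, BinaryOperator<T> op) {
        T acc = zero;
        for (T x : xs) {
            acc = op.apply(acc, x);
        }
        return acc;
    }

    /** Right fold: a op (b op (c op zero)). In Scala: (xs :\ zero)(op) */
    static <T> T foldRight(List<T> xs, T zero, BinaryOperator<T> op) {
        T acc = zero;
        for (int i = xs.size() - 1; i >= 0; i--) {
            acc = op.apply(xs.get(i), acc);
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3);
        int left = foldLeft(xs, 0, (a, b) -> a - b);   // ((0 - 1) - 2) - 3 = -6
        int right = foldRight(xs, 0, (a, b) -> a - b); // 1 - (2 - (3 - 0)) = 2
        System.out.println(left + " " + right);        // prints "-6 2"
    }
}
```

Note where the collection sits in the two Scala expressions: because /: ends in a colon it is right-associative, so it is invoked on xs even though xs appears on the right-hand side, whereas :\ is invoked on its left-hand operand. The Java version is longer, but a maintainer can at least search for the method name.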

Summary

Now, all is not totally lost in the Scala world; you can build a passable OSGi-based Scala application with the Scala Modules port (which fixes up some incorrect OSGi data, hopefully to be fixed as part of Scala 2.8) and the language is introducing people to ideas that they wouldn't have been exposed to had they stayed in their nice, warm, but slightly decaying Java-land. It also encourages innovation; the Lift framework and Actors are just two of the kinds of examples that it can provide.

But ultimately, building a Scala application is like building a sandcastle; it looks good at first, then starts to crumble, and then when the next tide comes in you pick up the pieces and rebuild again. That might work for Web 2.0 startups, but it doesn't cut the mustard at enterprise organisations.