I've been musing for some time about the state of computer languages, and their general inability to deal with the multi-core revolution. Although we haven't hit a firm upper limit on clock speed, the advances in clock speed over the next five to ten years aren't going to be as impressive, relatively speaking, as the increases over the previous five to ten years. Instead, we're facing an impending multi-core revolution.
What does multi-core mean for general computing? Well, probably not much at this stage; the majority of applications are still single-threaded (or mostly so). There's minimal multi-threading in some GUI applications, if only to keep the 'working thread' off the 'event thread' queue, but the granularity of jobs is usually quite high even then. When the user clicks a button, typically a new thread is kicked off (or a job is passed to a thread pool) for execution. Depending on your GUI, it may then be necessary for any results to be posted back to the UI thread in order to render the view.
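That hand-off pattern can be sketched in a few lines of Python; the queue here stands in for whatever event queue your GUI toolkit actually provides, and the names are illustrative rather than any real toolkit's API:

```python
import threading
import queue

ui_events = queue.Queue()  # stands in for the GUI toolkit's event queue

def on_button_click():
    # Kick the long-running job off the event thread...
    threading.Thread(target=worker, daemon=True).start()

def worker():
    result = sum(range(1_000_000))  # some expensive computation
    # ...and post the result back for the UI thread to render.
    ui_events.put(("render", result))

on_button_click()
# The "UI thread" drains the queue and renders the posted result.
event, value = ui_events.get(timeout=5)
print(event, value)
```

The granularity is coarse: one click, one job, one result posted back.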
Whilst general algorithms and data structures have been well understood for a while, ones that work in a multi-threaded environment are less so. The good old Immutable pattern is often brought up here; if you've got a read-only data structure (like Java's Strings) then you can happily pass it from one place to another without needing to worry about what happens to it. Mutable data structures, like HashMap, aren't so easy to deal with, as multiple threads performing one-or-more change operations and zero-or-more read operations can expose inconsistencies in the way that the data structure is being used.
Note: this isn't the same issue as the interleaving of threads. If you have one thread inserting a value and another accessing it, then depending on the order of those thread operations the accessing thread will either see the value or not. The key requirement is that the operations are isolated from one another, not that their ordering is guaranteed.
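A minimal Python sketch of that distinction (note that CPython's global interpreter lock makes individual dict operations atomic, so this illustrates the ordering point rather than actual structural corruption):

```python
import threading

shared = {}

def writer():
    shared["answer"] = 42

def reader(out):
    # Depending on scheduling, the reader may or may not see the value;
    # either outcome is acceptable. What must never happen is observing
    # the structure in a half-updated, inconsistent state.
    out.append(shared.get("answer"))

seen = []
t1 = threading.Thread(target=writer)
t2 = threading.Thread(target=reader, args=(seen,))
t2.start(); t1.start()
t1.join(); t2.join()

assert seen[0] in (None, 42)  # both orderings are legitimate
```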
A more pressing issue is how problems are split up in order to take advantage of parallelism, and how well the language solves these problems at a primitive level. For example, bubble sort is not an algorithm that lends itself well to parallelism, whereas quicksort lends itself somewhat better.
Generally, divide-and-conquer algorithms are likely to be better suited to parallelism than others. Unfortunately, for parallel algorithms (i.e. where the branching factor is greater than 1) you still have the problem of how to weave the results from your children together in order to propagate them back to the caller. Even with tail-recursion optimisations, you're still going to leave stuff on the stack. In a parallel quicksort, you need to sort the left and right partitions and then append the results; whilst the children are being sorted (possibly in parallel) you're still going to end up waiting for the results to come in. Ideally you'd want to return at that point saying 'Here's what it will look like, but you have to evaluate the function to find out'. This approach is akin to returning a continuation, in which case it becomes the caller's problem.
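The waiting-for-children problem is visible in even a toy parallel quicksort. This sketch uses Python futures (the depth cut-off and pool size are arbitrary choices to keep the example safe from pool exhaustion):

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=8)

def psort(xs, depth=2):
    if len(xs) <= 1:
        return list(xs)
    pivot, rest = xs[0], xs[1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    if depth > 0:
        # Sort the partitions in parallel...
        lf = pool.submit(psort, left, depth - 1)
        rf = pool.submit(psort, right, depth - 1)
        # ...but the caller still blocks here, weaving the children's
        # results back together before it can return.
        return lf.result() + [pivot] + rf.result()
    return psort(left, 0) + [pivot] + psort(right, 0)

print(psort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```

Each `result()` call is exactly the blocking point described above; returning the futures themselves, rather than their results, would be the continuation-style alternative.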
The other possible way of achieving parallelism is through vectorisation. If you have an operation which needs to be applied to a set of values, then (as long as side-effects aren't an issue) you should be able to apply that operation across the elements in any order or in parallel with each other.
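For instance, a side-effect-free function can be mapped over a collection via a pool, and the runtime is free to evaluate the elements in any order; a small sketch:

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x  # side-effect free, so evaluation order is irrelevant

values = [1, 2, 3, 4, 5]

with ThreadPoolExecutor() as pool:
    # map applies the operation across the elements; because square has
    # no side-effects, the elements can be processed in parallel.
    results = list(pool.map(square, values))

print(results)  # [1, 4, 9, 16, 25]
```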
Unfortunately, neither of these is necessarily easy to retrofit onto existing programs. Most programs either do their set processing in a for-style loop (where it's not easy to guarantee that every element needs to be visited, since some may be skipped or the loop may be aborted early), or the operation itself has some kind of side-effect which requires it to run in order.
Some languages provide built-in threading and synchronisation concepts; others have had them bolted on after the fact. Others still are limited by a single-threaded interpreter. However, the language that comes closest to embracing parallelism is Erlang, which operates as a set of inter-connected lightweight actors (or processes, in Erlang terms) that can send and receive messages asynchronously. In this regard, Erlang is like a miniature in-process message-oriented middleware (MoM), where the act of consuming a message results in some processing and then one or more messages being generated as a result. These types of systems can be scaled easily (resources permitting), since they separate the responsibility of the thing-that-executes from the thing-that-is-being-executed. There are some semantic issues (such as whether messages are consumed in order, or whether the message queues are persistent in the case of a system outage) that differ between the two, but in principle the concept is the same.
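The actor idea itself is small enough to sketch in Python; the `Actor` class and its handler below are illustrative, not any real framework's API:

```python
import threading
import queue

class Actor:
    """An Erlang-style lightweight process: it owns a mailbox and reacts
    to each message by doing some work (and possibly sending more messages)."""

    def __init__(self, handler):
        self.mailbox = queue.Queue()
        self.handler = handler
        threading.Thread(target=self._run, daemon=True).start()

    def send(self, msg):
        # Asynchronous: the sender never blocks on the processing.
        self.mailbox.put(msg)

    def _run(self):
        while True:
            self.handler(self.mailbox.get())

results = queue.Queue()
doubler = Actor(lambda n: results.put(n * 2))

for n in (1, 2, 3):
    doubler.send(n)

collected = [results.get(timeout=5) for _ in range(3)]
print(collected)  # [2, 4, 6]
```

Because a single actor drains its mailbox sequentially, messages to one actor are processed in order here; scaling comes from having many actors, not from threads sharing state.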
Erlang also has a few aspects that MoM doesn't: when a process fails with an error or exception, the process dies, and if it's in a process group, it can take down sibling processes as well. This facilitates replacing processes with newer versions of code on the fly, as well as providing some elegant error recovery, but these features are not necessarily part of the requirements for multi-threaded processing.
Error recovery can be handled in other ways in different systems; for example, MoM typically routes bad messages to the input queue for a single retry, or to the dead letter queue if there's a more pervasive failure.
There are relatively few languages which have adopted functional-style operators such as apply. These can be used to great effect over a set of data in parallel; this is exactly what Google's MapReduce does over a grid of computing resources. Programmers typically don't (or can't) think of problems in this way, so instead of saying 'Give me all the even numbers in this list' they will say 'For each element, see if it is even, and if so, add it to a temporary list which I'll then return'. Even in pseudo-code, the imperative version is longer to type.
What's particularly odd is that we've already got mainstream systems that use this style of code, and a number of developers are already familiar with them. XSLT is an example of a functional programming language; in fact, it's designed as a filter/map combination that transforms one tree into another. There are people who are adept at writing XSL transformations, although it's a skill that not all developers share. Even more common than that is SQL; you don't have programs saying 'For each row in the database, if the row has this value, then do the following' – instead, the database is given a statement: 'Find all rows that match'. That's pretty much the distinction between for loops and filter operations; I'm sure if someone came up with a cool name (like Inversion of Coding) then it would catch on more. To a lesser extent, list comprehensions in languages like Python embody the map-and-filter combination in an elegant way; but they're not widely used in general, and it's rare to see nested list comprehensions.
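For the unfamiliar, a comprehension packs map and filter into a single expression, and the nested form handles two levels of structure at once (the data here is made up for illustration):

```python
words = ["alpha", "beta", "gamma", "delta"]

# Map-and-filter in one expression: uppercase every word longer than four letters.
long_upper = [w.upper() for w in words if len(w) > 4]
print(long_upper)  # ['ALPHA', 'GAMMA', 'DELTA']

# A nested comprehension: flatten a list of lists.
grid = [[1, 2], [3, 4], [5, 6]]
flat = [cell for row in grid for cell in row]
print(flat)  # [1, 2, 3, 4, 5, 6]
```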
Anyway, back to the main point of this post: multi-core is already here, and it's going to get more pervasive. In much the same way that you can't really program in Java without doing object-orientation, we really need a set of tools and libraries that will enable algorithms to be broken down and executed independently of each other. The most logical choice is a functional programming language, simply because they are side-effect free (or, in the case of Haskell, have side-effects guarded by monads). Things that are side-effect free can be parallelised much more easily than those without such guarantees. Of course, there still needs to be a strategy for how you organise the 'stuff that needs to be done' with an appropriate execution pool that can 'do stuff', and it's at this point that current languages drop off somewhat.
Maybe what's needed is parallelism at a higher level. Instead of insisting on a functional language through-and-through, it may be possible to do some parts in a functional style but then break that (in a controlled way) into smaller segments. After all, that's what MoM does; the unit-of-work is the message, and how the code processes that message is up to the developer. But in a situation where there's a lot of parallelism, the readers can be ramped up to do work in parallel (assuming that the ordering of messages is unimportant, at least).
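A minimal sketch of that ramping-up, in Python: several readers share one job queue, each message is an independent unit of work, and nothing depends on the order in which messages are consumed (the poison-pill shutdown is one conventional choice, not the only one):

```python
import threading
import queue

jobs = queue.Queue()
results = queue.Queue()

def reader():
    # Each reader consumes messages as its unit of work; what the
    # handler does with a message is up to the developer.
    while True:
        msg = jobs.get()
        if msg is None:  # poison pill: shut this reader down
            break
        results.put(msg * msg)

# Ramp up several readers in parallel; message ordering is unimportant here.
readers = [threading.Thread(target=reader) for _ in range(3)]
for t in readers:
    t.start()

for n in range(5):
    jobs.put(n)
for _ in readers:
    jobs.put(None)
for t in readers:
    t.join()

print(sorted(results.queue))  # [0, 1, 4, 9, 16]
```

Adding throughput is just a matter of constructing more readers; the code that processes a message never changes.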
So, is the language of multi-core processes here already? Well, aspects of it are, that's for sure. Ideally, you'd want to have a way of passing somewhat complex data around between components, but not at the expense of having to copy large chunks of memory to make it happen. It's probably also going to be necessary to have some control over the thread pools that do the work, if only to allow the run-time to be tuned for a particular environment. Encouraging a read-only or immutable-style code base is certainly going to help, though copy-on-write techniques can mitigate changes to some extent. Ultimately, whatever language is created will have to be fairly high-level, so that the compiler/interpreter/run-time can make judicious optimisations on your behalf without you having to be concerned with the underlying mechanics. It may even be possible to run this on the current VMs, although the JVM is currently too tied to Java and the CLR/DLR are too tied to the Microsoft platform.
Erlang is worth particular mention because although you can use it as a standalone functional programming language, you can also do some interesting parallel processing using message passing. It's very likely that this technique will be used in the future, although it's debatable whether Erlang will be that language.
If you're a Java programmer, Scala is also worthy of special mention. It allows interoperation with Java in a fairly easy way; indeed, Scala is the only one of the second list that will run inside a JVM (the prior set have Rhino, Jython and JRuby at their disposal). Unfortunately, Java is getting uglier all the time; instead of using features like lambda expressions (or, as most people incorrectly call them, closures) in a bastardised form, go out and see what they're all about in other languages first.
If you're a C# programmer, your choices should also include F#, which is a functional language designed to run on the CLR, or the IronPython and IronRuby equivalents. It's possibly more exciting in the CLR world because of the DLR, which will allow tighter integration between IronPython and C# data.