
AlBlue’s Blog

Macs, Modularity and More

QCon Day 1

2010, conference, qcon

Yesterday was the opening day of QCon at London's QEII Conference Centre. The conference got off to a great start with the opening video.

The keynote, called Bad Code (slides), was delivered by the energetic Uncle Bob Martin. He discussed the 'one screen of code' rule for method size, but pointed out that the rule dates from VT100 terminals, which showed only 24 lines, unlike the hundred or so lines visible in an IDE today. His advice was to keep refactoring with extract method until nothing more could be extracted.

He claimed that some piles of code are navigated by geographic memory, recognising the shape of the left-hand column: “Head down to the third mountain, take a left, and then behind that small hill...”. His opening slide/video showed a multi-thousand-line (million?) source file containing two classes (Foo and FooImpl), set to the music from the “My god, it's full of stars” scene of 2001.

He gave a bunch of good coding advice, including:

  • Each method should be less than 20 lines long
  • Name private methods descriptively to help future developers
  • Don't have a function that takes a boolean; that's saying it does two things, so create two functions instead
  • Cut and paste is bug replication by numbers
  • Use test-driven design; if only 80% of your code is tested, then you have no idea whether the other 20% works or not
  • You should be running home and have gobs of tests running
  • Pair programming is important, but don't aim for 100% of the time; 50-70% is the sweet spot
  • If the continuous build is broken, sirens should go off; fix it immediately
  • Design and architecture are important, because we want to be able to maintain and change the code
  • QA should find nothing; they should be the people we impress
  • Ask management whether they want good code or crap code; then drive the way you want to
  • Iterations one or two weeks long are good
  • Despite a long search for blueprint (UML) tools, source code represents the design
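As an illustration of the boolean-argument advice (and of extracting small, descriptively named methods), here is a minimal sketch. The report functions and their names are hypothetical, invented for this example rather than taken from the talk: a function whose flag selects between two behaviours is split into two functions, with the shared loop extracted into a private helper.

```python
# Hypothetical example: the boolean flag means this function does two things.
def render_report(rows, as_csv):
    if as_csv:
        return "\n".join(",".join(str(cell) for cell in row) for row in rows)
    return "\n".join(" | ".join(str(cell) for cell in row) for row in rows)


# The same behaviour split into two descriptively named functions, with the
# duplicated loop extracted into a small private helper.
def _join_rows(rows, separator):
    """Join each row's cells with `separator`, one row per line."""
    return "\n".join(separator.join(str(cell) for cell in row) for row in rows)


def render_report_as_csv(rows):
    return _join_rows(rows, ",")


def render_report_as_table(rows):
    return _join_rows(rows, " | ")


if __name__ == "__main__":
    data = [["fruit", "qty"], ["apples", 3]]
    print(render_report_as_csv(data))
```

Callers now say what they mean (`render_report_as_csv(data)`) instead of passing a bare `True` whose meaning has to be looked up.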

There's an increasing focus on non-SQL databases; Geir Magnusson hosted a track on them, including a discussion of Gilt's use of Project Voldemort, an open-source distributed key-value store based on Amazon's Dynamo. Their goal was to achieve scalability by focusing on the key bottleneck, which is usually the shared database instance.

Voldemort works by distributing data across multiple nodes. To track changes and know which data is up to date, each copy of an object carries a set of (node, revision number) pairs, known as a vector clock. So, if an object was last updated on node A, its clock would contain (A,1); if it is subsequently updated again, that entry becomes (A,2). When an object is written, it is spread across several nodes; when it is read, the local nodes are queried. Because each object carries one such entry per node, a write specifies the last-known version it was based on: if that matches the current version, the write is applied; if not, it is rejected. Conflicts do occur, and it's up to the application to decide how to resolve them.
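The vector-clock bookkeeping described above can be sketched in a few lines. This is a simplified, single-node model of the behaviour as described in the talk, not Voldemort's actual API: `VersionedStore`, `increment`, `descends` and `compare` are hypothetical names, and a real deployment also has to handle replication, read repair and returning concurrent versions to the client.

```python
from typing import Dict, Optional, Tuple

# A vector clock maps a node name to the number of updates made on that node.
VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` recording one more update made on `node`."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if `a` has seen every update recorded in `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify `a` relative to `b`: equal, after, before or concurrent."""
    a_covers_b, b_covers_a = descends(a, b), descends(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "after"
    if b_covers_a:
        return "before"
    return "concurrent"  # neither has seen the other's updates: a conflict


class VersionedStore:
    """Hypothetical single-node store that rejects writes based on stale versions."""

    def __init__(self, node: str):
        self.node = node
        self.data: Dict[str, Tuple[Optional[str], VectorClock]] = {}

    def get(self, key: str) -> Tuple[Optional[str], VectorClock]:
        return self.data.get(key, (None, {}))

    def put(self, key: str, value: str, last_known: VectorClock) -> bool:
        _, current = self.get(key)
        if compare(last_known, current) != "equal":
            return False  # the writer did not see the latest version
        self.data[key] = (value, increment(current, self.node))
        return True


if __name__ == "__main__":
    store = VersionedStore("A")
    _, clock = store.get("cart")
    print(store.put("cart", "1 item", clock))   # True, clock is now {'A': 1}
    print(store.put("cart", "2 items", clock))  # False, the clock is stale
```

A `"concurrent"` result from `compare` is exactly the conflict case that, per the talk, the application has to resolve.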

Don Syme (of Microsoft Research) gave an excellent presentation on parallelism with F#, as well as a brief intro to F# for those who aren't up to speed with F# syntax.

F#'s approach to functional programming (and being a commercial success at functional programming, too...) means that integrating parallelism is pretty easy. Using Async.Parallel [ http "www.google.com"; http "www.bing.com" ] |> Async.RunSynchronously, it's possible to issue multiple requests in parallel and collect the results once they have all arrived. To some extent, this has been exposed with utilities like Eclipse's Futures, although the ability to parallelise over arrays (as well as to manage how the work executes) is important.
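For readers more familiar with Python, the fork-and-collect shape of `Async.Parallel ... |> Async.RunSynchronously` can be sketched with `asyncio.gather` and `asyncio.run`. The `http` coroutine here is a hypothetical stand-in that sleeps instead of touching the network, so the example runs offline; it is an analogy, not a translation of the F# library.

```python
import asyncio
from typing import List


async def http(url: str) -> str:
    """Hypothetical stand-in for an HTTP GET: sleeps instead of using the network."""
    await asyncio.sleep(0.1)
    return f"<html from {url}>"


async def fetch_all(urls: List[str]) -> List[str]:
    # Roughly Async.Parallel: start every request at once, then collect the
    # results in the same order as the input list.
    return list(await asyncio.gather(*(http(url) for url in urls)))


if __name__ == "__main__":
    # Roughly |> Async.RunSynchronously: block until every result is available.
    pages = asyncio.run(fetch_all(["www.google.com", "www.bing.com"]))
    print(pages)  # ['<html from www.google.com>', '<html from www.bing.com>']
```

One difference worth noting: asyncio interleaves I/O waits on a single thread, whereas F# async workflows can also spread CPU-bound work across the thread pool, so this sketch only covers the I/O half of the story.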

He also talked about some micro-trends in software evolution:

  • Communication with immutable data; REST, HTML, JSON, XML
  • Programming with queries; XSLT, SQL, LINQ embedded in C#
  • Pattern matching; Scala, F#
  • Languages with a lighter syntax; Python, Ruby, F#

The big trends are also important: not only multi-core, but multi-system and the web. Parallelism is about both CPU-bound computation and I/O.

F#'s approach to parallelism is to create (re)actors: just as code might wait for a button event or for data on a network socket, a reactor waits for an event and then takes an action. With reactors, a thread-per-actor model isn't necessary, which makes it possible to scale to many thousands or tens of thousands of actors. This also permits dataflow parallelism, whereby data is processed in a pipeline and each stage of the pipeline runs as an actor.
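F# itself provides `MailboxProcessor` for this style of agent; the sketch below instead models the idea in Python, using an `asyncio.Queue` as each actor's mailbox. The `actor`, `run_pipeline` and `spawn_many` names and the `STOP` sentinel are assumptions made for this example. It shows the two points from the talk: actors that are coroutines rather than threads (so ten thousand of them are cheap), and a dataflow pipeline in which each stage is an actor.

```python
import asyncio
from typing import Any, Callable, List, Optional

STOP = object()  # sentinel message asking an actor to shut down


async def actor(inbox: asyncio.Queue, handle: Callable[[Any], Any],
                outbox: Optional[asyncio.Queue] = None) -> None:
    """Wait for each message, react to it with `handle`, pass the result on.

    An actor here is a coroutine rather than a thread, so it costs almost
    nothing while it is blocked waiting on its inbox.
    """
    while True:
        message = await inbox.get()
        if message is STOP:
            if outbox is not None:
                await outbox.put(STOP)  # let the next stage shut down too
            return
        result = handle(message)
        if outbox is not None:
            await outbox.put(result)


async def run_pipeline(items: List[str]) -> List[int]:
    """Dataflow pipeline: parse -> square, with each stage running as an actor."""
    parse_inbox, square_inbox, results = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    stages = [
        asyncio.create_task(actor(parse_inbox, int, square_inbox)),
        asyncio.create_task(actor(square_inbox, lambda n: n * n, results)),
    ]
    for item in items:
        await parse_inbox.put(item)
    await parse_inbox.put(STOP)

    collected = []
    while (value := await results.get()) is not STOP:
        collected.append(value)
    await asyncio.gather(*stages)
    return collected


async def spawn_many(n: int) -> int:
    """Start `n` independent actors on a single thread; each adds one to its message."""
    inboxes = [asyncio.Queue() for _ in range(n)]
    outbox = asyncio.Queue()
    tasks = [asyncio.create_task(actor(box, lambda m: m + 1, outbox)) for box in inboxes]
    for i, box in enumerate(inboxes):
        await box.put(i)
        await box.put(STOP)
    await asyncio.gather(*tasks)
    total = 0
    while not outbox.empty():
        message = outbox.get_nowait()
        if message is not STOP:
            total += message
    return total


if __name__ == "__main__":
    print(asyncio.run(run_pipeline(["1", "2", "3"])))  # [1, 4, 9]
    print(asyncio.run(spawn_many(10_000)))             # 50005000
```

Because the queues are FIFO and each stage has a single actor, results come out in input order; adding more actors per stage would increase throughput at the cost of ordering.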

The day finished with an enlightening talk by Dan Ingalls called 40 years of fun with computers. Much of it looked back at the development of Smalltalk and related systems; the fact that Smalltalk ran on a VM (with minimal C requirements) meant that the engine was self-introspecting (and self-developing). Much of the performance came from a native BitBlt operator for manipulating graphics, which made a lot of impressive graphics tools possible.
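BitBlt ("bit block transfer") copies a rectangular block of pixels from a source bitmap into a destination, combining each source pixel with the existing destination pixel using a boolean rule. The toy version below, using 2D lists of 0/1 pixels, is only meant to show that idea; the function and parameter names are hypothetical, and the real Smalltalk primitive worked on packed machine words and handled clipping, which this sketch omits.

```python
import operator
from typing import Callable, List

Bitmap = List[List[int]]  # rows of pixels, each 0 (white) or 1 (black)


def copy_rule(src: int, dst: int) -> int:
    """Combination rule that simply replaces the destination pixel."""
    return src


def bitblt(src: Bitmap, dst: Bitmap, src_x: int, src_y: int,
           dst_x: int, dst_y: int, width: int, height: int,
           rule: Callable[[int, int], int] = copy_rule) -> None:
    """Combine a width x height block of `src` into `dst`, in place, pixel by pixel."""
    for row in range(height):
        for col in range(width):
            s = src[src_y + row][src_x + col]
            d = dst[dst_y + row][dst_x + col]
            dst[dst_y + row][dst_x + col] = rule(s, d) & 1


if __name__ == "__main__":
    screen = [[0, 0, 0], [0, 0, 0]]
    glyph = [[1, 1], [1, 0]]
    bitblt(glyph, screen, 0, 0, 1, 0, 2, 2)                # copy glyph in
    print(screen)                                          # [[0, 1, 1], [0, 1, 0]]
    bitblt(glyph, screen, 0, 0, 1, 0, 2, 2, operator.xor)  # XOR again erases it
    print(screen)                                          # [[0, 0, 0], [0, 0, 0]]
```

Swapping the combination rule (copy, OR, XOR and so on) is what lets one primitive draw text, paint, and draw and erase cursors.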

The demonstrations were equally impressive, especially considering the period in which the software was developed; there was a WYSIWYG editor that could drop in images and reflow text around them (sometimes into quite odd layouts). No wonder NeXT was so advanced for its time, given that it shared similar roots (and it remains depressing that it was ignored by the masses for so long and, to some extent, still is by those who think Macs have a higher TCO).

Dan also demonstrated the Lively Kernel (formerly a research project by Sun Microsystems). Lively was inspired by Squeak, which is an open-source (MIT/Apache licensed) Smalltalk runtime. In fact, his entire presentation was run out of a Squeak VM image.