The kick-off for Day 2 of QCon London started with the above video and an introduction of the track hosts, which unfortunately delayed proceedings and led to the first session over-running. I appreciate that a lot of effort had gone into making the video, but instead of starting at 08:45 we didn’t start the intros until 09:00, after which everything ran late. Not a great start …
The first keynote was Rich Hickey on ‘Simple Made Easy’ – mostly pointing out the obvious (such as ‘state is bad’) but not adding a huge amount of information. Not a great kick-off to the day.
Future is Cloudy
There were a lot of Cloud-related tracks today. One of them was by fellow ex-PUE Paul Fremantle on StratosLive, an open-source cloud product. They provide paid-for hosting as well as allowing the open-source stack to run in private environments, with optional support pricing if interested.
I’d really like to get behind it – after all, they realised they needed to move to OSGi a while ago and are fully modularised – but I have difficulty making the leap between something I’d like to use and where I am now. And ironically, this has nothing to do with the actual product and everything to do with the perception of the product.
- The name. StratosLive. It looks like a Greek has vomited over a keyboard.
- The domain. WSO2. Is that Oh two or Zero two? The logo looks worse.
- The colours. Oh, my eyes. It’s a crime against UX. Each module has the company logo and a bizarrely chosen colour.
- The source. Apparently it’s open source, but they just offer zip dumps. (Well, technically they have a well-hidden Subversion repository, but it’s not exactly prominently advertised anywhere. Oh, and it’s Subversion. What is it, 1999?)
Consider instead the much more recently-founded Cloud Foundry. There’s a getting started page and a get involved page with a link to the source on GitHub.
I can’t help but think StratosLive is going to founder unless it drastically redesigns its website and moves to Git (and from there, to GitHub).
Architectures
There have been a couple of good architecture talks today. The best was a talk on The Guardian’s architecture and how they handle systems and failures. @bruntonspall talked about how they scaled their system and, importantly, learnt to live with failure. (He noted that an MTBF metric is in fact less important than MTTR, citing the example that a system which fails once a year but has a week of downtime is much worse than a system which fails every five minutes but has only 1ms of downtime.)
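To put rough numbers on that comparison – my own back-of-the-envelope arithmetic, not figures from the talk:

```typescript
// Back-of-the-envelope availability for the two hypothetical systems above.
const msPerYear = 365 * 24 * 60 * 60 * 1000;
const msPerWeek = 7 * 24 * 60 * 60 * 1000;

// Fails once a year, takes a week to recover:
const availabilityYearly = 1 - msPerWeek / msPerYear;   // ≈ 0.981 (98.1%)

// Fails every five minutes, takes 1ms to recover:
const msPerFiveMinutes = 5 * 60 * 1000;
const availabilityFrequent = 1 - 1 / msPerFiveMinutes;  // ≈ 0.999997 (99.9997%)

console.log(availabilityYearly, availabilityFrequent);
```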
The Guardian’s content management system uses a collection of server-side includes to build pages, so that widgets (such as the live Twitter feed) can be merged in on the fly with the article content. These ‘micro-apps’ are mini widgets that are effectively composed by the top layer of their stack and then served/cached as a combined HTML file. A multi-stage cache and failover system performs global distribution (at the top layer) followed by load balancing in each data centre; each of these goes through an HTTP cache to access content. Thus, the Twitter widget may have a recently rendered view of the state of the world, and instead of the micro-app having to do a lot of rendering each time, it can delegate and cache results.
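As a rough sketch of what that composition looks like – not the Guardian’s actual code, and the URLs are invented – the page layer just fetches HTML fragments from each micro-app’s URL and stitches them into the article:

```typescript
// Hypothetical sketch of micro-app composition: each widget is an HTTP
// endpoint returning an HTML fragment, fronted by its own HTTP cache.
async function renderArticlePage(articleHtml: string): Promise<string> {
  const widgetUrls = [
    "https://microapps.example.com/twitter-feed",     // invented URLs
    "https://microapps.example.com/related-content",
  ];

  const fragments = await Promise.all(
    widgetUrls.map(async (url) => {
      const res = await fetch(url);
      return res.ok ? await res.text() : "";          // degrade gracefully if a widget fails
    })
  );

  // The composed page is itself a single cacheable HTML document.
  return `${articleHtml}\n<aside>${fragments.join("\n")}</aside>`;
}
```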
The micro-app approach also allowed a certain amount of innovation to take off, with developer Hack Days producing some new features which wouldn’t have been possible had it been a monolithic system. Because each micro-app responds over its own URL, the remote services can be swapped in and out without having to change the main content management system.
There were some additions to their cache processing: `stale-if-error`, which permits cached (but stale) content to be served if the remote endpoint errors (e.g. if the service is transiently re-starting), and `stale-while-revalidate`, which permits stale content to be returned whilst checking the validity of the element. These are documented in RFC 5861, if not widely used. (Squid has `stale-while-revalidate` in 2.7 but not 3.x.)
```text HTTP Cache Control header
Cache-Control: max-age=600, stale-while-revalidate=30
```
This notes that the content is considered stale after 600 seconds, but the cache is permitted to serve the cached content between 600 and 630 seconds whilst it kicks off a background process to revalidate it. If the content can be revalidated within 30 seconds then the cache never has a miss and doesn’t return an error code to the end user.
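The `stale-if-error` counterpart from the same RFC looks similar (my own example, not one from the talk); here the cache may carry on serving the stale copy for up to a further 1200 seconds after it becomes stale if the origin returns an error:

```text HTTP Cache Control header
Cache-Control: max-age=600, stale-if-error=1200
```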
Their apps also have a number of flags which can be switched on and off (in some cases, automatically). These may be used to extend the lifetime of cached data (say, because the back-end database has crashed and the ability to regenerate content has disappeared), or because a sudden spike in load means delay can be reduced by not re-generating content. Having automatic processes which monitor the state of the application and turn the flags on can remove the need for human involvement – though debugging and tracing trends in the monitoring, and in the need for flags, is still required.
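A minimal sketch of the kind of flag described above (my own illustration, with invented names): a monitor flips a switch when the back end looks unhealthy, and the cache lifetime is extended while it is on.

```typescript
// Hypothetical feature-flag sketch; not the Guardian's implementation.
const flags = {
  serveStaleContent: false,   // flipped automatically by monitoring, or by hand
};

function cacheTtlSeconds(): number {
  // Extend the lifetime of cached data while the flag is on.
  return flags.serveStaleContent ? 6 * 60 * 60 : 60;
}

function onDatabaseHealthCheck(isHealthy: boolean): void {
  // An automatic process can turn the flag on without human involvement.
  flags.serveStaleContent = !isHealthy;
}
```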
Logs can also help; printing out the time, operation and duration is fairly standard practice, but also printing out the thread name allows a sequence of operations to be tracked through the logs.
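For example, with a log format along these lines (a made-up illustration), grepping for the thread name pulls out the whole sequence of operations performed for one request:

```text Example log lines (hypothetical format)
09:14:02.113 [http-exec-17] 4ms   GET /microapp/twitter-feed
09:14:02.151 [http-exec-17] 38ms  GET /article/2012/mar/08/example
```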
Keeping the system simple and having many independent modules means that the system can scale with the load using standard HTTP caching to reduce the load between components.
Twitter and Facebook
Any talk at QCon about Twitter or Facebook results in a highly packed room. In Twitter’s case, this meant an overpacked room with standing room only out to the door and beyond. Unfortunately the talk finished early (about half an hour early, in fact) but the content was pretty good nonetheless.
When a user tweets, the tweet gets written both to that user’s timeline and to the timelines of all the users that follow them. For a small number of followers this scales well; as the user tweets, it fills all the timelines it needs to and keeps the lookup for each user constant. However, for popular accounts this breaks down, so for really popular timelines (say, 1m or more followers) a different approach is taken. Instead of having a massive fan-out, the timeline of a user who follows the popular account is computed as a union of that user’s own timeline and the popular followee’s timeline. That way, data isn’t duplicated in large volumes, and the merge-upon-lookup is quick enough that it doesn’t affect most people. (Also, most people don’t follow many popular accounts, just a few.)
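A sketch of that hybrid approach (illustrative data structures and a made-up threshold, not Twitter’s actual code): small accounts fan out on write, popular accounts are merged into their followers’ reads.

```typescript
type Tweet = { id: number; author: string; text: string };

const homeTimelines = new Map<string, Tweet[]>();     // materialised per-user timelines
const ownTimelines = new Map<string, Tweet[]>();      // each account's own tweets
const followers = new Map<string, string[]>();        // author -> follower ids
const popularFollowed = new Map<string, string[]>();  // reader -> popular accounts they follow

const POPULAR_THRESHOLD = 1_000_000;                  // made-up cut-off

function postTweet(tweet: Tweet): void {
  ownTimelines.set(tweet.author, [tweet, ...(ownTimelines.get(tweet.author) ?? [])]);

  const fans = followers.get(tweet.author) ?? [];
  if (fans.length < POPULAR_THRESHOLD) {
    // Fan-out on write: push the tweet into every follower's home timeline.
    for (const f of fans) {
      homeTimelines.set(f, [tweet, ...(homeTimelines.get(f) ?? [])]);
    }
  }
  // Popular accounts skip the fan-out entirely; see readTimeline below.
}

function readTimeline(user: string): Tweet[] {
  // Merge-upon-lookup: union the materialised timeline with the timelines of
  // any popular accounts this user follows.
  const popular = (popularFollowed.get(user) ?? [])
    .flatMap((a) => ownTimelines.get(a) ?? []);
  return [...(homeTimelines.get(user) ?? []), ...popular].sort((a, b) => b.id - a.id);
}
```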
The other feature highlighted is the similarity between search and timelines. When a tweet is added, a background process tokenises the words and adds them to a search index. For timelines, a read takes the union of one (or more) timelines and returns the resulting set. These can be generalised into one process, with ‘index’ being the lookup/data and ‘store’ being the database hit. This allows the same functionality to be replicated and new features to be added (although the #dickbar wasn’t mentioned …).
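In other words, both paths reduce to ‘look up a set of ids in an index, then fetch them from the store’ – a sketch of that shape (my own reading of the talk, not Twitter’s API):

```typescript
type StoredTweet = { id: number; text: string };

interface Index {
  lookup(key: string): number[];         // a timeline's tweet ids, or a search term's postings
}

interface Store {
  fetch(ids: number[]): StoredTweet[];   // the actual database hit
}

function query(index: Index, store: Store, keys: string[]): StoredTweet[] {
  // Union the id sets from one or more index lookups, then resolve them once.
  const ids = new Set<number>(keys.flatMap((k) => index.lookup(k)));
  return store.fetch([...ids]);
}
```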
Interestingly, the communication between the front-end and the services was over HTTP, the inter-process communication was done by RPC with Thrift, and Redis was used to persist the data.
The other elephant in the room was Facebook. Due to room pressures there was a last-minute switch (probably a sensible plan) but the talk was somewhat drier.
There was a passing mention of A/B testing with some random Facebook mail – perhaps just demonstrating that what people think they like is different from what they actually like (hello Twitter, are you listening? #dickbar).
They did have a couple of interesting nuggets in the presentation. One of them was a data-mining/visualisation map which looks a lot more interesting than it actually is: the structure of the map is derived from the location data of people sending messages, coupled with their interconnections. Needless to say, people in one location tend to know people in the same location, so it effectively becomes a way of knowing where people are based.
The other was the growth of Facebook data in TB (compressed):
- 2007: 15
- 2008: 250
- 2009: 800
- 2010: 8000
- 2011: 25000
They also mentioned that they had a couple of data-centre moves in progress, and that the moves themselves were difficult in terms of the volume of data. In the end, large trucks were used to move the petabytes of data in the servers, which were subsequently sync’d with the more recent changes. There’s more information in the ‘moving an elephant’ blog post on the Facebook website.
HTML5
There have been a number of HTML5 related presentations at QCon so far. I managed to get to “Fast Mobile UIs” which ended up listing a number of useful observations and features – more details will be on the schedule page.
One datapoint worth taking away is that iOS is generally about three times as fast as Android. Also, the Application Cache is generally a bad idea for real applications (there’s no control over the order or priority of elements). However, using an app cache in an iframe can result in firing up multiple applications (and the cache benefits therein).
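The usual form of the iframe trick (my interpretation, with invented file names): keep the manifest on a tiny page loaded in a hidden iframe, so the main page still benefits from the cached assets without being governed by the Application Cache itself.

```typescript
// Hypothetical sketch of priming the Application Cache from a hidden iframe.
function primeAppCache(): void {
  const frame = document.createElement("iframe");
  frame.style.display = "none";
  // appcache-loader.html is an invented, near-empty page declaring
  // <html manifest="app.appcache"> so that the assets get cached.
  frame.src = "/appcache-loader.html";
  document.body.appendChild(frame);
}

window.addEventListener("load", primeAppCache);
```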
There was some mention of things like `onClick` being broken (you can roll your own with `touchstart` and `touchend`) as well as numbers being interpreted as phone numbers (which can be disabled with `format-detection telephone=no`).
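A rough sketch of rolling your own tap handling from `touchstart`/`touchend` (my own illustration, not the speaker’s code; the element id is invented). The phone-number detection is the separate `format-detection` meta tag mentioned above.

```typescript
// Hypothetical "fast tap" helper built from touchstart/touchend, avoiding the
// delayed click event on mobile browsers.
function onFastTap(el: HTMLElement, handler: (e: TouchEvent) => void): void {
  let startX = 0;
  let startY = 0;

  el.addEventListener("touchstart", (e: TouchEvent) => {
    startX = e.touches[0].clientX;
    startY = e.touches[0].clientY;
  });

  el.addEventListener("touchend", (e: TouchEvent) => {
    const touch = e.changedTouches[0];
    // Only treat it as a tap if the finger hasn't moved far (i.e. not a scroll).
    if (Math.abs(touch.clientX - startX) < 10 && Math.abs(touch.clientY - startY) < 10) {
      e.preventDefault();   // suppress the synthetic click that would follow
      handler(e);
    }
  });
}

onFastTap(document.getElementById("buy-button")!, () => console.log("tapped"));
```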
For higher-resolution mobile devices (such as the iPhone 4 and the new iPad) the rendering of images can be slower when rendering at full scale. Moving to a viewport initial scale of 0.5 can speed up rendering significantly – although this needs the images to be twice as big.
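The change is just the standard viewport meta tag (shown here as an illustration):

```text Viewport meta tag
<meta name="viewport" content="initial-scale=0.5">
```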
London Java Community
The end of Thursday is the open events evening; there’s an Erlang user group (and Joe Armstrong – the creator of Erlang – has given a few presentations this week at QCon) and the London Java Community. Rich Hickey (he of the tedious keynote) was back delivering more blindingly obvious content, but the talk on Netflix architecture (a heavy Eclipse shop and based on Amazon Web Services) was worth waiting for.
The use of AWS means that they can scale from as many as 8000 nodes running on a Sunday evening down to 4000 nodes at 4am the following Monday. They were also running out of space on their single-node instance – the big database didn’t have any more room for growth – but moving the code over to the cloud meant they could scale horizontally instead.
The migration was piecemeal: the base services were moved off to the AWS cloud first of all, and then higher and higher layers, until all dependencies had been moved over and content could be served from the cloud.
To manage the scale of the system (57 independent Cassandra clusters at one point) they have built a number of tools (since open-sourced on GitHub) which can scale up and configure the nodes. Priam is used to configure the Cassandra nodes, handle tokens and provide backup and restore operations. They also have an unpronounceable Java client for communication which provides automatic retry of connections, host discovery, and latency- and token-aware requests.
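The retry behaviour described is roughly of this shape – a deliberately simplified sketch, not the API of Netflix’s actual client:

```typescript
// Hypothetical retry-across-discovered-hosts helper.
async function callWithRetry<T>(
  hosts: string[],                          // e.g. the result of host discovery
  request: (host: string) => Promise<T>,
  attempts = 3
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    const host = hosts[i % hosts.length];   // try a different host on each attempt
    try {
      return await request(host);
    } catch (err) {
      lastError = err;                      // fall through and retry elsewhere
    }
  }
  throw lastError;
}
```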
As for monitoring, they highly recommended AppDynamics as a way of tracing through the steps in an application and finding the delays in a single client request through the (instrumented) stack.
Adrian Cockcroft has given a number of these presentations before, and some of the material will be very familiar to those who have seen them. If you haven’t, they are well worth a look.
Netflix have also started a tech blog, and that – together with the open-source drive – is both a result of using open-source systems and a way to recruit talented developers interested in working on open-source products.
The evening concluded with Azul’s presentation on memory and garbage collection – the same one given yesterday – so I called it a day. Tomorrow: the final QCon day.