The second day of QCon was just as enjoyable (and exhausting!) as the first. The sessions on scale included a number of Twitter and Facebook talks (whose names drew big crowds) but there was also a lot else going on, from building systems with REST to performance and scalability.
Owing to a last-minute change, Patrick Copeland from Google stepped in with a keynote on Innovation at Google. This covered what makes innovation happen, from 'top-down' innovation, where products are the result of large research labs, to 'entrepreneurial innovation', where everyone can launch an entrepreneurial idea.
The fact is that most innovations fail, even if they are well done. Realising this, the goal is to fail fast (if you're going to fail). The example cited was IBM's speech-to-text engine; to determine whether the product would be usable, a field study set up a microphone and monitor, but with a human in the other room providing the transcription. What the study showed, very cheaply, is that people became frustrated by dictating to a computer, especially when errors occurred. These pretotypes are the things you create before prototypes, and there's a Pretotyping manifesto which says what you should do to validate (and measure) ideas before you build them. In some cases, this idea mirrors that of the 24-hour game fests, where teams are expected to come up with a game in a very short space of time. So by faking it before you make it, you can gain measurable data on how it will work, even if the system hasn't been built (or has only been partially implemented).
FLOPs were defined as Failure in Launch, Operations or Premise. Patrick cited Google Wave as something that failed in the Operations and Premise phases; whilst many people tried it out, very few returned for subsequent use (myself amongst them). Comparing the number of respondents on the initial invite list with the number that tried it (a relatively high proportion) and the number that came back for several visits (a relatively low number) showed that, ultimately, the service as it stood was not going to make it. Conversely, Google Mail and Facebook both have regular returning visitors and are successes.
Measurement of results is also critical. Data without measurement is opinions, and opinions are worth less than ideas; ideas are worth less than innovations. As a data point from later in the day, of the 500m Facebook users, 50% log on every day. Clearly, that indicates some kind of success.
Nick Kallen spoke about Twitter's infrastructure and the way that they scaled tweets, timelines and social graphs. Ultimately, most Twitter API calls boil down to “find by userid” and “find tweet by primary key”. The initial implementation in 2006 was a simple Rails/LAMP application, but whilst the original tweetbase could be loaded into RAM on a single machine with 800GB of space, by the time the 3 billionth tweet was sent, the disk was over 90% utilised and growing fast.
To scale outwards, it was necessary to shard the database store (by putting data in a number of different databases). The initial approach was 12 systems, which sharded on the month of the tweet. This gave extra growing room whilst the system could be extended. Although this sounds like it would reduce locality, the API usage is often for recent tweets (those in the same month) so typically the callers only hit one system for the majority of uses.
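To make the idea concrete, here is a minimal sketch of month-based sharding in Python. It is not Twitter's code: the names shard_for and ShardedTweetStore are made up for illustration, the shards are plain in-memory dictionaries rather than databases, and I've assumed that each calendar month maps to exactly one of the 12 systems.

```python
from datetime import datetime

NUM_SHARDS = 12  # assumption: one shard per calendar month


def shard_for(created_at):
    """Map a tweet's timestamp to a shard index in the range 0..11."""
    return (created_at.month - 1) % NUM_SHARDS


class ShardedTweetStore:
    """Hypothetical store; each dict stands in for a separate database."""

    def __init__(self):
        # Each shard maps tweet_id -> (user_id, created_at, text).
        self.shards = [dict() for _ in range(NUM_SHARDS)]

    def insert(self, tweet_id, user_id, created_at, text):
        shard = self.shards[shard_for(created_at)]
        shard[tweet_id] = (user_id, created_at, text)

    def recent_by_user(self, user_id, now, limit=20):
        """Return the user's newest tweet ids from the current month only.

        Only one shard is consulted, which is the locality argument:
        most requests are for recent tweets.
        """
        shard = self.shards[shard_for(now)]
        rows = [(row[1], tid) for tid, row in shard.items() if row[0] == user_id]
        rows.sort(reverse=True)  # newest first
        return [tid for _, tid in rows[:limit]]
```

The trade-off is visible in recent_by_user: it is cheap because it touches a single shard, but a request that spans a month boundary would need to consult a second shard as well.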
For timelines, where you get a list of all tweets generated by people you follow, the initial implementation used to be a nested select statement:
(select * from tweets where user_id in (select source_id from followers where destination_id = ?) order by created_at desc limit 20). However, although this works in small scenarios (particularly where the indexes fit into RAM), it has fundamental scaling problems because of the IOPS needed for disk seeks.
As a result, timelines are no longer computed dynamically; rather, when a tweet is inserted, it is recorded in the personal timeline of everyone who follows the author. This off-line computation may add a small latency but can be done asynchronously, and it gives a quick result for timelines regardless of how many people you follow.
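The write-time fan-out described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the class and method names are hypothetical, everything lives in memory, and the fan-out runs synchronously, whereas the talk described it as asynchronous.

```python
from collections import defaultdict, deque

TIMELINE_LENGTH = 20  # mirrors the "limit 20" in the original query


class TimelineService:
    """Hypothetical fan-out-on-write timeline store."""

    def __init__(self):
        # author_id -> set of follower ids
        self.followers = defaultdict(set)
        # user_id -> bounded deque of tweet ids, newest first
        self.timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LENGTH))

    def follow(self, follower_id, author_id):
        self.followers[author_id].add(follower_id)

    def post(self, author_id, tweet_id):
        # The cost is paid here, once per follower, at write time.
        for follower_id in self.followers[author_id]:
            self.timelines[follower_id].appendleft(tweet_id)

    def timeline(self, user_id):
        # Reading is a single lookup, however many people the user follows.
        return list(self.timelines[user_id])
```

Reading a timeline no longer involves a join or a sort; the price is that one tweet from an author with many followers turns into many writes, which is why the deliveries-per-second figures are so much higher than the tweet rate.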
Some stats: from 2008, with an average fan-out (follower) ratio of 175:1 and 21k tweet deliveries/second, to 2011 with an average fan-out of 700:1 and peak throughput of 4.8m tweet deliveries per second. With a memcache front end, the latency is 1ms for get, 1ms for append and an average user's fanout tweets are delivered in less than 1s.
Perhaps most interestingly, up until very recently (i.e. 2011), there were more profile/social graph changes (i.e. following/unfollowing/blocking/spam) on an average day than tweets. In any case, the conclusions were:
- All engineering solutions are transient
- Nothing's perfect but some solutions are good enough
- Scalability solutions aren't magic; they involve partitioning, indexing and replication
- All data for real-time queries must be in memory; disk is mostly for writes
- Some problems can be solved with pre-computation but a lot can't
- Exploit locality where possible
- Measure at point of network calls to derive latency and time metrics
The final presentation of the day was Jason Sobel from Facebook, in which he described Facebook's approach to large scale data systems.
Facebook has over 500m registered users, half of whom use the site daily; putting that into perspective, that's about 4% of the human race. All of the data centres are based in the US, at both East coast and West coast sites; however, at the moment, the West coast is a read-only replica of data in the East coast. Any write to a page results in subsequent reads being redirected to the write master to ensure that pages are kept up to date, though a future topology change might result in multi-master sites.
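The talk didn't go into how the redirection is implemented, so the following Python sketch is purely my own guess at the general technique: remember when each user last wrote, and send that user's reads to the master for a short window afterwards. The ReadRouter class, the 20-second window and the "master"/"replica" labels are all hypothetical.

```python
import time

# Hypothetical window during which the replica may still be stale.
REPLICATION_WINDOW_SECONDS = 20.0


class ReadRouter:
    """Route a user's reads to the master shortly after they write."""

    def __init__(self, window=REPLICATION_WINDOW_SECONDS, clock=time.monotonic):
        self.window = window
        self.clock = clock  # injectable so the behaviour can be tested
        self.last_write = {}  # user_id -> time of that user's last write

    def record_write(self, user_id):
        self.last_write[user_id] = self.clock()

    def target_for_read(self, user_id):
        written_at = self.last_write.get(user_id)
        if written_at is not None and self.clock() - written_at < self.window:
            return "master"  # recent writer: the replica may lag behind
        return "replica"  # everyone else can read the local copy
```

The point of the design is that only users who have just written pay the cost of going to the master; everyone else keeps reading from the nearer replica.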
The majority of Facebook is implemented in PHP on top of MySQL with InnoDB back-ends. In order to speed up access, a memcache layer is used (without which, Facebook wouldn't exist). To speed up PHP, the Facebook team created HipHop, a PHP-to-C++ compiler.
Much of the data mining is done with Hadoop and Hive, but the core languages remain PHP and C++, with some Java thrown in for data mining purposes.
User group meetings
There were a few user group meetings after the main conference; the London iOS Developer Group was well attended and had some good thoughts about how to pretotype an iPad app for Photoshop editing. Some mention was made of the $5 developer tools charge, which some saw as a paltry price to pay for an excellent IDE, whilst others were concerned about the general lack of an openly available compiler toolchain regardless of IDE.
All in all, another excellent day. Unfortunately, I missed out on one session as I was deep in conversation with others, and another session was mostly panel-based and so didn't leave much to report back on. The other session I attended was Steve Freeman's “Better is Better” talk, and there was about as much information content in his slides as in his title. It can be summarised as: if you give good developers a good place to work, without admin overhead, they can be more productive. Oh well, time to recharge ahead of tomorrow's final day. You can see what's happening in real time by following #qcon, #qconlondon or by following me at @alblue.