I've written up a piece on InfoQ about what's new in OSGi 4.3. Although the final specification is not yet available, the proposed public final draft is available prior to the official release.
Tuesday, March 29, 2011
Git Tip of the Week: Pushing and Pulling
This week's Git Tip of the Week is about getting changes to and from other sources. You can subscribe to the feed if you want to receive new instalments automatically.
Understanding repositories
A Git repository is essentially a tree of commits, such that at any point in the commit history you have both a full representation of the repository's contents, as well as a back reference to one (or more) parents. Each node in this tree is uniquely identified by its SHA-1 hash (or a unique abbreviation), which is derived from its contents, including the back-pointer to the previous parent(s).
The primary advantage of this model is that two developers committing exactly the same change will always end up with exactly the same node identity.
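As a rough illustration of content-derived identity, the sketch below creates the same commit in two unrelated repositories and compares the resulting hashes. The directory names, identity and timestamp are made up for the example. Note that a commit's hash also covers the author, committer and dates, so the example pins those to fixed values; in everyday use two developers would normally differ there.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Pin every input that feeds the commit hash (hypothetical identity and a fixed timestamp)
export GIT_AUTHOR_NAME="Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Dev" GIT_COMMITTER_EMAIL="dev@example.com"
export GIT_AUTHOR_DATE="1301400000 +0000" GIT_COMMITTER_DATE="1301400000 +0000"

# make_commit <dir>: create a repository with one identical commit and print its hash
make_commit() {
  git init -q "$1"
  printf 'hello\n' > "$1/README"
  git -C "$1" add README
  git -C "$1" commit -q -m "Initial commit"
  git -C "$1" rev-parse HEAD
}

id1=$(make_commit "$work/dev1")
id2=$(make_commit "$work/dev2")
echo "dev1: $id1"
echo "dev2: $id2"   # same hash as dev1
```

Changing any input (the file contents, the message, or one of the pinned dates) gives a different hash.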
The secondary advantage of this model is that when moving content between repositories, you don't need to move the entire tree of commits; you can identify the common root between two trees and just send those contents. For example, if you have two developers whose trees look like:
| Developer 1 | A <- B <- C <- D |
|---|---|
| Developer 2 | A <- B <- E <- F |
then when Developer 1 wants to give his changes to Developer 2, the last known common node is B; so it is only necessary for Developer 1 to send commits C and D. Conversely, if Developer 2 wants to send her change sets to Developer 1, she only needs to send E and F.
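The example above can be reproduced locally as a sketch. Here the two developers are modelled as two branches (dev1 and dev2, names chosen for the illustration) in a single throwaway repository; git merge-base finds the last common node, and a range such as dev2..dev1 lists the commits one side has that the other lacks.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example"
git config user.email "example@example.com"

# commit <name>: create an empty commit whose message is the node name
commit() { git commit -q --allow-empty -m "$1"; }

commit A; commit B
git branch dev1; git branch dev2     # both developers start from B
git checkout -q dev1; commit C; commit D
git checkout -q dev2; commit E; commit F

# The last common node
base=$(git log -1 --format=%s "$(git merge-base dev1 dev2)")
echo "common node: $base"            # common node: B

# What each developer would need to send to the other
dev1_sends=$(git log --reverse --format=%s dev2..dev1 | xargs)
dev2_sends=$(git log --reverse --format=%s dev1..dev2 | xargs)
echo "Developer 1 sends: $dev1_sends"   # Developer 1 sends: C D
echo "Developer 2 sends: $dev2_sends"   # Developer 2 sends: E F
```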
Understanding remotes
Remotes allow a developer to track the state of repositories located on remote machines, and provide a mechanism to copy commits from one to another. When cloning a repository in the first place, a remote named origin is automatically set up to track the remote repository's state. By default, the operations for copying commits between repositories (push, pull and fetch) all work on the origin remote unless you specify otherwise.
Unlike a centralised version control system, there is no concept of a single central repository. So when copying commits, you need to specify which repository to copy to. If you've set up a shared repository then this will be the one everyone works towards, but a distributed version control system opens up the possibility of many more organisational models which we won't cover in this tip.
A remote is configured with a URL, which takes the same form as that used to clone from. Typically, http and git URLs are read-only, whilst https and ssh are used for (authenticated) reads and writes.
Pushing and pulling
For the purposes of this tip, we'll set up a new clone of the repository on a different machine and use it to push to. This can take a number of forms:
- file:///path/to/somewhere.git - a local file repository, initialised with git init --bare /path/to/somewhere.git
- ssh://host/path/to/somewhere.git - remote file repository, initialised with ssh host git init --bare /path/to/somewhere.git
- ssh://git@github.com/username/repositoryname.git - repository created on GitHub
To add this as a new remote, run:
git remote add github ssh://git@github.com/username/repositoryname.git
Thereafter, you can use the name “github” to refer to this remote source. If you have used a different URL then feel free to use a different name; for example, with an ssh URL you might want to use the (unqualified) host name. For example, if you wanted to send the code to github, you can do:
git push github
This will take all changes that weren't in the remote repository (on the current branch) and move them up to the remote server. Note that if there have been subsequent changes on the remote you may get a message saying “non fast-forward push rejected” – this just means someone else has pushed their commits before you, and if you were to push your changes you'd overwrite theirs.
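To see the rejection without needing a GitHub account, here is a minimal sketch that uses a local bare repository as the remote. The remote name shared, the directory names and the identity are assumptions for the illustration; the exact wording of Git's rejection message varies between versions, so the script only checks the exit status.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# A bare repository stands in for the remote server
git init -q --bare "$work/shared.git"

# First developer: add the remote, commit and push
git init -q "$work/dev1"
cd "$work/dev1"
git remote add shared "$work/shared.git"
git commit -q --allow-empty -m "A"
git push -q shared HEAD

# Second developer clones the shared state (remote named 'shared' via -o)
git clone -q -o shared "$work/shared.git" "$work/dev2"

# The first developer gets another commit in first
git commit -q --allow-empty -m "C"
git push -q shared HEAD

# The second developer commits on the now stale base; the push is refused
cd "$work/dev2"
git commit -q --allow-empty -m "E"
if git push -q shared HEAD 2>/dev/null; then
  push_result="accepted"
else
  push_result="rejected"
fi
echo "$push_result"   # rejected
```

The usual way out is to bring the remote commits down first (with pull or fetch, covered next) and then push again.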
The converse of the push operation is either pull or fetch. Both of these will bring down commits from the remote repository; however, the pull will merge those changes into your local branch, whilst fetch will make the commits available for inspection.
# Get the latest changes without merging
git fetch github
# Get the latest changes and merge them in
git pull github
In the developer example above, if Developer 1 were to push his changes to GitHub, and Developer 2 were to pull changes into her repository, it would automatically create a merge node which ties the two trees together. A merge node is one which has two or more parent commits; in this case, we'd create a commit G whose parents were D and F.
If Developer 2 pushes her changes back to GitHub (i.e. the merge commit) then this will be available for Developer 1 to pull. In this case, Developer 1 won't need to create a merge node (since they are already merged at that point) and instead will have a fast-forward merge. A fast-forward merge is simply one which moves forwards through a commit history; in other words, going from A to D would be considered a fast-forward merge.
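The whole exchange can be sketched against a local bare repository standing in for GitHub. The remote name shared, the file-per-commit helper and the identity are assumptions for the example; --no-rebase, --no-edit and --ff-only are passed explicitly so the result does not depend on local configuration.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
git init -q --bare "$work/shared.git"

# commit <name>: add a file named after the node and commit it
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

# Developer 1 creates A and B and publishes them
git init -q "$work/dev1"
cd "$work/dev1"
git remote add shared "$work/shared.git"
commit A; commit B
branch=$(git symbolic-ref --short HEAD)
git push -q shared "$branch"

# Developer 2 clones at B; Developer 1 then adds C and D
git clone -q -o shared "$work/shared.git" "$work/dev2"
commit C; commit D
git push -q shared "$branch"

# Developer 2 adds E and F locally
cd "$work/dev2"
commit E; commit F

# fetch: the commits arrive, but the local branch does not move
before=$(git rev-parse HEAD)
git fetch -q shared
after=$(git rev-parse HEAD)
fetched=$(git log -1 --format=%s "shared/$branch")
echo "remote tip after fetch: $fetched"     # remote tip after fetch: D

# pull: fetch plus merge, creating the merge node G
git pull -q --no-rebase --no-edit shared "$branch"
merge_parents=$(git log -1 --format=%P | wc -w | tr -d ' ')
merge_id=$(git rev-parse HEAD)
echo "parents of G: $merge_parents"         # parents of G: 2
git push -q shared "$branch"

# Developer 1 pulls; nothing to merge, so this is a fast-forward
cd "$work/dev1"
git pull -q --ff-only shared "$branch"
dev1_head=$(git rev-parse HEAD)
```

After the last step both developers are on the same commit G.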
We'll look more at merges and the difference between fetch and pull in the near future; but for now, if you are creating a backup copy of your repository (or simply making it available for others at GitHub) then you now have the tools to achieve that.
Come back next week for another instalment in the Git Tip of the Week series.
Whatever Happened to ...
... the OSGi wiki?
Last year at OSGi DevCon London, a proposal to create an OSGi wiki was roundly supported by those in attendance. Yours truly even volunteered to be the editor of it.
Things happened quickly after that; a mailing list was created, and a template home page was created. When it was created in October, it looked like:
Welcome to the OSGi Community Wiki.
Please see the Contributor page for information on participating in maintaining the wiki.
And here's what it looks like today:
Welcome to the OSGi Community Wiki.
Please see the Contributor page for information on participating in maintaining the wiki.
Wiki means Quickly
Since the wiki turned 16 years old last week, it's worth remembering that Wiki is actually a Hawaiian word meaning 'quickly'. Ward Cunningham created a way of editing web pages quickly by allowing the content to be edited in the browser, through the use of an Edit button. The first wiki was the WikiWikiWeb, so called because of the Hawaiian phrase WikiWiki (lit. quickly, quickly).
The Portland Pattern Repository was the first wiki, and included the Edit page link prominently displayed, which anyone could use to edit the content of the pages. That, combined with automatic CamelCase linking, meant that it was easy to evolve a site instead of designing it top-down. From the welcome page:
Welcome to WikiWikiWeb, also known as Ward's wiki or just Wiki. A lot of people had their first wiki experience here. This community has been around since 1995 and consists of many people. We always accept newcomers with valuable contributions. If you haven't used a wiki before, be prepared for a bit of CultureShock. The beauty of Wiki is in the freedom, simplicity, and power it offers.
This has even evolved to having the world's most cited encyclopaedia, Wikipedia, being editable by anyone at any time. As a result, volunteers have given their time, knowledge and effort to populate 3½ million English pages.
This even works for other environments as well; the Eclipse Foundation hosts wiki.eclipse.org, which has been used to crowd-source documentation for features, such as EGit's user guide (including my contribution, Git for Eclipse Users).
Edit or Login
All of this is possible, because each page has an Edit button on the top right. Allow people to edit the page quickly and easily, and they'll make small contributions. Small contributions can build up to medium contributions, and those can collide to become large contributions. The effect is additive; and once a skeletal structure is in place, examples and documentation can build up.
Some wikis – the original C2 wiki and wikipedia – allow contributions from anonymous users, so the Edit button is always displayed. Some require registration first – such as the Eclipse wiki. However, even those that require registration typically don't ask for much (e.g. a Bugzilla account). Either way, a contribution can be up and running in a matter of minutes through prominently placed edit and/or login buttons.
Meanwhile, sites such as StackOverflow and the Felix wiki are being used for recording OSGi information, and a specific page on OSGi Tooling has been added to Wikipedia (though it has been marked for deletion).
Wither wiki?
These thoughts aren't new. They were raised back in October about concerns with the difficulty involved in following the contribution process, which goes:
If you want to contribute to the OSGi Community Wiki, you will need to agree to the contributor license agreement. Please print out the agreement, complete and sign. Then scan the completed and signed agreement into a PDF and e-mail the PDF to webadmin@osgi.org. Make sure to specify the e-mail address which will be used as your userid when logging in to edit the wiki.
The point is, no-one has wanted to go through this step. It might not be terribly onerous; after all, both Eclipse and Apache have similar processes in place for accepting direct commit rights for the codebase.
But the point is both of those organisations have an easy way for non-committers to contribute fixes; by sending patches. These patches then allow an infrequent contributor to become a frequent contributor to become a committer. We don't have this for the OSGi wiki; it's an all-or-nothing operation. Further, it's completely different from the way other wikis work on the web as well. From Ward's summary above:
The beauty of Wiki is in the freedom, simplicity, and power it offers.
The OSGi wiki is not a Wiki. There is a process through which you can apply to become a committer on a website.
Thursday, March 24, 2011
EclipseCon Thursday
“It's a four day conference” was the refrain at the beginning of the conference, and Thursday is that fourth day. Do make sure you stick around until the closing community panel session; it's just not EclipseCon without finding how many tons of coffee were drunk throughout the week ...
Yesterday's confirmation that Java 7's release date is still July 28th 2011 dovetails nicely with Eclipse Indigo's general availability a month beforehand, and it's good to see that IBM is working on Eclipse JDT for Java 7, so it should be ready to go. The elephant in the room remains modularity; the modularity will be an important part of Java 8 but IBM is working with Oracle to define “works well” in the context of that modularity solution. We shall see what the state is at next year's EclipseCon ...
The final keynote is on Apache Hadoop, a mechanism to provide scalable distributed computing. Much as the transition to multi core is opening up the possibilities for parallelising tasks on a single machine, parallelising tasks across a farm of machines is becoming necessary to handle the gigabytes and terabytes of data that are being accumulated in run time systems. If you haven't thought outside the (beige) box, then you should certainly find out about Hadoop.
The remainder of the morning is occupied with tutorials; for OSGi fans, there's Gemini and Virgo which show you how to build database-backed OSGi applications; but there's also DSL design with Xtext and friends, which is an increasingly important technology.
It wouldn't be an EclipseCon without some NASA content, so NASA Ames brings the RCP back to life with remote robots. Following on with the space theme, there's Modularity Wars: Episode IV, a New Hope – although it's not clear who the evil empire is in this episode.
Other miscellaneous topics which pique my interest include OSGi friendly bytecode weaving and a couple of concurrency presentations (though fortunately, not at the same time...) on deadlocks and concurrency in Eclipse; if you're more interested in finding out about the state of Orion, there's a couple of presentations on Orion components and services as well as the Orion workspace and server. You have signed up for an Orion Hub account, right?
This year's EclipseCon has felt more compact than previous years, hosted as it is in the ballroom and a few of the rooms upstairs. But the mix of tutorials and presentations has given greater access to those wanting to learn about technologies than in previous years, as well as not feeling that you're missing out with too many concurrent presentations to choose from. I'm also glad to see that many of the presenters are uploading slides for their talks, which always gives a good feeling about the openness of the Eclipse community.
To those organising (and attending) EclipseCon 2011, thanks for all the hard work, including the preparation for the presentations as well as the tweets of those events that I couldn't make it to. Hope to see you there next year! And if a year is too long to wait, then the recently rebranded EclipseCon Europe is just over seven months away.
Wednesday, March 23, 2011
NSConference 2011 Day 3
The final day of NSConference 2011 saw a particularly hung over Scotty introducing the presenters. The night before (with its sumo wrestling and bucking bronco) certainly took it out on many of the attendees.
Making the New Everyday Things
One person who didn't seem to be affected by the night before was Aral Balkan, who bounded onto stage with the energy and charisma of a young Steve Jobs. Aral presented a few amusing anecdotes about everyday objects – such as doors which proclaim “opens automatically” and then don't, and having to reach outside the window to open a door despite warnings against leaning outside the windows. Uniquely, the presentation contained video footage of these kinds of issues for everyone to appreciate the irony.
His key point was that many devices are commodities (like coffee), but that there are people willing to pay for more expensive devices (again, like coffee) for a better user experience, which can't be commoditised. He presented a unique view of Maslow's hierarchy of needs, in that an app needs to be functional, then reliable, then usable and finally delightful.
He also emphasised that UI is not the same as Experience Design – you need to think outside the screen. His previous app, 'avit, contained a number of minor features which weren't directly obvious but added to the humanity of the app; the latest is Feathers which allows customised tweets to be posted.
He finished with a warning: the application's experience is only as good as its weakest link. In the case of the Facebook version of the corresponding Feathers Visage book, changes to the back-end server broke the API with no chance of being able to change it. The age of features is dead.
Objective-C Runtime
Nicolas Seriot opened with [isa kindOf:magic] (cue Queen theme) on how the Objective-C runtime works (from objc.h) and objc_msgSend (from message.h).
He also gave a few tips, such as implementing -(void)setValue:(id)value forUndefinedKey:(NSString*)key, which by default throws an exception. He also mentioned method swizzling for fun and referenced setting OBJC_HELP=YES for a bunch of fun runtime debugging options in a Cocoa runtime; most of these are displayed in Tech Note 2124 on the Apple developer website.
Finally, he had a few fun tools on his GitHub page, including RuntimeBrowser, which enables you to find out the state of the Objective-C world at runtime both on OSX and also on iOS devices, which he used to implement a MobileSignal for tracking the mobile signal strength as an iPhone moves around. Both of these projects are worth checking out; and certainly, dig into the above technote for information that you can get out of a runtime OSX app.
Serious Core Animation
Drew McCormack talked about what is possible using Core Animation, and highlighted his Mental Case app as being an example of what can be achieved. He listed ten points:
- You can draw vectors with Core Animation; but you have to build a recursive drawIntoContext call, akin to the View hierarchy
- (almost) everything is animatable; if there's a keypath to it, you can animate it e.g. marching ants
- Exploit human imperfection; such as gradients to simulate cylinders as in the date picker
- Include physics; instead of just rotating items on the spot, have them come up and out of the screen as they flip over
- Account for light; since Core Animation doesn't implement light sources, fake it with gradients and pre-chosen background images
- Postpone expensive drawing; or fire it off into a background thread/image, and then set the image when it's done on the main thread
- Cache expensive drawing operations
- Complex layouts have simple primitives; e.g. break down into custom methods (like UITableView) to make it easier to work with the view
- Complex views have their own controllers; this makes it easier to navigate
He has an existing article about some of the thoughts behind the Mental Case app, including code samples referred to during the talk.
Courting Customers
Jiva De Voe gave a lot of good advice on marketing, comparing it to the dating game (and also suggesting this is why nerds aren't good at it). Instead of an “if you build it, they will come” philosophy, customers are like relationships which you have to work on (either through mailing lists or other interactive sites) and aiming for a particular customer demographic.
A lot of it was good advice, which is covered in books such as Permission Marketing, Unleashing the Ideavirus and Positioning: The Battle for Your Mind.
Making Really Annoying iPad Apps
Matt Gemmell gave an entertaining close to proceedings with his top ten what (not) to do, including:
- More is Better; keep puttin' controls on until there's no space left
- Be useless; why provide useful functionality?
- Be a dick; yes, the infamous #dickbar got a mention again
- Do it your way; only support a single (and less common) orientation
- Neck injury; suddenly change the orientation for no reason
- Find not found; no point in letting users get to their data easily
- Layout for pain; make sure that there's unnecessary cruft everywhere that doesn't disappear
- Lock the door; don't let data in or out of your app to prevent user leaving
- Speak English; don't localise, and especially don't localise for UK English
- Sighted users only; don't use the built-in tools to make things easy to find
Of course, he pointed out that this is a case of YGOLOHCYSP and that these are things he's found in other (real) apps on the App Store for no good reason.
Wrap up and end of conference
Well, it's been a tiring few days. I didn't get to all of the sessions, so apologies if I missed out taking notes for your session here. The good news is that the videos will soon be available for purchase if you couldn't get here (and/or for those attendees who didn't get to the session). I highly recommend seeing both Mike Lee's talk on Monday and Aral Balkan's talk on Wednesday when they do come up; they are not to be missed.
Thanks also to those who didn't unfollow me on Twitter due to the deluge of #nsconf postings; and for those that did follow me, I hope you found it useful.
One of the best things about NSConference is the atmosphere. There's a real sense of community that is sometimes missing from the larger conferences; and it's something you can only get by visiting in person. This is my second NSConference, and I'll definitely be back for NSConference 2012 whenever and wherever it ends up – I look forward to seeing you again there!
EclipseCon Wednesday
Wednesday's EclipseCon programme starts with The Java Renaissance, which will give some interesting clarity as to the newly revamped OpenJDK project. What's interesting about this is not so much the content (though I'm sure that will be good nonetheless) but rather the fact that instead of being a Sun-only (or Oracle-only) talk, this is being co-presented by both Oracle (Mark Reinhold) and IBM (John Duimovich). With Apache Harmony's PMC resigning, the view has been on the OpenJDK and what form the rebooted JCP will take. I'm sure whatever the talk contains there will be interesting information about this relationship.
Paul Webster has a couple of talks on Commands in Eclipse 3.x and Commands in Eclipse 4, which should be interesting. The Eclipse command and actions frameworks seem to change over time, so it's always good to find out which one you should be following.
If you've not looked into Virgo before, then Virgo and RT should be an interesting look at how you can build server-side runtimes on top of Virgo (and importantly, what the difference is between that and a vanilla OSGi runtime).
With the advent of web-based editors (such as Orion and E4), being able to debug JavaScript is an important part of that process. The JavaScript Debugged session will give a good idea about what's possible for JavaScript-based apps (which is mainly on the client side, unless you're looking at Node.JS, that is). Speaking of E4, John Arthorne is giving a talk on a busy year for the Eclipse platform on how E4 is going and what's been happening. There's also a What's new in SWT 3.7, which includes some cool new features for OSX toolbar manipulation amongst others, given by Scott Kovatch, who has been a long time OSX developer.
If you already know about SWT then take a look at both Libra and Karaf sessions, which look at other ways of building and running applications on an OSGi runtime.
Finally, the afternoon finishes with a number of tutorials: Apache ACE, which is about deployment across the cloud; GWT from start to finish on how to build a GWT application; scout apps in 2h on the ; OSGi enabled Java EE applications on how to take advantage of the Java apps and OSGi EEG work and Styling E4 with CSS. The only problem seems to be having too much choice... And don't forget to look at the BoFs – the sign-up board is in the Hyatt mezzanine.
Tuesday, March 22, 2011
NSConference 2011 Day 2
After last night's meal, there were a few people the worse for wear at NSConference this morning. Still, we were kicked into life with:
Design for Developers
Dave Wiskus talked about design for developers, including the advice that the HIG is the Human Interface Guidelines, and not the Human Interface Gospel. He called out the iPhone's accessibility screen zoom as a good way of inspecting the lower elements of your design on device, rather than on the simulator.
In echoing Mike's comments from yesterday, he mentioned that design was important; and if you have to skimp, ensure that your application's logo is well designed, since that's what a number of people will make an immediate decision on.
He called out drop shadows and colours specifically as things to watch out for. Apple uses a shadow at 90° whereas Photoshop has it at 120°. Also, the use of colours – like red for stop and green for go – are fairly universal and shouldn't be used in a confusing manner.
Over the Air Distribution
Martin Reichart talked about over-the-air deployment using features in iOS, including:
- http://iphonedevelopertips.com/xcode/distribute-ad-hoc-applications-over-the-air-ota.html
- HockeyKit
- TestFlight
- IPA Publisher - to be available on the Mac App store subsequently
His slides are now available; follow his blog for more information about IPA Publisher's availability.
Bonjour Networking
Jonathan Freeman talked about how Bonjour can be used to perform networking between iOS and Mac applications. He bravely ran a demo of BigRaceClient which had a single server and other iPhone applications to have a jockey-like race across the screen. Colin Wheeler won and narrowly beat Scotty to the finish.
The demo also highlighted issues you can have with Bonjour networking; in this case, the server was restarted and clients lost their connections, so the race couldn't be restarted. Problems can include:
- Ignoring the port (it's not always what you expect)
- Ignoring the domain (it's not always local)
- Don't pass local, but let the OS figure out what domain(s) to use
- Handle iOS multitasking gracefully, because a background app will have its connections terminated
Self-sponsored advert; I (alblue) wrote NetBox, which shows you a list of hosts on the local area network using Bonjour. Fun for parties!
Ads and Affiliates
Neil Inglis talked about mobile advertising on iOS and affiliate links, which either way seem to be good ways of monetising otherwise free applications. Some interesting statistics came out; most advertising networks have low fill rates (i.e. when you request an advert, but none is delivered) – so having advertisers with a high CPM is no good if they don't fill with adverts. iAds is the best for revenue, but they have a poor fill rate and only cover a few countries (US, UK, Canada, France, Germany) so relying on it as a sole provider isn't likely to be useful. Google AdSense and AdMob are the other two advertising networks; but you can use the open-source AdWhirl to dynamically switch between providers for your application based on multiple providers. Mobclix was also mentioned, though not in a good way.
When ads aren't available, consider falling back to house ads, either promoting other applications for yourself or other developers in an ad-sharing kind of agreement.
Positioning advertising in an unobtrusive place is important; away from controls and otherwise accidental key-presses (the dickbar got a mention again) in order not to get banned from advertisers.
Finally, affiliate links are a good way of promoting applications on a site, using LinkShare (for US and Canada) and Trade Doubler (UK and EU). Interestingly, the shares can apply for purchases up to the next 72h after the initial referral link, which can add up.
Aggressive Image Caching
Marcus Zarra announced the availability of the ZSAssetManager library at https://github.com/ZarraStudios/ZDS_Shared, a means to cache images or other large data with low (or high) priority. The cache manages itself (note: if you run out of disk space on an iPad or iPhone device, Bad Things Happen), and depending on the network connectivity may choose to suspend or throttle the background downloads.
This looks to be a highly useful library for anyone wanting to download content in the background with minimal user impact.
Great Customer Support
Daniel Jalkut talked about delivering great customer support, from pre-sales support through point of payment and thereafter. The pre-sales support allows potential customers to gain perspective on the application (as well as discouraging potential problem users before an implicit contract is given) and is just as important as after-sales support.
Support by e-mail allows you to give a 24h guarantee window, though Daniel discourages using generic auto-responders instead of a human response. Importantly, support is moving towards other social media, such as Facebook, Twitter and others.
Having built-in support on the app for any crashes is important as well; it's not always the case that customers will go out of their way to report such crashes, so if you know about them you can act on them. Surprise and delight your customers by actually responding to crash reports where an e-mail address is known.
Wrap Up
I missed out a few talks today (Tim Isted's talk on assembly appears to have gone down very well) since I was preparing for my own blitz talk on Gerrit/Jenkins, largely based on my previous blog post except upgraded for Xcode use.
The after conference party was very well attended, complete with Sumo wrestling suits and a PS3 game (as well as a bucking bronco; not sure how well that would have gone down after dinner). Part of what makes NSConference so special is not the talks, but the social/community aspect, and NSConference 2011 has certainly delivered in that regard.
EclipseCon Tuesday
Congratulations to the Eclipse Award winners! I'm particularly pleased that the EGit project has won the most innovative new feature, though clearly I'm biased. Congratulations also go to the other winners Sebastian Zarnekow (top committer), Dariusz Luksza (top contributor), Boris Bokowski (top newcomer evangelist), David Williams (lifetime contribution), e4 (most open project), and of course all of the finalists.
Tuesday kicks off with What is Watson? which looks at http://www.ibmwatson.com/ and how it performed on Jeopardy (@IBMWatson).
There's plenty of good OSGi talks on today; Robert Dunne from Paremus is talking about Distributed OSGi and Remote Services Admin (ECF has recently added Remote Services Admin too). Jeff McAffer and Paul Vander Lei are talking on 10 signs you're doing OSGi wrong, or “OSGi worst practices”, a complement to last year's “OSGi best practices”. The afternoon brings an update on the OSGi EEG and Declarative Services in OSGi. I'm also interested in the incubating Apache Celix, which brings OSGi to C – there's talk later on Celix, universal OSGi?. There's also the OSGi BoF/reception tonight, amongst other BoFs.
There's much more at EclipseCon than OSGi talks, though; These ARE the classes you are looking for on bytecode manipulation is an interesting under-the-hood investigation on how AspectJ and EclEmma work; if you don't know about Orion yet (and you haven't read my InfoQ post yet) then there's an introduction to Orion; and talks on managing open source projects like Growing an open-source project and Donating a mature project to Eclipse will be useful.
Finally, one I'd really like to be able to attend is P2: Saviour or Achilles Heel – provisioning on Eclipse should be far simpler than it is. Witness the plethora of content that's duplicated on Eclipse (the existence of Eclipse Packaging Project is an admission of defeat that provisioning works out of the box). I'm sure there are things that can be done better, but whilst the technical underpinnings of P2 might be appropriate, there's no high-level minimal bootstrap installer that works out of the box and can acquire all additional features. (No, the SWT installer is not what's required, which installs a “fresh eclipse”. We need a minimal Eclipse RCP style application that can install features into itself, not into a random new location.) Whether this P2 talk will give hope for the future or not remains to be seen.
There's also a lot of other talks which are worthwhile if you haven't covered them before, such as project coin's changes to the Java language, Mylyn Reloaded, BIRT to the bare metal and many more. The problem, as always at EclipseCon, is that there are too many good choices and not enough time to see them all ...
Git Tip of the Week: Ignoring build output
As the third of a new series of Git Tips, I'm posting a git tip per week. If you don't subscribe to my blog then these are available as a separate feed which you can subscribe to.
Ignoring build output
For compiled applications, it's quite common for there to be a transient directory which should not be put under version control.
Firstly, the output should be repeatable by rebuilding, and secondly, Git (like other VCSs) doesn't compress non-textual data as well as it does textual (program source), since even a small change in the source can result in a large perturbation of the binary. Furthermore, unlike CVCS, you can't have a partial checkout of a DVCS and so you will bring down not only the current, but all previous history if you do.
To hide the build directory from version control, you can use the .gitignore file. This can be placed at any level in the repository, and reach any level below with wildcards. For example, if you have a multi-maven build process, you can do:
# This is a comment
# Ignore target at the top level
/target/
# Ignore child project's targets as well
/*/target/
# And any grandchildren ...
/*/*/target/
# Or you can do:
target/
# which matches 'target' in any sub folder
You can also just copy the same .gitignore file across all target modules, or even reference them with a symbolic link to the same file. (Under the covers, a file with exactly the same contents will hash to the same value so the space saving is the same anyway, but it may be more convenient symlinking a single file so changes made anywhere are visible in all folders.)
The build folder depends on what system you're using; under Maven it's target (as above), under Xcode it's typically build, whilst on other Java projects it might be bin.
You can also use the ignore file to match any other file, such as temporary files (*~) or the OSX turdlets (.DS_Store and ._*).
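To check what a set of ignore patterns actually matches, something like the following can help. It's a minimal sketch rather than anything definitive: it assumes a Git recent enough to have git check-ignore (which arrived after this tip was first written), and the throwaway repository, module layout and file names are all made up for illustration.

```shell
#!/bin/sh
# Sketch: see which paths a .gitignore excludes, using a throwaway repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

cat > .gitignore <<'EOF'
# Build output in any directory
target/
# Editor backup files and OSX metadata
*~
.DS_Store
EOF

# Hypothetical project layout, purely for the demonstration
mkdir -p module/target src
touch module/target/app.jar src/Main.java src/Main.java~

# check-ignore exits 0 when a path is ignored and 1 when it is not
for path in module/target/app.jar src/Main.java~ src/Main.java; do
  if git check-ignore -q "$path"; then
    echo "ignored:     $path"
  else
    echo "not ignored: $path"
  fi
done
```

Running git check-ignore -v against a single path also reports which pattern (and which file) caused the match, which is handy when several ignore files are in play.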
Recovering from a mistake
It's quite possible that by the time you realise you need the ignore entry, you have already committed some of the files to the repository.
If you've just committed, the easiest thing to do is to remove the files, and amend the previous commit.
    $ git rm -r build
    $ git commit --amend
This will re-write the tip of the history to remove the build, and you can pretend that nothing ever happened. But what if it's an older version or one that has come from a different developer?
Caution: the following will change history. This includes changing hashes which will affect those pushing/pulling in the future. It should only be done with an understanding that this may make things difficult subsequently; there is a trade off between fixing the repository and leaving it as is. However, if the committed built code contains sensitive information (such as passwords or other security tokens) then it may be necessary to do anyway. Remember, history is written by the winners!
You can fix up a tree with git filter-branch. This executes a command on each commit in a branch and, if it is successful, re-commits the result and moves on to the next commit until it reaches the end. In the case where there's a general folder, like build, it can be easy to fix; but there may also be more difficult cases which are done on a file content basis. In this case, I'm assuming that we've just accidentally committed the build directory. We need a version range to operate on; the good point where we started, and the point where we are now. I'm going to assume they are badc0de and HEAD:
    $ git filter-branch --tree-filter "rm -rf build" badc0de..HEAD
This ensures that all commits between badc0de and HEAD (which is the current version of the branch) have rm -rf build executed and then re-committed. For commits which haven't changed, you'll end up with the same hash (and so this won't be a problem), but for commits which have changed content you'll end up with a new hash value. Subsequent commits will have a new hash value, whether their content changes or not, since the commit depends on both the contents and the parental pointer.
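To see the effect on hashes, here is a small sketch run against a throwaway repository. Everything in it is an assumption for illustration: the file names, the commit messages, and the $good variable standing in for badc0de. The FILTER_BRANCH_SQUELCH_WARNING variable only matters on newer Git versions, which otherwise print a warning and pause before rewriting.

```shell
#!/bin/sh
# Sketch: strip an accidentally committed build/ directory from recent history.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > main.c
git add main.c
git commit -q -m "Good commit"
good=$(git rev-parse HEAD)          # plays the role of badc0de

mkdir build
echo "binary" > build/main.o
git add build
git commit -q -m "Accidentally add build output"

echo "version two" > main.c
git commit -q -a -m "Later change"
old_tip=$(git rev-parse HEAD)

# Silence the advisory that newer Git versions print for filter-branch
export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --tree-filter "rm -rf build" "$good..HEAD" > /dev/null

new_tip=$(git rev-parse HEAD)
echo "tip before: $old_tip"
echo "tip after:  $new_tip"
git ls-tree -r --name-only HEAD     # build/ is no longer in the tree
```

The good commit sits outside the range and keeps its hash, whilst both later commits get new ones: the first because its tree changed, the second because its parent did.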
Other approaches
The .gitignore file is shared with other users via push/pull, and makes sense for general shared settings. However, there is also a per-repository file, .git/info/exclude, which is never shared and whose entries apply to all files across the tree. For example, I could have alblue in my .git/info/exclude file, which would permit me to try local test scripts in a location I know would not be committed (or identified to the rest of the world).
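As a rough sketch of the local exclude file in action (the scratch script and the temporary repository are invented for the example, and git check-ignore assumes a reasonably recent Git):

```shell
#!/bin/sh
# Sketch: hide a private scratch directory using .git/info/exclude.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

mkdir alblue
echo "echo scratch" > alblue/try.sh
git status --porcelain              # lists alblue/ as untracked

# The info directory normally exists already; -p makes this safe either way
mkdir -p .git/info
echo "alblue" >> .git/info/exclude

git status --porcelain              # no longer lists alblue/
git check-ignore -v alblue/try.sh   # shows which exclude entry matched
```

Since nothing under .git is ever pushed, the entry stays private to this clone, unlike a .gitignore which travels with the content.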
The choice of where to put the .gitignore is somewhat arbitrary; you can have one at the root of the repository or one per sub-project (or deeper locations in the tree). If your build is structured in a way that allows one, or a few, build locations to be defined at the top level then by all means just have a single .gitignore at the top level. However, it is often easier to understand why folders/files aren't being added by looking in the .gitignore in the current directory; so unless you have a reason to do so, try and avoid multiple directory references in the ignore files.
Come back next week for another instalment in the Git Tip of the Week series
Monday, March 21, 2011
NSConference 2011 Day 1
This year's NSConference UK kicks off in the same venue as last year, in the DeVere conference centre just outside of Reading. The organisation of the conference and breakout room is better than last year (previously, the blitz talks were occupying a space opposite the bar) but the hotel still seems to have failed in the basics like heated rooms or hot water.
Making apps that don't suck
Mike Lee opened the conference with his “making apps that don't suck” talk – a highly entertaining introduction to viewing the world in a different way. His advice was instead of concentrating on making things great, to make them suck less:
| Instead of this | Do this |
|---|---|
| Steps to make things great | Steps to make things suck less |
Trism got an outing as the worst iPhone UI ever, mostly for its overuse of modal dialogs (aka NSFail) and splash screens: “Showing a user a splash screen is like putting a mirror on your date's forehead; it shows her who's the important one”. However, it's important to note that your mistakes are as obvious to others as theirs are to you, so always get someone else to review your app and listen to feedback. Sometimes, things aren't always as clear as they seem, as in the non-handling landing pilot rules:
The Landing Pilot is the Non-Handling Pilot until the "decision altitude" call, when the Handling Non-Landing Pilot hands the handling to the Non-Handling Landing Pilot, unless the latter calls "go-around", in which case the Handling Non-Landing Pilot, continues handling and the Non-Handling Landing Pilot continues non-handling until the next call of "land" or "go-around", as appropriate.
In view of the recent confusion over these rules, it was deemed necessary to restate them clearly.
Getting feedback is important, since feedback is not a chance to prove yourself; it's a chance to improve yourself. If you don't have any complaints, go out and find some. Mike regaled us with a travel woe, this time courtesy of KLM, who went out and found feedback but made the situation worse by replying:
Mike: I hope the 100€ KLM gouged me for my luggage being 3kg overweight was worth the intense ill will I now have for them.
KLM: @bmf Sorry to hear that you weren't aware of our luggage policy. Please check http://bit.ly/BagDiscount and http://bit.ly/bagallow
Other memorable quotes:
- Design is the cheapest awesome money can buy.
- Apple's business model is: surprise and delight.
- Too much implementation detail ruins the experience; never let them see you making it.
- Stop making crap, the world has enough of that already. The crap market is full.
- The harder something is to make, the harder it is to copy.
- You are the only person you can change, and the only person who can change you.
- To make great things, you must first refuse to make things that suck.
- If you spend all your time looking at your competition, your product will look like your competitor's ass.
- Life is a privilege, and may be revoked at any time without warning.
- Life is too short to waste time on things that suck.
Adding VoiceOver to Apps
Dave Addey gave an entertaining presentation on adding VoiceOver support to iOS applications, and on the importance of doing so (2% of the user base has visual impairments). Every iPhone ships with VoiceOver (which announces widgets as you move between them), and support for it is relatively simple to add – his company's UK Train Times app won the RNIB app of the month and mobile product of the year.
To demonstrate how it worked, he asked the audience to switch on VoiceOver, by going to Settings - General - Accessibility - VoiceOver. Immediately, the room was abuzz with muttering iPhones, which was an experience to hear in itself! The worst part was putting down 'screen curtain' (by double-tapping with three fingers) which completely turns off the screen (you can turn it back on again with the same process). That, and screen accessibility's zoom feature, are great things to know about when designing applications for VoiceOver, and it's not that hard to do.
Simulating location
Ortwin Gentz talked about how to simulate using Core Location from a Mac by conditional compilation/replacement of CLLocationManager with FTLocationManager (as part of the FTLocationSimulator framework). This wires up the callbacks that CL applies to allow your app to receive a stream of data to give different positions.
The neat approach of this is the use of Google Earth to trace out a route and then export it as a KML file which can generate the data stream for simulation purposes. The data is fed at a fixed rate of points per time rather than average speed, so the cursor will speed up and slow down if the KML points aren't evenly distributed. But it's a good test case to save driving (or biking) around to get data in place.
Cryptographic Storage
Graham Lee gave a great intro to cryptographic storage on the iPhone. Although a password (or security code) locked phone will encrypt data stored, this is outside of an app's control (or detection) – so being able to encrypt data on the fly is the only way to be sure that the data is safe. Graham talked about the AESCrypt library, available under an open source MIT license, and the file format for the encryption. He also walked through the source code for how it worked; but if you're reading and writing data, you only really need to know about the encrypt_stream and decrypt_stream to be able to read/write encrypted data in your iPhone app.
Wrangling the Cocoa Text System
Ross Carter gave a blitz talk on the Cocoa text system, including how TextEdit can correctly deal with multiply accented characters which programs like Word and Adobe still find difficult. The key takeaway was that NSTextContainer doesn't actually contain text, but rather defines a visual region in which text can be laid out. He also pointed out another gotcha where the layout isn't guaranteed to be finished when the delegate message is fired; but rather, when the main thread's run-loop has returned.
Digging into Instruments
Colin Wheeler talked about the power behind Instruments, or as he said “Garage band for developers”. The advantage of Instruments as opposed to previous tools (like Shark) is that all of the measurements are integrated into one timeline and can be aggregated/filtered as appropriate.
A few tricks he pointed out in using Instruments:
- You can hide function calls by filtering system calls to just leave your application's calls
- You can charge a function to its calling parent (for example, when many functions are calling an XML parse function; instead of aggregating the XML call, push the cost back to its callers to find where the hot spots are)
- You can start Instruments in a 'headless' mode – provided that Instruments is in the dock, whether running or not, it's possible to gain measurement data – and it's also possible to define a global quick-key to measure it
- You can export dtrace-based Instruments to a .dscript, which can then run on a customer's machine (without developer tools installed) to collect a log, which can subsequently be opened in Instruments for viewing
Colin is working on a 'zero debugging' app which can collect DTrace information on a customer machine; he will make it available subsequently, but it's not yet ready for prime-time.
Building a better business
Kevin Hoctor, of last year's lickahoctor fame, talked about steps to running a business. Basically, cash is king, so finding a way to fund the version 1 of your app (and getting it out of the door) is a key step in starting to be a success. He also pointed out that the successful people are the persistent ones, who keep trying even when faced with failure(s) in the past. There's a lot more information on his blog so I'll leave that as a reference.
Cappuccino
John Fox rounded off the day with a talk on Cappuccino, a JavaScript framework for building applications using a language called Objective-J, which gives an Objective-C like feel to JavaScript, for example:
    @import "MyClass.j"

    @implementation Person : CPObject
    {
        CPString name;
    }

    - (void)setName:(CPString)aName
    {
        name = aName;
    }

    - (CPString)name
    {
        return name;
    }

    @end
Cappuccino looks like it has a lot of strengths, and like SproutCore makes it easy to develop web based applications using techniques which may be more useful to those more familiar with Objective-C.
Wrap up
One of the main reasons to come to a conference like NSConference, instead of viewing the videos afterwards or listening to the tweets, is to meet people and socialise. The dinner provided a means to do precisely that, although the pros turned up early on Sunday night to get an extra socialising session in. The only piece of advice I'll offer is not to do so much socialising that you have a headache throughout the next day, which can really make things difficult ...
EclipseCon Tutorial Day
Today is the first day of EclipseCon, and the tweets are coming in on the #eclipsecon hashtag. Today's topics are tutorials this morning, followed by tutorialettes later this afternoon. If you're into OSGi, there's a technical update on OSGi 4.3, and for those that haven't experienced the joys of PDE build, there's a Building Eclipse plugins with Tycho talk which is highly recommended.
Finally, this is likely to be the last EclipseCon conference where there are more CVS and SVN servers than Git. There are really two types of projects at Eclipse at the moment; those who use Git, and those who will use Git. Partially, that's because the tooling wasn't available at the start of the Indigo release train; but also because the switch mid release is a risk for the larger projects. However, once Indigo is out of the door, there will be a mass switchover for most Eclipse projects to join those already moved over onto git.eclipse.org. If you still haven't got used to it yet, go to the Effective Git tutorial from the developers' mouths. If you already know everything there is to know about Git and EGit, then you might want to check out the Enterprise OSGi talk on Aries.
As if that wasn't enough, tonight brings the Eclipse Community Awards Presentation followed by the birds of a feather sessions, some of which are yet to be scheduled – check the boards for more information.
Friday, March 18, 2011
Red Nose Day 2011
Red Nose Day is upon us, and with it, so are money raising opportunities. Yes, I've got the Red Nose Day tie on (well, it's a red tie with a red nose sticker on it).
To support it, my company, Bandlem Limited, will be donating 100% of the proceeds from all of its iPhone applications purchased between March 18th and March 20th inclusive, including Naughty Step, the graphical countdown timer for children in a time-out corner, BarCodeGen for generating ISBNs for budding self-publishing authors, and NetBox for those interested in seeing what computers are running on the local area network.
If you've been thinking about buying them, now is the chance to get them at £0.59 ($0.99/€0.99) as well as contributing to charity and supporting the needy in Africa.
Thursday, March 17, 2011
Status of MacZFS on OSX
There's been a lot of discussion recently about “ZFS coming back to the Mac” – but in fact, ZFS on the Mac has been ongoing since Apple dropped it a couple of years ago and runs anywhere from OSX 10.5 on PPC through to the latest Lion developer releases.
When Apple backed away from ZFS, a google group and google project were set up to salvage the last publicly released bits and to move them forwards. Even at that stage, the state of the art was far behind OpenSolaris; when OSX 10.5 was released, the latest sync version was onnv_72 and even the latest (non-public) Apple developer bits were somewhere in the July/August 2008 range (pool version 12 IIRC). Meanwhile, the MacZFS project's latest stable release is sync'd with onnv_74 (pool version 8) although there is an unstable version which is sync'd with onnv_78 (pool version 10).
The recent discussions have been kicked off by the creation of TensComplement by former ZFS Apple engineer Don Brady, who is bringing a commercially supported variant of ZFS (sync'd with the latest onnv_147 bits with pool version 28), in part, using the FreeBSD port which has that in testing at the moment.
The future of MacZFS somewhat depends on the TensComplement version. If it is released as open-source, then MacZFS will have done the job of keeping the community alive between when Apple left the project and when Don started his company. If it is only released under a commercial license then there is an argument to keep the MacZFS project going; though maybe it is time to follow his lead, abandon the current codebase, and use the latest onnv_147 bits as the starting point. A lot of good porting work has been done on MacZFS to make it easier to port (partially, the reason why the Apple codebase was stagnant for so long is that the project layout was significantly different from the upstream one, which meant patches by copy-and-paste rather than a DVCS merge).
The problem will come if TensComplement releases a version of ZFS for free for commercial use but without source. In that instance, almost all (if not all) of the current MacZFS user base will likely prefer to go with a later version, which will leave the open-source version without a community. I suspect this outcome is unlikely, but if it does happen, it may mean the death of the MacZFS project. Quite apart from the recency, a company which has been founded to provide full-time commercial support is going to outperform whatever part-time efforts there have been on the community side, no matter how laudable the goal.
So, MacZFS is stable on OSX, albeit with a (very) old implementation; but it's not out of the game. Yet.
Tuesday, March 15, 2011
Microsoft IE 9 released
I wrote a piece on InfoQ about IE9 being released, and I wanted to add some more colour to the point here. It's not that I can actually run IE9 (doesn't run on OSX) and at $DAY_JOB we tend not to get things in the same year as they are released either.
The point about IE9 is that it represents a new dawn for the IE platform. For years, IE has been the poster child of bugginess, only seen when a new version of Windows evolves (and then tied to it thereafter). Browsers like Chrome, Firefox and Safari have been driving the advancement of the web to the point where it's finally starting to get feasible to write applications in JavaScript that execute on the client side, whilst consuming data over RPC or REST from a server, and have it work across all systems.
The big change has been HTML5, now called (confusingly) HTML. This reboot, originally from the WhatWG, brings a number of new features to the client platform, including:
These might not seem like big advances on their own (nor are they the only ones) but the Canvas is going to change how graphical applications (like Google Analytics) present data to the user, possibly even obliterating the ahead-of-its-time Google Chart image generation service.
The History API also seems out of place; why would that make any difference? Well, the key is what it allows you to do with the URL bar. Historically, browsers have permitted scripts to change the document's URL; but if they change anything before the hash (#) character, the page is reloaded in its entirety. If the URL remains the same except for changes after the hash, then the page is not reloaded. This is why “new twitter” and others use http://twitter.com/#!/alblue – the # allows the script to update any part of the URL to its right without causing page refreshes.
The hashbang was an approach to have a middle-of-the-road solution for Google crawlers. Unfortunately, it didn't work out well for gawker since they forgot to put in the Google hacks to get the content to still be rendered. But either way, it was a problem because JavaScript pages couldn't update the URL without causing a page refresh.
The History API solves that problem. This permits a client to change the URL of the site to any value (although the protocol/domain/port is unchanged). Not only that, this “fixes” the browser's back button, so that state transitions in the client cause the page to have an event fired which permits the page to reload the state at the time.
If your browser supports the History API (IE9, Firefox 4, Safari 5) then you'll see a text field you can type into. If you change it and click on the button, you'll see it changing the URL of the page, as well as filling in a 'history value'. Stepping back/forwards through the history with the back/forward button (or arrows) you'll see the URL change, as well as the title of the pages in the history.
At the moment apps like Gerrit use a hashbang approach for representing individual changesets, reviews and such. However, in the future, we'll be using real URLs as provided by the History API.
Git Tip of the Week: Adding content
As the second of a new series of Git Tips, I'm posting a git tip per week. If you don't subscribe to my blog then these are available as a separate feed which you can subscribe to.
Identify yourself
The first thing you need to do is tell Git who you are, so that any subsequent commits record your authorship appropriately. It is possible to do this on a repository-by-repository basis, but it's usually the case you configure this on a global level. (To configure on a per-project level, run inside a repository and without the --global flag.)
    $ git config --global user.name "Alex Blewitt"
    $ git config --global user.email alex.blewitt@gmail.com
Note that the email is case sensitive, and some systems will be picky about matching the case exactly. If you set this up at the global level, then you don't need to do this step again.
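For the per-repository variant, a minimal sketch might look like the following; the temporary repository is only there so the example doesn't touch your real configuration. Note the quotes around the name, since it contains a space.

```shell
#!/bin/sh
# Sketch: set the identity for one repository only (no --global flag).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Without --global, the values are written to this repository's .git/config
git config user.name "Alex Blewitt"
git config user.email alex.blewitt@gmail.com

# Read the values back; repository settings take precedence over global ones
git config user.name
git config user.email
```

This is useful when one repository needs a different address (a work one, say) from the global default.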
Adding content
Adding content to a Git repository is done in two stages; firstly, adding the content itself, and secondly, committing the changeset.
Like other version control systems, you need to add content to a repository to tell it what is under the repository's control. To do this with Git, you use git add to add one or more files under Git's control:
$ git add file.c file.h docs/
Once files have been added, the changeset can be committed into the repository with git commit:
$ git commit
When a set of changes is committed, you have the opportunity to add comments describing the changeset. Unlike CVCS, the commit message is per changeset commit, rather than per file. If you don't specify it on the command line (with the -m flag) then you'll be put into your editor.
The standard format of a commit message is described in git-commit: conventionally, the first line contains a present tense description of the change, followed by a blank line, followed by more descriptive changes if needed. Conventionally, the last paragraph consists of Key: Value pairs which can be used to record arbitrary information, such as which issue they are associated with, who signed the change off etc. A commit message may look like:
    Handle legacy encodings

    By default we expect UTF-8; however, we should handle cases where it is not.

    Bug: 123
    Signed-off-by: Alex Blewitt <Alex.Blewitt@gmail.com>
    Change-Id: Ia2dd79e7d8fd33e9940f7eb9cf68ece2cfcf9e2c
The soft limit of 50 characters for the first line is not enforced, but since some commands show just the first line (in particular, git log --oneline) then it makes sense to keep to that level. As with other version control systems, the commit message should explain the intent behind the changeset, not the implementation details behind the change (since that can be derived from the diff anyway).
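To see why the first line matters, here is a small sketch that commits with a multi-paragraph message and then views it through git log. The repository and file are throwaway assumptions, and the message is fed in on standard input with -F - purely so that the example can run without an editor.

```shell
#!/bin/sh
# Sketch: a structured commit message and how the log commands display it.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo "data" > file.c
git add file.c

# -F - reads the commit message from standard input instead of an editor
git commit -q -F - <<'EOF'
Handle legacy encodings

By default we expect UTF-8; however, we should handle cases where it is not.

Bug: 123
EOF

git log --oneline           # abbreviated hash plus the first line only
git log -1 --format=%s      # just the first (subject) line
git log -1 --format=%b      # everything after the blank line
```

The blank line after the first line is what lets Git tell the summary apart from the body.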
Adding subsequent changes
One difference in Git is that subsequent changes to the files also need a git add. Git has a concept called the index, which we'll cover in a future tip. When you do an add, you're not just adding the file to the repository, but you're adding a specific version of the file to the repository.
To get started with Git, you can default to using git commit -a. This has the same effect as adding all previously added files, which may be what you expect initially. Running git status will tell you what's outstanding.
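The "specific version" point is easiest to see by editing a file after adding it. The following is a sketch with a throwaway repository and made-up file contents:

```shell
#!/bin/sh
# Sketch: git add records a specific version; later edits need another add (or -a).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo "first draft" > file.c
git add file.c                          # the index now holds "first draft"
echo "second draft, edited after the add" > file.c
git commit -q -m "Add file.c"

git show HEAD:file.c                    # the commit contains what was added
git status --short                      # the later edit is still outstanding

git commit -q -a -m "Update file.c"     # -a adds changes to tracked files first
git show HEAD:file.c
```

Note that -a only picks up files Git already knows about; brand new files still need an explicit git add.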
Come back next week for another instalment in the Git Tip of the Week series
Sunday, March 13, 2011
Will Google run Circles around the #dickbar?
The internet buzz is firmly focussed on the news that Google may be launching Google Circles later today. All of the reports (including this one) point back to a single data point at ReadWriteWeb, though it gained wider notoriety by being mentioned on Business Insider such that it's actually a twitter trending topic at the moment.
Although this is speculation at the moment, the timing couldn't be better. Facebook continues its world domination – with 4% of the human population using it daily – but whilst rough-riding over privacy concerns. Twitter, the darling poster child of the social media, has recently infuriated users with the addition of a dickbar, and spawned several imitations, such as DigDog's UIDickBar as well as Mark Beeson's JavaScript dickbar. But to pour (salt|vinegar) into an open wound, Twitter recently decided to change its terms to make it difficult for third-party developers to use the Twitter APIs.
Arguably why Buzz failed as a means of social co-ordination at Google (and why Google Chat succeeded) was that the former was based on a single web client with no general-purpose API to permit other clients to be built or integrate with the system. Google Chat, on the other hand, was immediately compatible with many other clients which meant that a single means of access wasn't forced down users' throats.
Meanwhile, Twitter has already demonstrated that it is incapable of keeping up with client side development, and is now seeking to follow Microsoft's lead in terms of surviving by extinguishing competition.
As Patrick Copeland admitted at QCon last week, Wave failed because it didn't have a purpose. Although many people tried it out, not many returned or kept coming back, so killing it was the pragmatic decision. Buzz is in a similar situation; many people looked at it (because it came from Google) and whilst it was more flexible in commenting and threaded replying to others, few actually used Buzz directly. Partially this is because the web client sucked; it's also because few used it as a direct system. (Almost all of the followers I follow on Buzz are just tweets from people I follow on Twitter, and with my tweets being recorded on my Buzz bar, the same is likely true in reverse. I only know of two people who use Buzz as a primary means of communication; and of those two, one is a long-time twitterer as well and the other has recently joined twitter.) The other usability fail is that it kept sending emails whenever you posted something – almost as if you didn't know what you were saying – with the result that there's over a million hits explaining how to disable this “feature”.
So, what could Google innovate on? Well, to be a success (over and above Buzz), it has to have:
- An open API, probably backed with OpenID/OAuth/OAuth2 to permit additional clients being developed
- A way of *not* having any and all messages delivered to your e-mail, especially when it's from yourself
- A way of associating groups of people and defining groups by role, e.g. “public”, “friends”, “work colleagues”, “family”, and of switching visibility on a per-message basis
- A way of uploading video/pictures along with text messages, probably from portable mobile devices with cameras
On top of all this, Google almost certainly needs to have a couple of native clients, not just a web client, to demonstrate that anyone can join in. Whilst the web client – like Twitter.com – will remain the 'default' means of accessing, having an updated Google Mobile for iPhones (and similarly for Android) to provide a means of status updates, location posting, image/video capture and uploading will be a necessary part in the puzzle, as well as the API support to permit other developers to play in the game as well.
So, is Circles an incredibly well timed piece of luck? Or is it just a lot of hot air spawned off by a single report? Either way, we'll see in the next couple of days what the answer is, and how close it comes to fulfilling the requirements above.
Saturday, March 12, 2011
QCon Day 3
And with the final day bringing this year's QCon London to a close, it's time to wrap up another set of presentations and sign off for this series of blog write-ups. Thanks to all my new followers on twitter!
Keynote
The keynote for the final day was Rod Johnson, founder of SpringSource, on the trials and tribulations of starting a company and some of the challenges that they faced. The challenge that I faced was getting out of bed in time to make it in for the keynote, which I failed badly at; but, like Patrick said yesterday, failing fast is always an option. I hear from others that it was vaguely interesting, but since it was a historical look back at what had happened, there weren't any major technical points to take away; though one item I did hear was that they invested too fast and hard in the DM Server and Roo; both of those still seem to be going on comfortably (DM Server became Virgo at Eclipse). Anyway, back to our regular programming.
Single Page Apps
Single Page Apps was Michael Mahemoff's talk on the new features and functionality available in HTML5, and in particular, the HTML5 history API and how it affects hashbang URLs.
The problem with hashbang ajax crawling (first proposed on google webmaster blog) is that it changes the URL to effectively be a global single reference e.g. twitter.com/#!/alblue. When such a URL is requested (from a bookmark, or by a Google crawler), only the basic twitter.com page is fetched from the server, since the part after the # is never sent; that page won't have any reference to the specific content.
The HTML5 history API can help here; using history.pushState() you can push a triple of state,title,path onto the stack. The browser's URL path will be replaced with path, and it will look like a normal page transition – except that the page won't be refreshed as a whole.
Changing the URL is only one part of the situation, though. The app will also have to recover from URL changes. When the URL is changed (whether back/forward button or manual intervention) a popstate event will be fired, which can be handled via window.onpopstate. The event carries the memento associated with that entry in the history (or null if the user has typed in the URL manually), which the AJAX application can parse and generate the appropriate data for the call. The example cited was http://rampage.mahemoff.com, which is a pure AJAX application which derives all its data from a single JSON file: http://rampage.mahemoff.com/monsters.json. Interestingly (and somewhat orthogonally to the talk) it was using the mustache templating library, so-called because of the { and } used to delineate the template values. Navigating between pages appears to be a normal website, but you'll note that the top of the page remains the same for all instead of requiring a page refresh, also given away with the cross-fades between images.
A couple of other “just because you can” links - marquee url and url hunter.
Node.js: Asynchronous IO
Stefan Tilkov covered the basic premise behind Node.js as a server-side network server framework (“with a DSL that looks like JavaScript”). By using asynchronous IO – in other words, by never blocking on an IO operation – it's possible to get high throughput whilst consuming minimal memory.
The key is to have one (or a small number) of threads servicing requests from a network source, but only servicing those that are ready instead of having one thread blocked per waiting client. Most operating systems have some variant of select (or in OSX's case, the dispatch_async can be used for any tasks) which allows a function to be triggered if any data is available. Provided that this function creates additional selects in the future (instead of performing blocking IO) then the state of the connection can be maintained independently of the thread servicing it.
Node.js is a set of technologies, such as the V8 JavaScript engine (used inside Google Chrome) as well as libev, libeio, http_parser with some C++ wrapping to provide a JavaScript layer which can invoke callbacks registered by JavaScript programs. When data is available, a JavaScript callback will be invoked using inversion of control; by doing this, a Node.js server is able to sustain a high load in relatively few lines of code.
Stefan has posted some node-samples on GitHub which demonstrate some of the ideas behind the framework. Many additional modules exist, including those to interact with asynchronous requests to MySQL and Postgres databases.
To facilitate ordered operations (where one must complete asynchronously before the next starts) a general step function can be used (yes, like a Monad) to make it easier to chain successive operations together. Since these will all be called back asynchronously, it is possible to parallelise over a data set and have the data members called back with individual elements from the arrays, and then piped into the next farm of functions.
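As a rough analogy in Python (using asyncio rather than the Node.js step library, whose API is not reproduced here), chaining and fan-out might look like the following. The function names and the data are invented for the sketch.

```python
import asyncio


async def step(value, *stages):
    """Run each stage in order, feeding one stage's result into the next."""
    for stage in stages:
        value = await stage(value)
    return value


async def fetch_ids(user):
    await asyncio.sleep(0)          # stands in for a non-blocking IO call
    return [1, 2, 3]


async def load_item(item_id):
    await asyncio.sleep(0)
    return item_id * 10


async def load_all(ids):
    # Fan out over the data set; gather() keeps results in input order.
    return await asyncio.gather(*(load_item(i) for i in ids))


async def total(values):
    return sum(values)


# fetch_ids must finish before load_all starts, and so on down the chain.
result = asyncio.run(step("alice", fetch_ids, load_all, total))
```

Each stage only starts once the previous one has completed, while the elements inside load_all are processed concurrently.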
Combined with decent HTTP support, where it's possible to write a (non-caching) HTTP proxy in ten lines of code, and chunking support for streaming back data, Node.js looks like a high-performance mechanism for writing services. Note that the key benefits here are twofold:
- Programs are written in JavaScript, which makes it easy to prototype/iterate/test
- Using asynchronous behaviour can be an efficient way of dealing with high load
(It's worth noting that Jetty supports asynchronous continuations in a Java runtime and has many of the advantages that Node.js has. The advantage is in the asynchronous nature of calls rather than the implementation language.)
For aged developers such as myself, it may be tempting to write off JavaScript as “that toy language you first saw in a web browser”. For a number of developers, especially with the mobile web, JavaScript is the first production language that they get used to. Those with longer memories may recall similar disparaging remarks being made about Java in its early days; yet with investment in JITs and runtimes, it has become competitive with C, and faster in some cases. So don't write off a language because of its past, but look to its future.
HTML5 @ Facebook
Facebook do a lot of work in HTML5; for some older browsers, certain features (like Chat) are not available. They have a Using HTML5 today post which explains where it is currently being used.
Some problems exist – like the lack of a video codec standard – but libraries such as VideoJS allow fallback to Flash if standard video support is not available.
For implementing games, you typically need to be able to display 5 sprites at 30fps for a board game, 25 sprites for classic arcade games, 50 for shoot-em-ups and 100 for higher performance 2d/2.5d games. But there are different ways to load sprites, such as divs with background images, divs with foreground images, individual files, sprite maps and so on. To investigate the performance characteristics of each mechanism, they have a JS Game Bench.
WebGL, if supported, provides an order-of-magnitude difference for game performance. Some cross-platform APIs – like PhoneGap – are available, and lists of devices and what they support are available via Wurfl and other data sources.
Secure Distributed Programming on EcmaScript 5
Mark Miller talked about the theoretical and implementation details of SecureEcmaScript. The problem being solved is a combination of adding modularity as well as revocation into the EcmaScript language.
To achieve this, a generic proxy is used to wrap all interaction between two objects. In normal operation, a function call is forwarded onto the wrapped object; but a disconnect (or delete) method releases the contained proxied object and subsequent calls become a no-op.
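The forwarding-then-revoking behaviour can be illustrated with a small Python sketch. This is not the SecureEcmaScript implementation; RevocableProxy, revoke and the Account class are hypothetical names chosen for the example, and for simplicity every attribute is treated as a method.

```python
class RevocableProxy:
    """Forwards method calls to a target until revoke() is called."""

    def __init__(self, target):
        self._target = target

    def revoke(self):
        # Release the wrapped object; later calls have nothing to reach.
        self._target = None

    def __getattr__(self, name):
        # Only invoked for names not found on the proxy itself.
        target = self.__dict__.get("_target")
        if target is None:
            return lambda *args, **kwargs: None   # no-op after revocation
        return getattr(target, name)


class Account:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount
        return self.balance


account = Account()
proxy = RevocableProxy(account)
first = proxy.deposit(10)    # forwarded to the real account
proxy.revoke()
second = proxy.deposit(5)    # silently ignored; the account is untouched
```

Because the caller only ever holds the proxy, whoever created it can cut off access at any time without the caller's cooperation.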
To implement a module system, not only are public objects wrapped, but also all arguments passed in and return results passed out. Subsequently, any connections the module may have had can be terminated externally, and thus allow the module to be unloaded.
SecureEcmaScript also provides wrappers around built-in functions (like eval) and replaces them with safe implementations. Since this can be done by running an initialisation script, any subsequent uses of eval can be guarded.
With a suitable module system, and safe eval and other core operations, it's possible to view distributed processes in the same way as disconnected modules.
Finally, the talk introduced futures in EcmaScript, or 'promises', which were used in the implementation of the distributed system, including a couple of examples implementing an infinite queue as well as a mechanism to use these futures as a means of performing a shared bank transfer.
Whilst proxies are in the latest Firefox 4 Beta, and will be part of a future EcmaScript specification, some of the other parts are not yet production ready and remain part of an ongoing research project.
HTML5 and the dawn of Rich Mobile Web Applications
James Pearce gave an excellent overview of the state of HTML5, reinforcing a lot of the earlier content. Although the iPhone kickstarted a new generation in 2007, other devices are now in play, with an estimated split of roughly a third each between iPhone, Android and RIM in the United States (with Nokia having a greater showing in the UK). Although the initial strategy of building device-specific applications seemed good at first, the reality is that building a new app for each new type of device is a difficult process.
These days, web applications can be just as good as native applications (with some caveats) for interaction with a user. Just as desktop applications have been superseded in some cases by web based applications, so will some native applications be replaced by web applications on local devices. The web is evolving:
- Documents becoming Applications
- Declarative HTML becoming Programmatic DOM
- Templates becoming APIs
- URLs becoming Arguments
- Request/Response becoming Data Synchronisation
- Thin Client becoming Thick Client
As an example, TravelMate is an app which might be mistaken at first glance for a native app, but it's implemented in HTML, JavaScript and CSS. Not only that, but it can be cached locally – although the data it uses needs to be acquired from a network connection, it can cache previously looked up phrases for later use.
Frameworks such as SproutCore, SproutCore Touch and Sencha take the pain out of targeting mobile devices, and tools like PhoneGap can help make cross-platform mobile applications. Knowledge bases such as http://caniuse.com, http://modernizer.com and http://deviceatlas.com can give information about what a device's HTML implementation supports.
Wrapup
QCon is always a blast. As with any conference, it's not just the content of the presentations (which is great) but also the people that you meet, and the technologies that you're exposed to. The only problem is that you can't clone yourself three or four times to go and see everything; but even if you didn't manage to see everything you wanted, a fair number of the presentations are recorded and are released over the year at www.infoq.com for subsequent viewing.
The key takeaways for me this year were:
- HTML5 and mobile web apps are already here
- Keeping things simple in order to fail fast saves money long term; but that doesn't mean keep it like that forever
- Separating clients from data acquisition services behind a REST (or similar) API means you can innovate in both clients and services
I hope you had a good time following my blogs and my tweets; feel free to come back here some day, or read my content on http://www.infoq.com/author/Alex-Blewitt.
Friday, March 11, 2011
QCon Day 2
The second day of QCon was just as enjoyable (and exhausting!) as the first. The scale processes had a number of Twitter and Facebook sessions (whose names drew big crowds) but there was also a lot going on, from building systems with REST to performance and scalability.
Innovation at Google
Owing to a last minute change, Patrick Copeland stepped in from Google with a keynote on Innovation at Google. This covered a discussion on what makes innovation happen - from 'top-down' innovation, where products are the result of large research labs, to 'entrepreneurial innovation' where anyone can launch an entrepreneurial idea.
The fact is that most innovations fail, even if they are well done. Realising this, the goal is to fail fast (if you're going to fail). The example cited was IBM's speech-to-text engine; to determine if the product would be usable, a field study set up a microphone and monitor, but with a human in the other room providing the transcription. What the study showed, very cheaply, is that people became frustrated by dictating to a computer, especially when errors occurred. These pretotypes are the things you create before prototypes, and there's a Pretotyping manifesto which says what you should do to validate (and measure) the ideas before you use them. In some cases, this idea mirrors that of the 24-hour game fests, where teams are expected to come up with a game in a very short space of time. So by faking it before you make it, you can gain measurable data on how it will work, even if the system hasn't been built (or even partially implemented).
FLOPs were defined as Failure in Launch, Operations or Premise. Patrick cited Google Wave as something that failed in the Operations and Premise phases; whilst many people tried it out, very few returned for subsequent use (myself amongst them). Comparing the number of people on the initial invite list with the number that tried it (a relatively high proportion) and the number that came back for several visits (a relatively low number) showed that, ultimately, the service as it stood was not going to make it. Conversely, Google Mail and Facebook both have regular returning visitors and are successes.
Measurement of results is also critical. Data without measurement is opinions, and opinions are worth less than ideas; ideas are worth less than innovations. As a data point from later on, of the 500m Facebook users, 50% log on every day. Clearly, that indicates some kind of success.
Big Data in Real Time at Twitter
Nick Kallen spoke about Twitter's infrastructure and the way that they scaled tweets, timelines and social graphs. Most of the time, the Twitter API boils down to “find by userid” and “find tweet by primary key”. The initial implementation in 2006 was based on a simple Rails/LAMP implementation, but whilst the original tweetbase could be loaded into RAM on a single machine with 800GB of space, by the time the 3 billionth tweet was sent, the disk was over 90% utilised and growing fast.
To scale outwards, it was necessary to shard the database store (by putting data in a number of different databases). The initial approach was 12 systems, which sharded on the month of the tweet. This gave extra growing room whilst the system could be extended. Although this sounds like it would reduce locality, the API usage is often for recent tweets (those in the same month) so typically the callers only hit one system for the majority of uses.
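A minimal sketch of month-based sharding, in Python, might look like this. It assumes the shard is chosen from the calendar month of the tweet's creation time; the talk summary does not say exactly how the month was mapped to a system, and the in-memory lists merely stand in for separate databases.

```python
from datetime import datetime

NUM_SHARDS = 12


def shard_for(created_at: datetime) -> int:
    """Pick a shard index (0-11) from the month the tweet was created."""
    return (created_at.month - 1) % NUM_SHARDS


# Each list stands in for one of the twelve database systems.
shards = [[] for _ in range(NUM_SHARDS)]


def store(tweet_id: int, created_at: datetime) -> None:
    shards[shard_for(created_at)].append((tweet_id, created_at))


store(1, datetime(2011, 3, 10))
store(2, datetime(2011, 3, 11))
store(3, datetime(2010, 12, 25))
```

Tweets written in the same month land on the same shard, which is why queries for recent tweets tend to touch only one system.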
For timelines, where you get a list of all tweets generated by people you follow, the initial implementation used to be a nested select statement (select * from tweets where user_id in (select source_id from followers where destination_id = ?) order by created_at desc limit 20). However, although this works in small scenarios (particularly where the indexes fit into RAM) this has fundamental scaling problems with the IOPS needed for disk seeks.
As a result, timelines are no longer computed dynamically; rather, when a tweet is inserted, it is recorded in the personal timeline of everyone who follows you. This off-line computation may have a small latency but can be done asynchronously and gives a quick result for timelines regardless of how many people you follow.
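The fan-out-on-write approach can be sketched in a few lines of Python. This is only an in-memory illustration under simple assumptions (the real system is distributed and asynchronous); the function and variable names are invented, and the limit of 20 mirrors the limit in the original query.

```python
from collections import defaultdict, deque

TIMELINE_LIMIT = 20   # same cut-off as the original nested select

followers = defaultdict(set)                                   # author -> followers
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))  # reader -> newest first


def follow(reader: str, author: str) -> None:
    followers[author].add(reader)


def post(author: str, text: str) -> None:
    # Write-time fan-out: copy the tweet into every follower's timeline.
    for reader in followers[author]:
        timelines[reader].appendleft((author, text))


def timeline(reader: str) -> list:
    # Reading is a plain lookup; no join across followers and tweets.
    return list(timelines[reader])


follow("bob", "alice")
follow("carol", "alice")
post("alice", "hello")
post("alice", "again")
```

The cost moves from read time to write time: posting does work proportional to the number of followers, while reading a timeline costs the same however many people the reader follows.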
Some stats: from 2008, with an average fan-out (follower) ratio of 175:1 and 21k tweet deliveries/second, to 2011 with an average fan-out of 700:1 and peak throughput of 4.8m tweet deliveries per second. With a memcache front end, the latency is 1ms for get, 1ms for append and an average user's fanout tweets are delivered in less than 1s.
Perhaps most interestingly, up until very recently (i.e. 2011), there were more profile/social graph changes on an average day than tweets (i.e. following/unfollowing/blocking/spam). In any case, the conclusions were:
- All engineering solutions are transient
- Nothing's perfect but some solutions are good enough
- Scalability solutions aren't magic; they involve partitioning, indexing and replication
- All data for real-time queries must be in memory; disk is mostly for writes
- Some problems can be solved with pre-computation but a lot can't
- Exploit locality where possible
- Measure at point of network calls to derive latency and time metrics
Scaling the Social Graph: Facebook
The final presentation of the day was Jason Sobel from Facebook, in which he described Facebook's approach to large scale data systems.
Facebook has over 500m registered users, half of whom use the site daily; putting that into perspective, that's about 4% of the human race. All of the data centres are based in the US in both East coast and West coast sites; however, at the moment, the West coast is a read-only replica of data in the East coast. Any write to a page results in subsequent reads being redirected to the write master to ensure that pages are kept up to date, though a future topology change might result in multi-master sites.
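The redirect-after-write behaviour could be sketched as below. This is a guess at the shape of the logic rather than Facebook's implementation: the Router class, the data centre labels and in particular the 20 second window are assumptions made for the example, and the user is assumed to be one who would normally be served from the West coast.

```python
class Router:
    """Chooses which data centre serves a user's request."""

    def __init__(self, sticky_seconds: float = 20.0) -> None:
        self.sticky_seconds = sticky_seconds   # assumed window, not from the talk
        self.last_write = {}                   # user -> time of most recent write

    def write(self, user: str, now: float) -> str:
        # All writes go to the master; remember when this user last wrote.
        self.last_write[user] = now
        return "east-master"

    def read(self, user: str, now: float) -> str:
        wrote_at = self.last_write.get(user)
        if wrote_at is not None and now - wrote_at < self.sticky_seconds:
            return "east-master"     # the replica may not have the write yet
        return "west-replica"


router = Router()
before = router.read("user-1", now=0.0)
router.write("user-1", now=100.0)
soon_after = router.read("user-1", now=105.0)
much_later = router.read("user-1", now=130.0)
```

The trade-off is that users who have just written pay the latency of reaching the master, in exchange for always seeing their own changes.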
The majority of Facebook is implemented in PHP on top of MySQL with InnoDB back-ends. In order to speed up access, a memcache layer is used (without which, Facebook wouldn't exist). To speed up PHP, the Facebook team created HipHop, a PHP-to-C++ compiler.
Much of the data mining is done with Hadoop and Hive, but the core languages remain PHP and C++, with some Java thrown in for data mining purposes.
User group meetings
There were a few user group meetings after the main conference; the London iOS Developer Group was well attended and had some good thoughts about how to pretotype an iPad app for Photoshop editing. Some mention was given to the $5 developer tools charge, which some saw as a paltry price to pay for an excellent IDE whilst others were concerned about the general lack of an openly available compiler toolchain regardless of IDE.
All in all, another excellent day. Unfortunately, I missed out on one session as I was deep in conversation with others, and another session was mostly panel-based and so didn't have much to report back on. The other session I attended was Steve Freeman's “Better is Better” talk; and there was similar information content in his slides as his title. It can be summarised as: if you give good developers a good place to work, without admin overhead, they can be more productive. Oh well, time to recharge ahead of tomorrow's final day. You can see what's happening in real time by following #qcon, #qconlondon or by following me at @alblue.