Thursday, March 25, 2010

Thursday's EclipseCon Highlights

Thursday marks the last day of EclipseCon, and as the sun sets over the Golden Gate Bridge tonight, we'll all be looking back on probably the best EclipseCon ever (measured by keynotes, at least). Yesterday's keynote, by Jeff Norris of NASA's JPL, earned a standing ovation at the end and has already been voted one of the best keynotes of all time. Chris has more details on what I missed.

So, today's keynote by Robert "Uncle Bob" Martin has a big act to follow, but he did an excellent job at QCon with his bad code slides; his presentation at EclipseCon should be just as interesting.

The tutorials are action packed today; if you don't know about distributed version control systems (and you've not read my primer) then Understanding and Using Git for Eclipse is the tutorial for you. You can download the recently-released 0.7.1 and understand both how distributed version control systems work, and how you can use Git. This time next year, we'll be blogging about whether we should switch off CVS, SVN or both from the Eclipse infrastructure; so get a head start on tomorrow's technology today. If you already know Git, then there are plenty of other tutorials that might be interesting – such as become a bundle manager – and there's also a sponsored talk by Amazon on AWS and the cloud with Eclipse.

After lunch brings diverse presentations such as Making DI work for you, the Social IDE, Parallel programming in CDT, and Binding EMF to existing data models.

Mylyn is covered today, with The Future of Mylyn (which seems to be a certain success whatever happens; TaskTop Pro, based on Mylyn, was voted Best RCP Application in the Eclipse 2010 awards). However, don't miss out on Mylyn Reviews (ReviewClipse), which looks to be one of two review systems in use at Eclipse in the future (the other being Gerrit, which is based on Git and used by the EGit project). There's also a Code coverage session later in the day.

The afternoon includes a key session on Migrating to E4, which is going to stand you in good stead after the E4 release later this year, as well as new technologies such as Scala Modules and Monkey Punching the JDT.

Concurrently with the EclipseCon presentations, there's an OSGi DevCon Cloud Workshop, which isn't shown on the EclipseCon sessions table. The goal of the cloud workshop is to discuss how OSGi runtimes fit into the cloud, and to bring together those interested in managing and deploying over wide-scale grids.

EclipseCon will finally come to a close with the Eclipse Community Project Spotlight, where the E4Rover competition winner will be announced; the current high score is 5214 by pakerfeldt, so now you know what to beat!

I hope you all have had a great time at EclipseCon, and hope to see you there in some future year.

Wednesday, March 24, 2010

Wednesday's EclipseCon Highlights

Halfway through EclipseCon, and I'm exhausted just blogging about it. Before we take a look at what happened yesterday, here are my top picks for today:

The morning's set of tutorials covers a variety of different topics, from Modular Architecture to UI Test automation. At the same time, there's a brief Q&A with the Oracle execs on the Future of Java, which would be a great place to pose questions on the commitment to OSGi they made yesterday. That's followed by a Gemini panel session and an EclipseLink/DBWS talk.

After lunch, there's lots more OSGi goodness, but there are several other talks which are pretty interesting. For example, XQuery Development Tools talks about the functionality to do XQuery and XPath development (for XML documents, not to be confused with Xtext, which has nothing to do with XML). There's also Groovy-Eclipse, which won the best open source award on Monday.

One of the things missing in the OSGi Enterprise Spec was Composite Bundles (also known as Nested Frameworks), which is prototyped in the Equinox container already. Go to Tom's talk to find out why, and when it will be in an OSGi standard.

A holy grail for some time has been to make Eclipse plug-in development easier; being able to develop plug-ins with JavaScript has been an on-again, off-again goal for Eclipse over the past several releases. But with E4 around the corner, being able to develop JavaScript plug-ins is closer than ever. Go to Tom's talk to find out more.

The day concludes with a few panels: Ask the AC (Anonymous Coward?), Future of Open Source and a bunch of lightning talks, including Doug's Wascana Lives! on CDT for Windows.

And also...

Whilst you're in Santa Clara, did you know you're only a few minutes' drive away from the Apple Company Store, which is the only store (worldwide) to sell Apple-branded apparel? If you're a Mac fan, I strongly recommend you take the trip – and if anyone who happens to work near me wants to head there and pick up a large black mug, I'd be grateful :-) Unfortunately, they don't stock ties just yet ...

Recap of Tuesday

So, what happened yesterday probably deserves many blog posts of its own. The biggest news was probably Oracle's keynote on the future of Java (more below). Meanwhile, the OSGi enterprise specs were announced; I've written up an analysis of what's new here. Interestingly, the nested frameworks/composite bundles didn't make it into the final revision of the draft, which is a shame.

It seems that JavaFX is still a highly funded project (even though it's a dead technology to many), and they want to merge the Oracle and Sun VMs. Arguably, Sun's HotSpot was superior to JRockit (and more stable, too) so I suspect they'll keep that part of the name, though JRockit's garbage collection might get integrated. But these are such low-level technologies that I doubt that will happen any time soon.

They also want to throw JDK 7 over the wall in the shortest time possible (read: sometime in 2011, probably): Project Coin has a number of new features, invokedynamic is going to make the JRubys and Jythons of the world more performant, and hopefully there will even be a decent Lambda implementation (more on those in separate blogs). There was no concrete mention of the Apache Harmony/JCP standoff regarding the future of the Java name, which is why it's been called JDK 7 since its announcement. They offered the olive branch to the community, but unfortunately took all the olives off first.

The final aspect was Oracle's commitment to make OSGi and Jigsaw play nicely together. Beware the IDEs of March, though; all may not be what it seems. It seems to be based on Qwylt, whose homepage design has been nicked from the Apple Store, and whilst it says “back soon”, Qwylt has actually been around for some time. From the previous incarnation of the home page:

Qwylt is…

  • APIs for interacting with modules
  • A dependency resolution environment
  • SPIs for building module systems
  • A platform for module systems to share classes and resources
  • APIs for services & dependency injection systems
  • Qwylt is… a module system framework
  • Qwylt is not a module system implementation, and is therefore not a replacement for OSGi or any other existing module system. It is a new environment in which new and existing module systems can be expressed.
  • Finally, Qwylt supports multiple simultaneous module systems—it is a framework for interconnecting modules, within and across module systems…
  • Qwylt is… a module system fabric.

Yes, that's right, it's the ghost of JSR277 rejuvenated. In fact, it says so, right here:

While this project was conceived and created over a very short time, the ideas have been brewing for far longer and have been shaped by many.

This project would not have been possible without the late JSR 277 and the many contributions of its fine expert group members and specification leads. I would particularly like to thank Stanley Ho for his dedicated leadership as well as his patience and thoughtfulness dealing with my suggestions; though the core ideas expressed here derive from my own contributions to 277, many important refinements belong to him. Thanks also to EG members Glyn Normington, Doug Lea, Richard Hall, Michal Cierniak, and Adrian Brock, all excellent collaborators who greatly helped my understanding of the problem domain, and Andy Piper, Daniel Leuck, Gordon Hirsch, Sam Pullara, Hani Suleiman, Brett Porter and Vladimir Strigun for their contributions to 277.

Perhaps this is what's meant by Oracle encouraging open source communication: doing JCP things outside the JCP as open-source projects, and then just folding them in as needed. Whilst Alex Buckley's comments are specifically that neither JSR294 nor Jigsaw are meta-module systems, Qwylt quite clearly seems to be one. Stanley Ho's concept of “largely backward compatible” springs to mind again, along with his concept of four version numbers (which, incidentally, still seems to be part of the Jigsaw implementation):

So the JSR 277 EDR2 proposes the format of a version number as:

    major[.minor[.micro[.update]]][-qualifier]

where major, minor, micro, and update are non-negative integers, i.e. 0 <= x <= java.lang.Integer.MAX_VALUE. qualifier is a string, and it can contain regular alphanumeric characters, -, and _.

  • Major version number should be incremented for making changes that are not backward-compatible. The minor and the micro numbers should then be reset to zero and the update number omitted.
  • Minor version number should be incremented for making medium or minor changes where the software remains largely backward-compatible (although minor incompatibilities might be possible); the micro number should then be reset to zero and the update number omitted.
  • Micro version number should be incremented for changing implementation details where the software remains largely backward compatible (although minor incompatibilities might be possible); the update number should then be omitted.
  • Update version number should be incremented for adding bug fixes or performance improvements in a highly compatible fashion.
  • Qualifier should be changed when the build number or milestone is changed.
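To make the quoted format concrete, here is a minimal sketch of a parser for major[.minor[.micro[.update]]][-qualifier]. It is not code from JSR 277 or Jigsaw; the names parse_version and Version are made up for illustration, and treating omitted numbers as zero is my assumption rather than something the quoted text states.

```python
import re
from typing import NamedTuple, Optional

# major[.minor[.micro[.update]]][-qualifier], as quoted above.
# The qualifier may contain alphanumerics, '-' and '_'.
VERSION_PATTERN = re.compile(
    r"^([0-9]+)(?:\.([0-9]+)(?:\.([0-9]+)(?:\.([0-9]+))?)?)?"
    r"(?:-([A-Za-z0-9_-]+))?$"
)

INTEGER_MAX_VALUE = 2**31 - 1  # java.lang.Integer.MAX_VALUE


class Version(NamedTuple):
    major: int
    minor: int
    micro: int
    update: int
    qualifier: Optional[str]


def parse_version(text: str) -> Version:
    """Parse a four-part version string (hypothetical helper)."""
    match = VERSION_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a valid version: {text!r}")
    *numbers, qualifier = match.groups()
    # Assumption: parts that are left out are treated as zero.
    parts = [int(number or 0) for number in numbers]
    if any(part > INTEGER_MAX_VALUE for part in parts):
        raise ValueError(f"version component too large: {text!r}")
    return Version(parts[0], parts[1], parts[2], parts[3], qualifier)
```

Because Version is a tuple, two versions with the same qualifier compare number by number from the left, which is where the argument over what each position means starts to matter.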

In fact, he demonstrated his lack of understanding immediately afterwards:

The fact that some programs cope with three or even two numbers is not a prima facie reason against four. The OSGi policy of three-number versions where the third is for bug fixes (OSGi R4.1 3.6.2) leaves only two numbers for the mainstream product version.

The key thing which eluded Stanley – and continues to elude those on the Jigsaw team – is that there should be no relationship between the “mainstream product version” and the module version. In fact, it's a rookie mistake to assume this. (Everyone makes mistakes: Eclipse re-numbered all of its bundles through the 3.1 release with a 3.1 prefix (e.g. org.eclipse.core.boot_3.1) even if they didn't need it; however, by 3.2 they fixed the renumbering of everything, which is why you now have _3.2 and _3.3 bundles in the runtime – they've not changed since their initial release.) However, smart people learn from other people's mistakes rather than making the same ones again themselves.

I've made the point before that the module version should be independent of the marketing number of the system being released; sadly, if there's one group of people who understand version numbers less than Sun, it's Oracle, so I'm not holding my breath hoping for a good resolution here. Let's conclude that the commitment to OSGi is good, but that the implementation will be the real test.

Tuesday, March 23, 2010

What's new in OSGi Enterprise 4.2

For those that are interested, I've written up what's new in OSGi Enterprise 4.2, recently released at EclipseCon, over at InfoQ. Also, since the DZone moderation system seems to be getting in the way, instead of posting the link directly to the InfoQ page, I've had to point it here first. Please send complaints via DZone, not here.

OSGi Enterprise spec 4.2 released

The OSGi enterprise spec 4.2 has been released and is available for download. This brings the ability to run web apps inside an OSGi runtime, as well as access enterprise level services like JNDI and JPA in an OSGi-compliant way. More analysis to follow later ...

Tuesday's EclipseCon Highlights

Hopefully everyone had a great day yesterday at EclipseCon – judging by the amount of traffic on the #eclipsecon twitter tag, everything is great!

I'd also just like to congratulate the Eclipse Awards Winners for their continued enthusiasm for the Eclipse community:

In addition, I'd also like to congratulate the EGit and JGit developer teams on the first public 0.7.1 release via the new Eclipse update site. If you're new to Git (or DVCS in general) then I've written a primer on the concepts to help make the transition smoother. Please note that EGit is in incubation right now, so it is not (yet) as tightly integrated as CVS is today! There's an EGit presentation which should be a great intro for those who don't know it.

For those of you that need a map around today's activities, here's a guide to what to look for.

After the keynote, in which Oracle will talk about the future of Java, there's a healthy choice of tutorials. Getting Started with Eclipse RT will be a great tutorial for those who are new to the OSGi platform, but it also covers using Jetty to build JSF pages on top of JPA-backed persistent entities. The new support for JPA in the OSGi enterprise group is great; think of it as Hibernate on steroids. If you don't need to build enterprise applications, then Developing with Maven 3.0 will show you the future of Eclipse-based builds, followed by an overview of Nexus Pro and P2 for those that need to manage repositories.

After lunch, there are some interesting presentations on E4 CSS and E4 Styling, which will be of interest to those looking to the next generation platform. For those developing in C, there's a What's New in CDT (which sadly doesn't include Objective-C yet :-)

Those with an OSGi-oriented mind should not miss Robert Dunne's Next Generation OSGi Shells, which will demonstrate Paremus Nimble. Having seen it in action, it's the missing link between Maven and P2. There's also (concurrently) a Tycho Build Workshop which appears to run for most of the afternoon – presumably as a drop-in, drop-out centre rather than a continuous presentation – so if you missed the morning's content, stopping in here would be prudent. I'd also like to recommend the OSGi Best and Worst practices, since there are a few common mistakes that are easy to avoid when you know about them; and if you missed out on the EclipseRT tutorial earlier (and aren't going to the EGit presentation) then there's a Using JPA in OSGi talk.

I mentioned Xtext yesterday; it then went on to win the most innovative new feature award. If you still don't know what it is all about, I can strongly recommend Xtext - A Language Development Framework.

The day concludes with a few panels: The Future of App Servers; Developing at Eclipse and Build and continuous integration with Eclipse. All are great panels; it's a pity you can't go to all three.

Finally, for those interested in participating in the E4 Rover Mars Challenge, the top score can be monitored via the @e4rover twitter feed, or the #e4rover hashtag. Currently, the high score is Kai with 3051; sadly Don's Chuck Norris Client's infinite score doesn't count, as he's a Foundation employee. Good luck!

Monday, March 22, 2010

Monday's EclipseCon Highlights

I'm a bit grumpy that I can't attend EclipseCon this year; it looks like the programme is excellent. Today's programme contains a smörgåsbord of sessions that really makes you wish you could clone yourself to attend several at once.

If you're in early enough, there's the E4 Mars Rover Challenge at 08:15, which fires the starting pistol for the E4 Rover Mars Challenge. Not only do you get to play with Lego™ virtually, you also get the chance to win a trip to NASA's robotics lab in Los Angeles. Sadly, you must be a registered attendee of EclipseCon to compete, so I can't take part remotely :-/

Xtext meets E4 will likely be a great tutorial; these are both up-and-coming technologies for the Eclipse of the future. For those that don't know, Xtext (not XText or xText, as Sven will point out frequently) is a tool for building custom DSLs. Combine this with E4, and you have a great way of building an editor. Those looking to get used to OSGi will find Working with OSGi: Stuff you need to know a great starting point for the conference.

After lunch, there's more Enterprise OSGi goodness in the form of Apache Aries, Eclipse Gemini and Eclipse Virgo. If you're not interested in OSGi (why not?) then there are some good talks on Modelling and even Blackberry development.

You really should go to the afternoon's session on OSGi with Tycho, Nexus and Hudson, though. Tycho is the Maven builder that can work from PDE's Manifest-first development model (though it can also work with POM-first) that is slowly taking over from the archaic PDE build. If you're looking after an OSGi project, and have had issues with PDE build, then join the projects using Tycho and build with confidence.

Finally, if you're thinking of entering the Mars rover competition, there's a lightning talk on Lego Mindstorms programming followed by a top ten things learned working on Eclipse which should be a great summary of life in an open-source project.

Lastly, there are also many birds of a feather meetups (aka BoFs) where like-minded individuals go to talk about topics that are on their minds. Sonatype (developers of Maven and Tycho) are sponsoring the OSGi DevCon BoF, though this overshadows the Eclipse Virgo community launch, which is a shame. If neither of these appeals, there are several others - or you could participate in your very own Unconference by putting your name/subject up on a post-it note at 5:45 on the Unconference board at the mezzanine level.

Friday, March 19, 2010

Eclipse and Git

It's been a while since I wrote my Git for Eclipse users post. (It's now been uploaded to the EGit Wiki, so hopefully it will appear as part of the user documentation as well.) Thanks to those of you who reached out to me, directly or indirectly, with feedback. One of the recurring themes of the comments was regarding the lack of a central version control system:

  • There is no master repository

It's an implicit property of a centralised version control system that there is a master repository. In a distributed version control system, each user has a full copy of the repository; so theoretically, any user can be the master.

However, in practice, there's usually a de-facto master repository. Most of the time, this is the one that is coupled to the build system; after all, there is usually a central build system generating released artefacts, even if the version control system could be from anywhere.

In a corporate or large organisation, it's also fairly likely that the build system (and local storage) will be on some kind of backed up server, connected to RAID backed drives and other redundant server configuration. In fact, this isn't that different from an existing centralised version control system, where the code is hosted from a backed up location.

So, just because you're using a distributed version control system doesn't mean that you can't have a centralised repository. In fact, it often makes sense to have one; it is where the developers synchronise their code, where builds and releases come from, and it acts as a public log of whatever has happened to the repository.

In EGit's case, there are many copies of the repository. I have a few local copies; there are many co-ordinated via Gerrit; others exist on GitHub, but ultimately there is one place where all changes end up: git://egit.eclipse.org/egit.git. That's what's used in the build server, and ultimately, the http://download.eclipse.org/egit/updates/ update site. You should be able to download 0.7 in time for EclipseCon.

So, while there may be no need for a central repository, in practice the one connected to the build system (and hosted on redundant/backed up hardware) is often the canonical repository.

Long class names in Eclipse

Here's a fun fact. Eclipse has some seriously long class names in the bundles that come with a run-of-the-mill install.

The longest single class name is org.sat4j.pb.constraints.CompetResolutionPBMixedHTClauseCardConstrDataStructure, or 54 characters of goodness. If you include inner classes as well, then the record is shared by JDT's ConfigureWorkingSetAssignementAction$WorkingSetModelAwareSelectionDialog$GrayedCheckModelElementSorter (102) and JDT's IntroduceParameterObjectWizard$IntroduceParameterObjectInputPage$ParameterObjectCreatorContentProvider (102).
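For anyone wanting to check the counts, a small sketch follows. It only measures the three names quoted above (package prefix dropped, the $ separators of inner classes kept); it says nothing about how the names were found in the first place, and simple_name_length is just an illustrative helper.

```python
CLASS_NAMES = [
    "org.sat4j.pb.constraints.CompetResolutionPBMixedHTClauseCardConstrDataStructure",
    "ConfigureWorkingSetAssignementAction$WorkingSetModelAwareSelectionDialog$GrayedCheckModelElementSorter",
    "IntroduceParameterObjectWizard$IntroduceParameterObjectInputPage$ParameterObjectCreatorContentProvider",
]


def simple_name_length(name: str) -> int:
    """Length of the class name without its package prefix."""
    return len(name.rsplit(".", 1)[-1])


lengths = [simple_name_length(name) for name in CLASS_NAMES]
for name, length in zip(CLASS_NAMES, lengths):
    print(length, name)
```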

Perhaps these classes were implemented before the refactoring support was finished? In the words of Uncle Bob Martin, “What were they thinking ... ”

Friday, March 12, 2010

QCon Day 3

The third and final day of QCon didn't kick off with a keynote, which meant that the conference hall felt pretty empty just before the talks. I wonder if there would have been more traffic had Dan Ingalls' evening keynote on Wednesday been moved to Friday morning.

Joe Armstrong was first in the concurrency track, giving Message Passing Concurrency in Erlang as a talk. It started with the observation that there are many ways of achieving parallelism:

  • Message passing concurrency (perfect for agent-style behaviour)
  • Purity and control of side effects (using types to preserve the state)
  • Traditional languages (for those with a real job)

He noted that the wrong abstraction leads to issues with the representation (and therefore solution) of the problem; for example, XLVIII*XCIII = MMMMCDLXIV is a lot less obvious than 48*93 = 4464.
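The Roman numeral example is easy to check once the numbers are moved into a friendlier representation. This is a minimal sketch of the standard subtractive-notation conversion (roman_to_int is my own helper name, not anything from the talk):

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral to an integer.

    A symbol that is smaller than the one following it is subtracted
    (the X in XL), otherwise it is added.
    """
    total = 0
    for index, symbol in enumerate(numeral):
        value = ROMAN_VALUES[symbol]
        is_last = index + 1 == len(numeral)
        next_value = 0 if is_last else ROMAN_VALUES[numeral[index + 1]]
        total += -value if value < next_value else value
    return total


left, right = roman_to_int("XLVIII"), roman_to_int("XCIII")
product = roman_to_int("MMMMCDLXIV")
print(f"{left} * {right} = {left * right}, numeral says {product}")
```

The multiplication itself is a single operator once the representation is positional; almost all of the code is spent translating out of the awkward one.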

The key problem with shared memory and fault tolerance is that you have no idea why (or even if) one of the other components in the shared memory system has crashed. So Erlang's approach to parallelism is to simply forbid shared memory; after all, this works for systems connected across the internet where there is implicitly no shared memory.

Erlang's message passing semantics mean that the caller and server are isolated, and that no data is shared. He even called up a quote from Alan Kay (father of Smalltalk) saying “I'm sorry I coined the term objects; it's all about messages”.
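As a rough illustration of the style (not of Erlang itself), here is a sketch in Python where a server keeps its state private and the only way to reach it is to put a message in its inbox. Python threads do share memory, so the isolation here is by convention only, whereas Erlang enforces it; the message names "add", "get" and "stop" are invented for the example.

```python
import queue
import threading


def counter_server(inbox: queue.Queue) -> None:
    """Own a private counter and react to messages until told to stop."""
    count = 0  # never exposed directly; only reachable via messages
    while True:
        message = inbox.get()
        kind = message[0]
        if kind == "add":
            count += message[1]
        elif kind == "get":
            reply_to = message[1]  # the caller's own mailbox
            reply_to.put(count)
        elif kind == "stop":
            break


def run_demo() -> int:
    inbox: queue.Queue = queue.Queue()
    server = threading.Thread(target=counter_server, args=(inbox,))
    server.start()

    for amount in (1, 2, 3):
        inbox.put(("add", amount))

    reply: queue.Queue = queue.Queue()
    inbox.put(("get", reply))
    total = reply.get()  # blocks until the server answers

    inbox.put(("stop",))
    server.join()
    return total


if __name__ == "__main__":
    print(run_demo())
```

The caller never touches count; it can only ask for a copy, which is the property that makes the same pattern work when the two sides are on different machines.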

One great thing about conferences is that you can go up and meet the speakers afterwards, or indeed, during Friday's “meet the speaker” session. I was especially interested in Erlang's approach to distributed functions, and the ability to send functions remotely; if the remote server has all the definitions of dependent functions already, this approach is seamless. If not, the remote server has to load them; and apparently, the assumption is that the Erlang node has an up-to-date copy of the dependent functions which it can resolve locally if needed. Erlang is really a powerful language, and if you need to learn a new language this year to keep your grey matter ticking over, Erlang should be it.

The next standing-room-only presentation was Rod Johnson discussing Spring Roo. This is a productivity enhancing tool for writing Spring-based applications; and whilst Spring lets you configure anything you want in any way you want, Spring Roo is more constrained in that it follows convention over configuration. It allows you to put together JPA-backed persistent entities (like Ruby on Rails or Grails) using nothing more than a few annotations and some generated code. Unlike other approaches (EJB, DAO) the persistence functions are woven into the data object itself, and this is done at compile time using aspects and weaving.

Spring Roo seems to be very useful at prototyping persistence layers. In addition, a few clicks later saw the creation of a web-based interface for CRUD operations on the data set, with virtually no coding support needed to get it up and running. Hopefully the code sample will be available from the QCon website in the future (in which case, I'll link back here); in the meantime, more information can be found at the Spring Roo website.

As a follow-up question, I asked what the generated code is like for interacting with OSGi. Fortunately, each Spring Roo project is its own OSGi bundle, and the JPA and Spring autowiring can take advantage of OSGi services, so it seems eminently possible that you'll be able to use Spring Roo as a way of generating OSGi-aware persistence layers. As I noted on Twitter, this may be the end of web-based CRUD outsourcing as we know it...

The penultimate talk was Ralph Johnson on Pattern languages for parallel programming. They are working on a wiki to catalogue parallel algorithms (available at ParLab) and cover the basic approaches to parallelism. There are a number of common concerns, like how to size the units of work such that the data overhead doesn't subsume the data itself; but you can find most of the information on the wiki.

The day (and conference) finished off with Alex Buckley, Kevin Seal and myself discussing the future of modular Java in front of a live studio audience. We got across the basics, but unfortunately Kevin's war stories (coupled with the fact that they rolled-their-own build tool, rather than using maven-bundle-plugin or bnd or bundlor) probably gave the wrong impression that modularisation is either difficult, unnecessary, or irrelevant. I'm not sure those that came who didn't know about JSR 294, OSGi or even Jigsaw left any better than they started. It didn't help that there was an event discussing the future of OSGi for the Enterprise at the same time, which meant that the number of people attending wasn't that high.

Conclusion: Bar minor timing clashes (which are always a possibility at such events), QCon was a resounding success. The main goal of such events is to round out thinking on subjects which may be slightly further afield, and QCon avoids the trap of being too vertically integrated to a tool/language. The diversity of the speakers and topics, coupled with technical and non-technical presentations, results in a well-rounded conference that was aimed squarely at the technical architect level instead of the developer. I'd highly recommend going to the San Francisco one; to get a feel of what they're like, you can browse the presentations or catch up on the pre-recorded videos from past conferences. Hopefully, the 2010 videos will come out over the remainder of this year.

Thursday, March 11, 2010

QCon Day 2

Day 2 of QCon opened with Ralph Johnson on “Living and working with Aging Software”. The key takeaway was that as software ages, it gets brittle (hard to change as well as hard to understand) but that even though evolution occurs, there are likely some key nuggets which remain in the source for a long time to come. He cited the history of Word, in which it evolved from 1983 to today's Word 2010; there's a good chance that a number of users are younger than the source code of the application's key components.

As a result, there's a lot of “maintenance work” on software – he argued that anything since the first release is maintenance – but the stigma of maintenance means that most don't consider themselves to be software maintainers. He did point out that as software ages, so does the knowledge base of what that software does; and expertise in the software tends to wane over time as turnover and lack of documentation take their toll. His takeaways were to focus on the investment made in older software by continuing to employ older developers, as well as documentation and reverse engineering skills.

His hypothesis – that the average age of code gets older – doesn't match up with Kirk Knoernschild's observation that source code doubles every seven years; if true, that would argue that the average age is in fact getting younger. However, the base premise – that there exists code which is getting more ancient over time – certainly holds true.
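One way to probe the arithmetic is a toy model, sketched below under loud assumptions of my own: the code base doubles smoothly every seven years, nothing is ever deleted or rewritten, and everything in the first release counts as written in year zero. Neither speaker presented this model; average_age and oldest_age are illustrative names.

```python
def average_age(years: int, doubling_period: float = 7.0) -> float:
    """Average age (in years) of all lines after `years` of growth.

    Assumptions: size doubles every `doubling_period` years, and no
    line is ever removed or rewritten.
    """
    growth = 2 ** (1 / doubling_period)  # yearly growth factor
    cohorts = [(0, 1.0)]  # (year written, relative amount of code)
    total = 1.0
    for year in range(1, years + 1):
        new_total = total * growth
        cohorts.append((year, new_total - total))
        total = new_total
    return sum((years - written) * amount for written, amount in cohorts) / total


def oldest_age(years: int) -> int:
    """The first release just keeps getting older."""
    return years


for horizon in (7, 14, 28, 56, 112):
    print(horizon, round(average_age(horizon), 2), oldest_age(horizon))
```

In this model the average age climbs for the first couple of decades and then levels off at a little under ten years, while the oldest code keeps ageing without limit. Change the assumptions (deleted code, rewrites, uneven growth) and the plateau moves, so treat it as a way of framing the disagreement rather than settling it.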

He also referred to refactoring, and a short history of where it came from, as a means to keep software 'live'; though his comments on “what you wish your editor would do for you” seem to betray an ignorance of the state of most modern IDEs which, in fact, do that for you.

Next up was Jim Coplien's talk on DCI, which he's talked about on Artima before. He was certainly entertaining (if slightly provocative). He started by defining architecture as the essence of structure, or “form”, but noted that structure obfuscates form. He noted that the object orientation paradigm is just a way of thinking about form that captures the user's mental models in code; citing the Model-View-Controller as the vision of object orientation.

However, his thesis is that architecture is more than that; it includes the form of the business domain (what the programmer cares about, domain model in MVC) as well as the form of the system interactions (what the system does, what the end user cares about). His argument states that object oriented developers focus on the first more than the second, and expect it to evolve like the 'game of life' from the state of the system.

He used as an example the classic 'bank transfer', in which he argued that a user doesn't think about the objects, but about the roles which the objects play. In other words, a user may have many kinds of accounts (bank account, savings account, credit card, telephone bill, mobile phone credit ...) and rather than multiple objects, the user considers one to be a 'source account' and one to be a 'destination account', though these may change over time (or the same objects may even reverse roles).

His argument stated that this requires a new architecture; classes, interfaces, and “methodful roles” (also known as “traits” and similar to Spring Roo's use of aspects). The argument goes that a class may represent the object itself, but that it may be extended with a trait or aspect to fulfil different roles through the object's lifetime.

He stated that “Java is a toy scripting language; you cannot program object-oriented code in Java” – though this was a semantic sleight of hand. With other languages (C++, JavaScript etc.) it's possible to alter the state of the object at runtime; so one can have “BankAccount with SourceAccountTrait” on a single instance (rather than all instances statically). He argued that this implied Java was a class-oriented language.

The summary of the DCI architecture was that it's possible to think about the objects/classes (BankAccount) separately from the roles that they play (SourceAccount). Furthermore, this implies that future roles can be added onto existing objects afterwards; although dynamic languages like JavaScript will permit such changes on an instance-by-instance basis, compiled languages like Java make it more difficult to change things at run-time (though arguably aspect-oriented compilation is a way of extending Java's classes with this).
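To show what instance-by-instance roles look like in a dynamic language, here is a minimal Python sketch. BankAccount and the idea of a source account role come from the talk as described above; the transfer_to method and the play_source_role helper are hypothetical names of mine, and this is only one of several ways to attach a role.

```python
import types


class BankAccount:
    """The 'what the system is' part: plain data, no transfer logic."""

    def __init__(self, balance: int) -> None:
        self.balance = balance


def transfer_to(self, destination: BankAccount, amount: int) -> None:
    """Role method (hypothetical) for whichever account is the source."""
    if amount > self.balance:
        raise ValueError("insufficient funds")
    self.balance -= amount
    destination.balance += amount


def play_source_role(account: BankAccount) -> BankAccount:
    """Bind the role method to this one instance, not to the class."""
    account.transfer_to = types.MethodType(transfer_to, account)
    return account


savings = BankAccount(100)
current = BankAccount(20)

play_source_role(savings).transfer_to(current, 30)
print(savings.balance, current.balance)
print(hasattr(current, "transfer_to"))  # the role was not given to this one
```

Only savings gains the method; current and every other BankAccount are untouched, which is the per-instance behaviour that a class-based language needs aspects, wrappers or code generation to approximate.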

He ultimately argued that this was the true agile architecture, since it could be used to share customer vocabulary, whilst exposing the parts that change as a way of allowing items to be amended over time. There's more information on The DCI Architecture, as well as a PDF on the subject, and a Baby IDE embracing the change. Whether you agree with his conclusions or not (and, I hasten to point out, I have tried to reflect his points here in a detached way; not that I necessarily concur with all of his observations), it was certainly a thought-provoking talk. And isn't that what we all attend QCon for?

I next went to Rebecca Parsons' talk on “How to avoid "We never thought of that"”, about multi-disciplinary teams and how they innovate. One trap is the common vocabulary, in which an existing team talks in acronyms which can be completely incomprehensible to new joiners or members of other interdisciplinary teams.

The argument is a good one; if you have people from different teams then not only is it more likely you'll have different kinds of team players, but also you will have people that don't have the same built-in assumptions about how the world should work. It's very easy to have a fixed definition and set of assumptions (e.g. the way that Java's object-orientation works implies a certain must-be behaviour for other languages; otherwise, where would Scala's traits have come from?), and diverse teams can challenge them.

The other point made was that even expert opinions can be wrong. For example, using genetic algorithms to evolve a wing design resulted in things which would never have been invented directly, and in some cases, couldn't initially be explained either. So expert-guided solutions may converge on an optimum, but only a local one; further afield there may be more efficient solutions, which are only found when the search is unconstrained by assumptions.

Alex Buckley delivered an interesting talk on The Universal Virtual Machine in which he described how optimisations in the VM over the last decade have resulted in programs that are almost as fast as, or in some cases faster than, their natively-compiled counterparts. Ironically, the very reason for Java's perceived slowness was its VM; but now, the very use of the same VM structure has resulted in Java's gains over the years.

He showed how the JIT can in-line a method if statically the types are known, and that dynamic knowledge can show certain optimisations are safe (but can be backed out later if needed). Most of the speed increase, he argued, is due to method-based in-lining which the other optimisations can support.

He also talked about the new invokedynamic instruction, which promises to give a real speed boost to languages like Jython and JRuby. Furthermore, this potentially adds the ability to extend a runtime's type with new information after the fact (though still in a class-based, rather than instance-based way). It could also be used to define optional and default parameters to methods. The way that invokedynamic is wired is by consulting a language oracle that says whether the transformation is good for future uses. If it is, the JIT can get to work and use that information to implement further calls. The language can implement hooks to break or replace existing methods to support further evolution.

Although there's still no timeframe for Java 7 (or certainty about what will be part of it), invokedynamic is part of the OpenJDK experimental builds.

Dave Farley and Martin Thompson gave a talk on LMAX, which included how they got their application to process 100k transactions per second with < 1ms latency. Their argument was that Java, by its abstract nature, can hide the implementation details of the underlying processor; and that by paying attention to those details (and being hardware friendly) one can get insane rates of transactions. The biggest cost is a cache miss; so by optimising the data structures and algorithms to be cache-line friendly (e.g. padding to 64 bytes to fit in one cache line) and reducing the amount of garbage created (e.g. reading from a stream into a constant buffer, then using a ring-based buffer to write data into fixed sized arrays, instead of having to re-create or cycle new byte array buffers), one can go at the native speed of the processor. Their architecture involved entirely asynchronous processing threads (so no synchronous delays) and a single 'master' thread reading through completed jobs in order to send to downstream systems. Because single threads were being used, no locks were needed, and because the majority of the data fits into the L1 and L2 caches of the processor, main memory did not need to be consulted (for reading purposes). They also observed that if writing data to disks in a streaming format (i.e. not random access) then it's possible for large rates of writes to occur; disk is the new tape. Unfortunately, their slides are not available for download, but their room was packed to standing room only and the event was videoed for subsequent production on InfoQ.
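The ring-based buffer idea can be sketched roughly as follows. This is my own illustration in Python, not LMAX's code (theirs is Java and wasn't available); the `RingBuffer` class, its method names and the slot sizes are all assumptions. Python can't demonstrate the cache-line effects, so the only point shown here is that the slots are allocated once and then overwritten in place, rather than creating a new byte array per message.

```python
class RingBuffer:
    """Fixed number of preallocated, fixed-size slots that are overwritten in place."""

    def __init__(self, slots, slot_size):
        self._slots = [bytearray(slot_size) for _ in range(slots)]
        self._lengths = [0] * slots
        self._next = 0  # sequence number of the next write

    def publish(self, payload):
        """Copy `payload` into the next slot and return its sequence number."""
        index = self._next % len(self._slots)
        slot = self._slots[index]
        if len(payload) > len(slot):
            raise ValueError("payload larger than slot")
        slot[:len(payload)] = payload  # same-length slice assignment: no resize
        self._lengths[index] = len(payload)
        self._next += 1
        return self._next - 1

    def read(self, sequence):
        """Return the payload for `sequence` if it has not been overwritten yet."""
        oldest = max(0, self._next - len(self._slots))
        if not oldest <= sequence < self._next:
            raise IndexError("sequence not available")
        index = sequence % len(self._slots)
        return bytes(self._slots[index][:self._lengths[index]])  # copies, for convenience


ring = RingBuffer(slots=4, slot_size=64)
for n in range(10):
    ring.publish(f"order-{n}".encode())
```

With four slots and ten writes, only sequences 6 to 9 are still readable; older entries have been overwritten, which is the trade-off a fixed-size ring makes in exchange for producing no garbage on the write path.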

The day finished with a panel by the JavaWUG user group, in which three panelists talked about the future of Java. The event was streamed over the internet, but it's not clear whether the recording is available for off-line use yet.

Wednesday, March 10, 2010

QCon Day 1

Yesterday was the opening of QCon in London's QEII conference centre. The conference got off to a great start with the opening video.

The keynote was delivered by the energetic Uncle Bob Martin, called Bad Code (slides). He discussed the 'one screen of code' rule (for the size of the method), but pointed out this rule came from VT100 screens where there were only 24 lines, unlike the hundred lines or so visible in the IDE at any time. His advice was to keep applying the refactor/extract method step until it wasn't possible to extract any further.

He claimed that some piles of code are navigated through geographic memory, by recognising the layout of the left-hand column. “Head down to the third mountain, take a left, and then behind that small hill...”. His opening slide/video was of a multi-thousand (million?) line source file containing two classes (Foo and FooImpl) to the music of the 2001 “My god, it's full of stars” scene.

He gave a bunch of good coding advice, including:

  • Each method should be less than 20 lines long
  • Name private methods descriptively to help future developers
  • Don't have a function that takes a boolean; that's saying it does two things – create two instead
  • Cut and paste is bug replication by numbers
  • Use test driven design; if you have 80% of your code tested, then you have 20% which you have no idea works or not
  • You should be running home and have gobs of tests running
  • Pair programming is important; but don't aim for 100% of the time. 50-70% is the sweet spot
  • If the continuous build is broken, sirens should go off; fix it immediately
  • Design and architecture are important, because we want to maintain and change it
  • QA should find nothing; they should be the people we impress
  • Ask management whether they want good code or crap code; then drive the way you want to
  • Iterations of one or two weeks long are good
  • Despite a long search for blueprint (UML) tools, source code represents the design

There's an increasing focus on non-SQL databases; Geir Magnusson hosted a track on Non SQL databases, including a discussion on Gilt's use of Project Voldemort, which is an open-source distributed database based on Amazon's Dynamo. Their goal was to provide scalability by focusing on the key bottleneck, which is usually the shared database instance.

Voldemort works by distributing the data across multiple nodes. To track changes and detect stale data, each copy of an object carries a pairing of the node that last updated it and a revision number (called 'vector clocks'). So, if an object was last updated on node A, it would have (A,1). If it is subsequently updated, it would have (A,2). When an object is written, it's striped over nodes; when it's accessed, local nodes are queried. As a result, each object carries around a set of such clocks (one per node); when a write occurs, it specifies the last-known value of the clock along with the write. If that matches the stored value, the write is applied; if not, the update is rejected. Conflicts do occur, and it's up to the application to determine what the recovery is.
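The write rule described above can be sketched as follows. This is a single-process simplification in Python based only on the description in the talk; it is not Voldemort's client API, and `VersionedStore`, `StaleWrite` and the "cart" key are hypothetical names. A real implementation compares clocks across nodes and may keep conflicting versions for the application to reconcile, rather than simply rejecting the write.

```python
class StaleWrite(Exception):
    """Raised when a write was based on an out-of-date version."""


class VersionedStore:
    def __init__(self):
        # key -> (value, clock); a clock maps node name -> revision number
        self._data = {}

    def get(self, key):
        value, clock = self._data.get(key, (None, {}))
        return value, dict(clock)  # copy so callers cannot mutate our clock

    def put(self, key, value, node, last_seen_clock):
        """Apply the write only if the caller saw the latest clock."""
        _, current = self._data.get(key, (None, {}))
        if current != last_seen_clock:
            raise StaleWrite(f"{key}: expected {current}, got {last_seen_clock}")
        new_clock = dict(current)
        new_clock[node] = new_clock.get(node, 0) + 1  # (A,1) becomes (A,2)
        self._data[key] = (value, new_clock)
        return new_clock


store = VersionedStore()
_, clock = store.get("cart")
clock = store.put("cart", ["book"], "A", clock)         # clock is now (A,1)
stale = dict(clock)                                     # another client remembers (A,1)
clock = store.put("cart", ["book", "pen"], "A", clock)  # clock is now (A,2)

try:
    store.put("cart", ["hat"], "B", stale)  # based on (A,1), so it is rejected
    rejected = False
except StaleWrite:
    rejected = True
```

The rejected write is where the application-level recovery comes in: the client would re-read the object, merge its change with the newer value, and try again.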

Don Syme (of Microsoft Research) gave an excellent presentation on parallelism with F#, as well as a brief intro to F# for those that aren't up to speed with F# syntax.

F#'s approach to functional programming (and being a commercial success at functional programming, too...) means that integrating parallelism is pretty easy. Using Async.Parallel [ http "www.google.com"; http "www.bing.com" ; ] |> Async.Run, it's possible to issue multiple requests and receive the results asynchronously. To some extent, these have been exposed with utilities like Eclipse's Futures, although the ability to parallelise over arrays (as well as managing how they execute) is important.

He also wrote about some micro-trends in software evolution:

  • Communication with immutable data; REST, HTML, JSON, XML
  • Programming with queries; XSLT, SQL, embedded C# LINQ
  • Pattern matching; Scala, F#
  • Languages with a lighter syntax; Python, Ruby, F#

The big trends are also important; not only multi-core, but multi-system and the web. Parallelism is about CPU computation and IO computing.

F#'s approach to parallelism is to create (re)actors; similar to waiting for a button event or receiving a network socket, a reactor waits for an event and then takes an action. With these reactors, it's not necessary to have a thread-per-actor model, which results in the ability to scale out more widely for many thousands or tens of thousands of actors. This also permits dataflow parallelism, whereby data is processed in a pipeline and each stage in the pipeline can be processed as an actor.
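As a rough illustration of reactors arranged in a dataflow pipeline, here's a sketch transposed into Python's asyncio rather than F# (so none of this is Don Syme's code; `stage`, `run_pipeline` and the two transforms are hypothetical). Each stage waits for a message, acts on it and passes the result on, and all the stages share one thread.

```python
import asyncio


async def stage(inbox, outbox, transform):
    """A reactor: sleeps until a message arrives, acts on it, passes it on."""
    while True:
        item = await inbox.get()
        if item is None:  # sentinel: shut down and tell the next stage
            await outbox.put(None)
            return
        await outbox.put(transform(item))


async def run_pipeline(values):
    q1, q2, q3 = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    stages = [
        asyncio.create_task(stage(q1, q2, lambda x: x * 2)),
        asyncio.create_task(stage(q2, q3, lambda x: f"result={x}")),
    ]
    for value in values:
        await q1.put(value)
    await q1.put(None)

    results = []
    while (item := await q3.get()) is not None:
        results.append(item)
    await asyncio.gather(*stages)
    return results


results = asyncio.run(run_pipeline([1, 2, 3]))
```

Because a waiting stage costs only a suspended coroutine rather than a blocked thread, the same shape works with thousands of stages, which is the scaling argument made above.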

The day finished with an enlightening talk by Dan Ingalls called 40 years of fun with computers. A lot of this was looking back at the development of Smalltalk and related systems; the fact that it was a VM (with minimal C requirements) meant that the engine was self-introspecting (and self developing). A lot of the performance came from a native BitBlt operator for managing the graphics, which resulted in a lot of impressive graphics tools available.

The demonstrations were equally impressive, especially if the period in which it was developed is considered; there was a WYSIWYG editor with the ability to drop images and reflow text (in some ways, in quite odd layouts). No wonder that NeXT was so advanced for its time, having shared similar development history roots (and it remains depressing that it was ignored by the masses for so long, and to some extent, by those who still think Macs have a higher TCO).

Dan also demonstrated the Lively Kernel (formerly a research project by Sun Microsystems). Lively was inspired by Squeak, which is an open-source (MIT/Apache licensed) Smalltalk runtime. In fact, his entire presentation was run out of a Squeak VM image.

Why buzz has lost its fizz

I've been meaning to write up about Google Buzz for a while now, but what with one thing and another, I've not been able to get around to it. So, what do I think?

As with anything Google, the minute they announce something, it receives rave reviews (largely based on the media attention from a large-company release, rather than anything beneficial). There's then the usual gold rush to reserve well known profiles, and then, in essence, a bunch of people standing around asking “What now?”

Unfortunately for Google, Buzz was a real stain on their release process in more ways than they could have expected. One wonders whether there was an attempt to “Do an Apple” and keep a feature wrapped until global release; but either way, it launched with some serious, and subsequently acknowledged, flaws. In fact, these flaws were reminiscent of earlier mistakes with Google Reader and the “like” badge, as well as the auto-add to chat list for those who you mail. But whilst annoying in both places, these didn't automatically export your contact list to the world, which was the net effect of Google's Buzz at launch.

So, they've listened, and now your profile can be anonymous; further, it doesn't show your followers unless you explicitly ask it to. However, there's still a distinction between “people that you mail” and “people that you wish to follow”. As open-source projects are typically global and diverse, it's not necessarily the case that people whom I mail about bugs in their projects (or conversely, receive bugs about mine) will necessarily have blogs (or buzzes) that are interesting to me.

Secondly, people are interested in different things. An open-source project represents an intersection of those interests; whereas following assumes a union of interests. Those who read my blog or tweets will know that I'm interested in Eclipse; but also, Mac OS X, and to a lesser extent, other activities like flying, first aid, and a whole host of other minor references. It's unlikely that someone I happen to have mailed, who's helping out with Eclipse on Objective C on the Mac, is going to be interested in my thoughts on the new JSR 310 date format, or where OSGi is going; or for that matter, what the weather is at EGTC. So auto-following (in either direction) is likely to be wrong; and that holds for Buzz too. (There is a separate question which asks whether such monoculture is a good idea; but that's not relevant here.)

So, on to implementation. Arguably Twitter succeeded not just because of the novelty of the idea, but also because of the openness of the clients. The Twitter API is a very simple REST-based access, which has resulted in a plethora of different clients for both desktop and mobile systems. Google Buzz, on the other hand, uses a more complex (but, it should be noted, equally open) Atom data structure and the PubSubHubbub protocol, which isn't the kind of thing one throws together over a coffee break. Yes, there's implementations available for a handful of languages, but it's not wget. One might argue that Buzz is in fact a stripped down version of Google Wave – which I never got around to writing up, but again seems to be one of those 'meh' moments from Google.

However, the Buzz integration just sucks. It appears that Google, similar to pushing Wave as much as possible, are asking “Where can we shove Buzz next?”. As a result, it's turned up in Google Mail (hint; a mail client is not a chat client) and shortly, no doubt, in Google Reader in a more in-your-face-annoying-way than the 'stuff your friends saw'.

I've also hooked up my Twitter feed to Buzz, so if you're following me, you have a choice of whether to follow me on either Twitter or Buzz. I don't see me abandoning Twitter any time soon until Buzz achieves feature parity with Twitter (or more specifically, clients of the same). But what really gets annoying is when I post something in Twitter, the Buzz feed picks it up (OK) but then mails me the Buzz. What? I know what I said – in fact, I said it. No need to mail me to tell me what it is that I already knew I said, FFS. I've had to put in a Google Buzz rule in Google Mail to specifically delete all buzzes from me (is:buzz subject:Buzz from Alex Blewitt) just so they don't litter my inbox or All Mail boxes on my IMAP clients. I wouldn't mind so much if they were just telling me someone had commented on the Buzz, but there aren't any such comments and it's just what I already said.

Where Twitter still excels is the ability to find new people and subscribe to their feeds. You don't get this in Google Buzz yet. That's probably because no-one is using it (I only see feeds that are regularly updated from a small number of people; and all of those are from Twitter). So the @alblue messages don't get translated to a follow-able Google Profile, which means I still need to manage my interests at Twitter.

What's really needed is a quick and easy way of finding out what other people are interested in. Arguably, the reason that Twitter succeeded here was their “search” field, which rapidly evolved hashtags as a way of tagging posts by an open category set (rather than a closed category set, as found on InfoQ or DZone). In fact, arguably the success of the Google Code Issue tracker is its use of labels to mean absolutely anything; which means that you can attribute specific meanings to each item without being constrained about how the tool wants you to work. What we don't have (yet) is Labels for Buzz, nor for that matter a way of searching for Labels.

The main thing I miss about Twitter (or, for that matter, an e-mail client) is marking which items I have read or not. It's not the case that I have time to read all my buzzes; but when I go into the Buzz tab of Google Mail, I get the option to expand and then mark everything as read (even though it might not have been). It also coalesces updates from the same person in a short period of time; which is generally no use, since they're typically unrelated to each other. People tend to check twitter/tweet in sporadic bursts, rather than hosting a you-me-you-me-you conversation, so Buzz's default everything-that-occurs-together-is-related model is a pretty poor fit for the real world.

In all, Buzz when first released was a privacy-busting Alpha release. Now, it's just a Beta release, and much like Google Wave, I don't think I'll be going back any time soon.

Update: Seems like Google is Listening. I think it's too late for Wave though.

Saturday, March 06, 2010

Merged ZFS from OpenSolaris to OSX

Wow, that took a long time. But I've finally managed to get the OpenSolaris codebase (shadowed at http://github.com/alblue/onnv-gate-zfs/tree/onnv_72, in case you're interested) merged in with the original Apple code changes of zfs119. The merged code is available at http://github.com/alblue/mac-zfs/commit/6d20fbb74f11a6765ca41d0144bd31609c15c5a9 if you want to see the changes in all their glory. Here's the announcement on the mailing list.

Caution: The merge is not ready for production use. There's a critical bug affecting unmounting of ZFS shares, which should prevent anyone wanting to use this in a real environment.

What does this mean for the ZFS project on Mac OS X? Well, for one thing, that it's still being worked on; even if progress isn't that great. Furthermore, whilst it builds on 10.5 and 10.6, there's functionality that is still missing (for a list of open issues, see the Google Project Issues page), but having got to a stage where it's as merged as it can get with the OpenSolaris codebase, we should be able to roll forward with the changes over the coming weeks and months.

It's not all plain sailing, though; there are a significant number of changes in the Mac OS X codebase (for example, the way you free a node or determine if something is a directory, as well as ACL type permission checks) which will need to be re-applied to each incoming change from the upstream OpenSolaris codebase. And, there's a lot of Mac-specific code in there which isn't referenced (of course) and may need further changes.

It's also worth noting that this set of changes still needs to have the tyres kicked; whilst it's simple enough to build and install, it needs to undergo a lot more testing in order to determine whether it's safe for general use (and I'd prefer to hold off generating the installers until that time). If you're willing to give it a go, though, you can download the project, compile and build within Xcode, and then test it out for yourself.

Things I've learnt

Merging is never easy at the best of times, but really, if you aren't working with a DVCS then you're shooting yourself in the feet before you even start. Even if you're forced to work in a CVCS by day, if you need to do a massive merge like this, consider importing and building a DVCS for the purposes of the merge, then exporting back again afterwards. You can even keep it around; I know of people who use Git on their desktops to get stuff in and out, and then synchronise with Git's svn module or export to the filing system for CVS's benefit.

Secondly, whether the repository is Git or Mercurial doesn't really matter, but it's a pain to work with both. The OpenSolaris codebase is hosted with Mercurial (at http://hub.opensolaris.org/bin/view/Project+onnv/) but you can't do merges directly between repositories. Instead, I cloned it locally, filtered it (using the hg.convert.filemap, at least, when it doesn't have errors in it – oops) into a second Hg repository, and then pushed it via http://hg-git.github.com/, which makes GitHub look like a Mercurial DVCS. Heck, if you prefer the Hg clients, you can still use GitHub. So, we now have http://github.com/alblue/onnv-gate-zfs/tree/onnv_72, which I plan to sync up with other versions in the future.

The key with a DVCS is the ease of merging. 708fa1 was the commit tree for the older ZFS-119 build from Apple, whilst fe4492 corresponds with the onnv-gate-zfs tree at onnv_72. And, with a hop, step and git merge, the two were brought together.

...except merging is never quite that easy. Yes, there's automatic symmetry if the files are in the same places/locations (which, thanks to a fair amount of earlier effort had already been done); but even so, the files were different and had evolved.

It seems that earlier irrelevant code was simply deleted from the Apple codebase, which meant the merges were much more difficult than they needed to be. Fortunately, as time has gone on, the coding style appears to have been to #ifdef sections of code applicable to Apple or not, with the result that the changes are much easier to process.

Tools

I missed a good Git GUI for showing merge changes. I spent a lot of time with multiple terminal windows open and results of diff as I was going through. I guess we'll never see it in Xcode, but even other tools (like Eclipse EGit) aren't quite ready for showing diffs for merge conflict resolution at this stage. I also really wish I'd found diff --ifdef earlier; that would have saved me a lot of time in some of the simpler merge cases; though the right answer would probably be to accept the OpenSolaris implementation at times. However, all the automated tools in the world don't help when Apple's functions – particularly those in zfs_vnops.c and zfs_vfsops.c – have completely been renamed and their signatures changed.

In the end, I caught the last few (panic-inducing) errors using a little utility I created to do diffs. It's basically along the lines of:

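# Hide #includes from the preprocessor and drop #pragmas, then let
# gcc's preprocessor resolve the #ifdefs (-P omits line markers).
sed -e 's/#include/@include/' \
    -e 's/#pragma.*//' "${INPUT}" |
  gcc -D_KERNEL -P -E - > "${OUTPUT}"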

What this does is run through a file, change the #includes to @includes (so that they don't get picked up by the pre-processor) and then pass the result through gcc's preprocessor to resolve the set of #ifdefs. That way, I can test the after effects of my #ifdefs (rather than just git diff) to see what I had done wrong. (The other bit strips out #pragmas, since they were causing unnecessary noise in the process.) With a quick find . -type f -exec diffone on my tree, combined with the outputs of the original codebase, I could see the exact program diffs, rather than the source code diffs.

It turns out I had made 4 panic inducing errors in the merge, which is a symptom of doing some of these merges late into the night – or in some cases, early into the morning:

  • Refactoring char *tmp into user32_addr_t *tmp instead of user32_addr_t tmp (zfs_vfsops.c:1084)
  • Putting a *vpp = ZTOV(zp); just after, instead of just before, an #ifdef, with the result it was effectively removed (zfs_vnops.c:1716)
  • Missing off a return(error) somewhere
  • Missing a _ off one #ifdef __APPLE__ somewhere

The fact that I managed to get by with just these few merge errors is down to a combination of the power of a DVCS and the hard work and dedication of those on the original ZFS Apple team (thanks again for everything, Noël et al). It was also made possible by the serious refactoring ahead of time to get the folders to match up – without which, the merge wouldn't have been able to happen.

Lastly, I'd just like to thank the others involved in the Mac ZFS project – both those that are helping with the code and the enthusiasm behind a filing system which clearly Apple has left behind. We've had a bit of a lull getting to this point, but it's all downhill from here.