
AlBlue’s Blog

Macs, Modularity and More

QCon London 2020 Day 2

2020 Qcon Conference

The second day of QCon London had an opening keynote by [@anjuan](https://twitter.com/anjuan) talking about the Underground Railroad network that helped free slaves in the United States. He likened the various players in the network to the roles of managers and developers. The talk was delivered in an entertaining enough fashion, but the analogy was built on a part of American history rather than UK or European history, and so felt a little disconnected from the current reality – especially given the dual challenges to the UK of Brexshit and the Coronavirus. Ultimately I think this keynote may have worked better for an American audience.

The first real talk of the day covered Quarkus, a Java framework for small, quick-starting applications. One of the differences with a JIT-compiled language is that there’s a warm-up phase before the application reaches peak performance; that's not an issue for long-running applications, but it can be a concern if you’re following continuous delivery and redeploying multiple times a day. If you’re deploying every 10 minutes, with every commit, then you may find that the Java application never reaches steady state before being shut down for a new version of the service. Quarkus is pitched as supersonic, with a fast boot. It is designed for GraalVM by default, and takes a generated build file and uses a Maven/Gradle plugin to produce the optimised JAR and, using the ahead-of-time compiler, a native ELF executable via GraalVM. It also produces containers with a small on-disk footprint. The hot reload of code makes for a fast turnaround time, and Quarkus is opinionated about how it starts Java – such as creating a debug listener on a port at the same time.
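The warm-up effect is easy to see for yourself. The following is a minimal sketch, not something shown in the talk: the WarmUpDemo class and its work method are made up for illustration, and a real measurement would use a proper benchmarking harness rather than System.nanoTime. It times the same method in successive batches, and on a JIT-enabled JVM the first batches usually take noticeably longer than the later ones.

```java
// Hypothetical micro-demo (not from the talk): run identical batches of work
// and print how long each batch takes, so the warm-up phase becomes visible.
final class WarmUpDemo {

    // Arithmetic-heavy loop that gives the JIT compiler something to optimise.
    static long work(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += ((long) i * i) % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int batch = 0; batch < 10; batch++) {
            long start = System.nanoTime();
            long total = 0;
            for (int i = 0; i < 1_000; i++) {
                total += work(10_000);
            }
            long micros = (System.nanoTime() - start) / 1_000;
            // Printing the total keeps the computation from being discarded.
            System.out.println("batch " + batch + ": " + micros + "us (total " + total + ")");
        }
    }
}
```

The absolute numbers will vary by machine and JVM; the point is only the shape of the curve. A service that is replaced every few minutes spends a larger share of its life in those slow early batches.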

Sergey Kuksenko (@kuksenk0) has been a JVM engineer since 2005, and has been working on performance for the last decade. He kicked off with a demo of two Java Mandelbrot generators, both using a Complex class; in the faster demo (12fps vs 5fps), the only difference was the addition of the “inline” keyword. Valhalla provides a denser memory layout for inline classes (aka value types), with the plan to have specialised generics. The term ‘inline class’ rather than ‘value’ was chosen to make it easier to update the Java Language Specification with fewer changes. Inline classes don’t have identity, which means they can’t be compared by identity with ==, and so they avoid surprises like Integer.valueOf(42) == Integer.valueOf(42), where the result depends on whether the boxed value comes from the Integer cache.
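The identity pitfall with boxed values can be reproduced on any current JDK. This is a minimal sketch using only java.lang.Integer; the IdentityDemo class and its helper names are made up for the example. Integer.valueOf is documented to cache values from -128 to 127, so 42 gives back the same object each time, while a value outside that range may or may not, depending on the JVM's cache settings.

```java
// Hypothetical demo class: contrasts reference comparison (==) with
// value comparison (equals) on boxed Integer objects.
final class IdentityDemo {

    // Compares object identity: true only if both references point at one object.
    static boolean sameReference(Integer a, Integer b) {
        return a == b;
    }

    // Compares the wrapped int values, which is almost always what is wanted.
    static boolean sameValue(Integer a, Integer b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // 42 is inside the documented cache range, so both calls return the same object.
        System.out.println(sameReference(Integer.valueOf(42), Integer.valueOf(42)));

        // 1000 is outside the default cache range; the outcome depends on JVM settings
        // and is typically false on a default HotSpot configuration.
        System.out.println(sameReference(Integer.valueOf(1000), Integer.valueOf(1000)));

        // equals compares values, so this holds regardless of caching.
        System.out.println(sameValue(Integer.valueOf(1000), Integer.valueOf(1000)));
    }
}
```

A type without identity removes the question entirely: there is no reference to compare, only the value.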

An inline class has no identity, is immutable, not nullable and not synchronizable. The JVM will decide whether to allocate the inline classes on the heap, or whether they’ll be on the stack or inlined into the container class. The phrase “Code like a class, work like an int” summarises the goal of Valhalla. The performance is about 50% faster but, importantly, there are far fewer L1 cache misses. The work on benchmarking the current implementation is in progress, and the regression seems to be under 2% at the moment. For any kind of arithmetic types, having Complex inline classes gave an order of magnitude speed-up in some cases. The Optional class will be a value class in the future, and will be a proof of concept of a migration path. Work is ongoing to reduce the performance overheads so that it can be available in the future; for the time being, the JDK 14 release in the next couple of weeks will have a version available for experimentation.
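To give a feel for the kind of code in the Mandelbrot demo, here is a minimal sketch of an immutable Complex class and an escape-time loop. It is my own reconstruction rather than the code from the talk, and the class and method names are assumptions. It is written as an ordinary final class so that it compiles on a standard JDK; in the Valhalla prototype described above, the difference would be declaring Complex with the “inline” keyword.

```java
// Immutable complex number: every operation returns a new instance.
// On a standard JVM each instance is a separate heap object with identity;
// an inline class would let the JVM flatten the two doubles instead.
final class Complex {
    final double re;
    final double im;

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    Complex plus(Complex other) {
        return new Complex(re + other.re, im + other.im);
    }

    Complex times(Complex other) {
        return new Complex(re * other.re - im * other.im,
                           re * other.im + im * other.re);
    }

    // Squared magnitude, which avoids a square root in the escape test.
    double normSquared() {
        return re * re + im * im;
    }
}

final class Mandelbrot {

    // Number of iterations of z = z*z + c before |z| exceeds 2, capped at max.
    static int iterations(Complex c, int max) {
        Complex z = new Complex(0, 0);
        for (int i = 0; i < max; i++) {
            if (z.normSquared() > 4.0) {
                return i;
            }
            z = z.times(z).plus(c);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(iterations(new Complex(0, 0), 100)); // inside the set: 100
        System.out.println(iterations(new Complex(2, 2), 100)); // escapes at once: 1
    }
}
```

The inner loop allocates two temporary Complex objects per iteration, which is exactly the sort of short-lived, identity-free value that a denser memory layout is meant to help with.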

A wildcard session followed, due to the last-minute cancellation of a speaker. Instead, I went to a talk on TornadoVM, which provides a way of compiling and running Java code in parallel on a variety of different FPGA or GPU solutions. By translating Java bytecode into Tornado bytecode, and then having different translators which re-write those kernels to GPU-specific instruction sets, it’s possible to get a many-thousand-times speed-up on numerical calculations. A demo showed capturing depth information from video captured from a Microsoft Kinect, and re-rendering it into a three-dimensional representation afterwards. Importantly, they have a Docker image which can be used for testing, beehivelab/tornado-gpu:latest, which is covered by the project’s README.
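The kernels that benefit from this kind of offloading are loops whose iterations are independent of one another. The following is a plain-Java sketch of such a loop, with names made up for illustration; it deliberately does not use TornadoVM's own API for marking and scheduling kernels, which isn't shown here, and it runs sequentially on an ordinary JVM.

```java
// Hypothetical example of a data-parallel kernel in plain Java.
final class VectorKernel {

    // Each iteration reads a[i] and b[i] and writes only result[i], so no
    // iteration depends on another and the loop could be spread across
    // many GPU or FPGA lanes by a suitable compiler.
    static void multiplyAdd(float[] a, float[] b, float[] result) {
        for (int i = 0; i < result.length; i++) {
            result[i] = a[i] * b[i] + a[i];
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f};
        float[] b = {4f, 5f, 6f};
        float[] result = new float[a.length];
        multiplyAdd(a, b, result);
        System.out.println(java.util.Arrays.toString(result));
    }
}
```

Loops with carried dependencies, where iteration i needs the output of iteration i - 1, don't map onto parallel hardware this directly.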

The next session I attended was by Alina Yurenko from Oracle on “Maximising application performance with GraalVM”, which talked about using Graal as an ahead-of-time compiler for generating native images. Not only does the application start much faster, it also uses a lot less memory than the equivalent application running under the JVM. Partially this is due to the fact that the C2 compiler doesn’t need to compile the underlying JVM classes, and partially due to the fact that the actual runtime of the application has a far smaller total memory footprint. Of course, creating an accurate execution profile requires running the application under an expected (simulated) load, so that the correct hot code samples can be identified and thus translated appropriately. Graal uses information gained from the initial execution to prepare appropriate code for expected types; if these assumptions are incorrect, then the executed binary will be different. There was also a sales pitch for using GraalVM to host multiple languages, along with an evolution of the Nashorn JavaScript engine. It’s unclear as to whether people will really want to use a JVM for running multiple languages, but then again, those people never really saw JavaScript as anything other than a toy language, so what do they know? :)

The next session I attended was my own. I thought it would be a little rude not to turn up :) Fortunately the talk went OK – after all, I completed writing it with at least an hour to go – and as for timing, I finished 10s early. My talk was on CPU microarchitecture for maximum performance, looking at the nitty-gritty details of how CPUs execute. I didn’t go down to the electron or transistor level, but rather talked about the general architectural details of the processor. My slides have been uploaded to my SpeakerDeck profile and, apart from a quibble about a bullet on page 25, most people seem to have enjoyed it; after all, I did. The video was recorded and will be available on InfoQ at some point in the future; I’ll update this post when I have a public link.

Day 2 ended with a get-together of the speakers in the usual location, and I made several contacts with people who had been speaking at QCon; some whom I knew, some whom I did not. One of the pleasures of QCon is meeting and talking to people; I enjoy meeting up with the attendees during the breaks, but it is also excellent to be able to talk to the movers and shakers of the conference.

Tomorrow I’m leading the Java track, which I’m looking forward to; stay tuned!