AlBlue’s Blog

Macs, Modularity and More

StringBuffer and StringBuilder performance with JMH

2016 Java Eclipse Jmh Performance Optimisation

Last week, Doug Schaefer wished on Twitter that other Eclipse projects were getting the same kind of contribution love as Platform UI. Lars Vogel attributed that to the effort in cleaning up the codebase and the focus on new contributions and contributors.

I thought I’d spend some time helping out CDT in assisting with this effort, and over the past week or so have been sending a few patches that way. Fortunately Sergey Prigogin has been an excellent reviewer, turning around my patches in a matter of hours in some cases, and that in turn has meant that I’ve been able to make further and faster progress than on some of the other projects I’ve tried contributing improvements to.

Most recently I’ve been looking into optimising some of the StringBuffer code and thought I’d go into a little bit of detail about the performance aspects of these changes.

The TL;DR of this post is:

  • StringBuilder is better than StringBuffer
  • StringBuilder.append(a).append(b) is better than StringBuilder.append(a+b)
  • StringBuilder.append(a).append(b) is better than StringBuilder.append(a); StringBuilder.append(b);
  • StringBuilder.append() and + are only equivalent provided that they are not nested and you don’t need to pre-size the builder
  • Pre-sizing the StringBuilder is like pre-sizing an ArrayList; if you know the approximate size you can reduce the garbage by specifying a capacity up-front

Most of this may be common knowledge but I hope that I can back this up with data using JMH.

Introduction to JMH

The Java Microbenchmark Harness or JMH is the tool to use for performance testing microbenchmarks. In the same way that JUnit is the de facto standard for testing, JMH is the de facto standard for performance measurement. There’s a great thread that goes into the details behind some of JMH’s evolution and the choices that were made; and the fact that since then it seems to have edged out other performance testing benchmark tools like Caliper seems to be a good indicator of its future existence.

JMH projects can be bootstrapped from mvn and then compiled/post annotated with the launcher to generate a benchmarks.jar file, which contains the code under test as well as a copy of the JMH code in an uber JAR. It also helpfully sets up a command line interface that you can use to test your code, and is the simplest way to generate a project.

You can create a stub JMH project using the steps on the JMH homepage:

Generating a JMH project with mvn
$ mvn archetype:generate \
-DinteractiveMode=false \
-DarchetypeGroupId=org.openjdk.jmh \
-DarchetypeArtifactId=jmh-java-benchmark-archetype \
-DgroupId=org.sample \
-DartifactId=test \
-Dversion=1.0

From the command line, the sample project can be run by executing:

Compiling and Running the JMH benchmark
$ mvn clean package
$ java -jar target/benchmarks.jar

There are many flags that can be passed on the command line; passing -h shows the full list.

Using JMH in Eclipse

If you’re trying to run JMH in Eclipse, you will need to ensure that annotation processing is enabled. That’s because JMH not only uses annotations to mark the benchmarks, but also uses an annotation processor to transform the benchmarked code into executable units. If you don’t have annotation processing enabled and try to run it, you’ll see a cryptic message like Unable to read /META-INF/BenchmarkList.

If you’ve created a Maven project (and presumably, therefore, have m2e installed) the easiest way is to install JBoss’ m2e-apt connector, which allows you to configure the project for JDT’s support for APT. This can be installed from Eclipse → Preferences → Discovery and choosing the m2e-apt connector. After restart this can be used to enable the JDT support automatically by going to Window → Preferences → Maven → Annotation Processing and then choosing the “Automatically configure JDT APT” option.

If you’re not using Maven then you can add the jmh-generator-annprocess JAR (along with its dependencies) to the project’s Java Compiler → Annotation Processing → Factory Path, and ensure that the annotation processing is switched on.

Tests can then be run by creating a launch configuration to run the main class org.openjdk.jmh.Main or by using the JMH APIs.

StringBuilder vs StringBuffer benchmark

So having got the basis for benchmarking set up, it’s time to look at the performance of the StringBuilder vs the StringBuffer. It’s a good idea to see what the performance of the empty buffers is like before we start adding content to them:

StringBenchmark.java
import org.openjdk.jmh.annotations.Benchmark;

public class StringBenchmark {
    @Benchmark
    public String testEmptyBuffer() {
        StringBuffer buffer = new StringBuffer();
        return buffer.toString();
    }

    @Benchmark
    public String testEmptyBuilder() {
        StringBuilder builder = new StringBuilder();
        return builder.toString();
    }

    // Baseline: no builder at all, just a constant
    @Benchmark
    public String testEmptyLiteral() {
        return "";
    }
}

Two things are worth calling out: the first is that the resulting expression you’re using always has to be returned to the caller, otherwise the JIT will optimise the code away. The second is that it’s worth testing the empty case first of all so that it sets a baseline for measurement.

We can run it from the command line by doing:

$ mvn clean package
$ java -jar target/benchmarks.jar Empty \
-wi 5 -tu ns -f 1 -bm avgt
...
Benchmark Mode Cnt Score Error Units
StringBenchmark.testEmptyBuffer avgt 20 8.306 +- 0.497 ns/op
StringBenchmark.testEmptyBuilder avgt 20 8.253 +- 0.416 ns/op
StringBenchmark.testEmptyLiteral avgt 20 3.510 +- 0.139 ns/op

The flags used here are -wi (warmup iterations), -tu (time unit; nanoseconds), -f (number of forked JVMs) and -bm (benchmark mode; in this case, average time).

Somewhat unsurprisingly the values are relatively similar, with the return literal being the fastest.

What if we’re concatenating two strings? We can write a method to test that as well:

StringBenchmark.java
@Benchmark
public String testHelloWorldBuilder() {
    StringBuilder builder = new StringBuilder();
    builder.append("Hello");
    builder.append("World");
    return builder.toString();
}

@Benchmark
public String testHelloWorldBuffer() {
    StringBuffer buffer = new StringBuffer();
    buffer.append("Hello");
    buffer.append("World");
    return buffer.toString();
}

When run, it looks like:

$ mvn clean package
$ java -jar target/benchmarks.jar Hello \
-wi 5 -tu ns -f 1 -bm avgt
...
Benchmark Mode Cnt Score Error Units
StringBenchmark.testHelloWorldBuffer avgt 20 25.747 +- 1.188 ns/op
StringBenchmark.testHelloWorldBuilder avgt 20 25.411 +- 1.015 ns/op

Not much difference there, although the Buffer is marginally slower than the Builder. That shouldn’t be too surprising; they are both subclasses of AbstractStringBuilder, which has all the logic.
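The shared superclass, and the one difference that is visible from the outside (StringBuffer synchronises its methods), can be checked with a few lines of reflection. This is only a sketch; the class name SharedSuperclass and its helper methods are made up for illustration and are not part of the benchmark project.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class SharedSuperclass {
    // Name of the direct superclass of the given class.
    static String superclassOf(Class<?> type) {
        return type.getSuperclass().getName();
    }

    // Whether the public append(String) method of the given class is synchronized.
    static boolean appendIsSynchronized(Class<?> type) {
        try {
            Method append = type.getMethod("append", String.class);
            return Modifier.isSynchronized(append.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(type + " has no append(String)", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(superclassOf(StringBuilder.class));
        System.out.println(superclassOf(StringBuffer.class));
        System.out.println("StringBuilder synchronized: " + appendIsSynchronized(StringBuilder.class));
        System.out.println("StringBuffer synchronized: " + appendIsSynchronized(StringBuffer.class));
    }
}
```

Once the JIT has warmed up and only a single thread is involved, that synchronisation is cheap, which fits with the near-identical scores above.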

Job done?

Are we all done yet? Well, no, because there are other things at play.

Firstly, JMH is a benchmarking tool to find the highest possible value of performance under load. What happens in Java is that by default HotSpot uses a tiered compilation model; it starts off interpreted, then once a method has been executed a number of times it gets compiled. In fact, there are different levels of compilation that kick in after different numbers of calls. You can see these if you look at the various *Threshold* flags generated by -XX:+PrintFlagsFinal from an OpenJDK installation.
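The same thresholds can also be read from inside a running JVM. The sketch below assumes a HotSpot-based JVM, since it relies on the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean; the class name ThresholdFlags and the choice of three flags are mine, and anything that cannot be read is reported as "unavailable" rather than guessed.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

class ThresholdFlags {
    // A small, hand-picked selection of the *Threshold* flags.
    private static final String[] NAMES = {
        "CompileThreshold", "Tier3InvocationThreshold", "Tier4InvocationThreshold"
    };

    // Reads each flag from the running JVM, in the order listed above.
    static Map<String, String> thresholds() {
        Map<String, String> values = new LinkedHashMap<>();
        for (String name : NAMES) {
            String value;
            try {
                HotSpotDiagnosticMXBean bean =
                        ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
                value = bean.getVMOption(name).getValue();
            } catch (RuntimeException e) {
                // Unknown flag, or not a HotSpot JVM
                value = "unavailable";
            }
            values.put(name, value);
        }
        return values;
    }

    public static void main(String[] args) {
        thresholds().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

The values differ between JVM versions and command-line settings, so treat whatever this prints as a description of the JVM it ran on.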

When a method has been called thousands of times, it will be compiled using the Tier 3 (client) or Tier 4 (server) compiler. This generally involves optimisations such as in-lining methods, dead code elimination and the like. This gives the best possible code performance for the application.

But what if the method is called infrequently, or puts memory pressure on the garbage collector instead? It won’t be JIT compiled and so will take longer. We can see the effect of running in interpreted mode by running the generated benchmark code with -jvmArgs -Xint to force the forked JVM used to run the benchmarks to only use the interpreter:

Running benchmarks in interpreted mode
$ mvn clean package
$ java -jar target/benchmarks.jar Empty Hello \
-wi 5 -tu ns -f 1 -bm avgt -jvmArgs -Xint
...
Benchmark Mode Cnt Score Error Units
StringBenchmark.testEmptyBuffer avgt 20 1102.609 +- 66.596 ns/op
StringBenchmark.testEmptyBuilder avgt 20 769.682 +- 27.962 ns/op
StringBenchmark.testEmptyLiteral avgt 20 184.061 +- 13.587 ns/op
StringBenchmark.testHelloWorldBuffer avgt 20 2299.749 +- 70.087 ns/op
StringBenchmark.testHelloWorldBuilder avgt 20 2381.348 +- 38.726 ns/op

A better option is to use the JMH specific annotation @CompilerControl(Mode.EXCLUDE) which prevents benchmarking methods from being JIT compiled, while allowing the other Java classes to be JIT compiled as usual. This is akin to having other classes call the StringBuffer (so that is sufficiently well exercised) while emulating code that isn’t called all that frequently. It can be added at the class level or at the method level.

$ grep -B2 class StringBenchmark.java
@State(Scope.Benchmark)
@CompilerControl(Mode.EXCLUDE)
public class StringBenchmark {
$ mvn clean package
$ java -jar target/benchmarks.jar Empty Hello \
-wi 5 -tu ns -f 1 -bm avgt
...
Benchmark Mode Cnt Score Error Units
StringBenchmark.testEmptyBuffer avgt 20 144.745 +- 4.561 ns/op
StringBenchmark.testEmptyBuilder avgt 20 122.477 +- 3.273 ns/op
StringBenchmark.testEmptyLiteral avgt 20 91.139 +- 1.685 ns/op
StringBenchmark.testHelloWorldBuffer avgt 20 236.223 +- 7.679 ns/op
StringBenchmark.testHelloWorldBuilder avgt 20 222.462 +- 5.733 ns/op

Either way, calling the code before the JIT compilation has kicked in magnifies the difference between the two classes to around 10%. So for methods that are called fewer than 1000 times, such as during start-up or when invoked from a user interface, the difference will exist.

Different calling patterns

What about different calling patterns? One example I came across was using an implicit String concatenation inside a StringBuilder or StringBuffer. This might be the case when generating a buffer to represent an e-mail, for example.

To test this, and to prevent Strings being concatenated by the javac compiler, we need to use non-final instance variables. However, to do that with the benchmark requires that the class be annotated with @State(Scope.Benchmark). (As with public static void main(String args[]) it’s best to just learn that this is necessary when you’re getting started, and then understand what it means later.)
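The folding that javac performs on constants, and the reason non-final fields avoid it, can be seen in a small standalone sketch. The class and field names here are illustrative only. It relies on the language rule that compile-time constant strings are interned, whereas a concatenation evaluated at run time produces a new String.

```java
class ConstantFolding {
    // A constant variable: javac can fold expressions that use it.
    private static final String FROM_CONST = "Alex";

    // A non-final field: the concatenation has to happen at run time.
    private String from = "Alex";

    // True because "From" + FROM_CONST is folded to the literal "FromAlex" by javac,
    // so both sides refer to the same interned String.
    static boolean foldedAtCompileTime() {
        String s = "From" + FROM_CONST;
        return s == "FromAlex";
    }

    // True because the run-time concatenation creates a new String that is
    // equal to, but not the same object as, the literal.
    boolean concatenatedAtRunTime() {
        String s = "From" + from;
        return s != "FromAlex" && s.equals("FromAlex");
    }

    public static void main(String[] args) {
        System.out.println("folded by javac: " + foldedAtCompileTime());
        System.out.println("built at run time: " + new ConstantFolding().concatenatedAtRunTime());
    }
}
```

Comparing strings with == is deliberate here, purely to expose object identity; it is not something to do in ordinary code.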

StringBenchmark.java
@State(Scope.Benchmark)
public class StringBenchmark {
    private String from = "Alex";
    private String to = "Readers";
    private String subject = "Benchmarking with JMH";
    ...
    @Benchmark
    public String testEmailBuilderSimple() {
        StringBuilder builder = new StringBuilder();
        builder.append("From");
        builder.append(from);
        builder.append("To");
        builder.append(to);
        builder.append("Subject");
        builder.append(subject);
        return builder.toString();
    }

    @Benchmark
    public String testEmailBufferSimple() {
        StringBuffer buffer = new StringBuffer();
        buffer.append("From");
        buffer.append(from);
        buffer.append("To");
        buffer.append(to);
        buffer.append("Subject");
        buffer.append(subject);
        return buffer.toString();
    }
}

You can selectively run the benchmarks by putting one or more regular expressions on the command line:

$ mvn clean package
$ java -jar target/benchmarks.jar Simple \
-wi 5 -tu ns -f 1 -bm avgt
...
Benchmark Mode Cnt Score Error Units
StringBenchmark.testEmailBufferSimple avgt 20 88.149 +- 1.014 ns/op
StringBenchmark.testEmailBuilderSimple avgt 20 88.277 +- 1.201 ns/op

These obviously take a lot longer to run. But what about other forms of the code? What if a developer has used + to concatenate the fields together in the append calls?

StringBenchmark.java
@Benchmark
public String testEmailBuilderConcat() {
    StringBuilder builder = new StringBuilder();
    builder.append("From" + from);
    builder.append("To" + to);
    builder.append("Subject" + subject);
    return builder.toString();
}

@Benchmark
public String testEmailBufferConcat() {
    StringBuffer buffer = new StringBuffer();
    buffer.append("From" + from);
    buffer.append("To" + to);
    buffer.append("Subject" + subject);
    return buffer.toString();
}

Running this again shows why this is a bad idea:

$ mvn clean package
$ java -jar target/benchmarks.jar Simple Concat \
-wi 5 -tu ns -f 1 -bm avgt
...
Benchmark Mode Cnt Score Error Units
StringBenchmark.testEmailBufferConcat avgt 20 105.424 +- 3.704 ns/op
StringBenchmark.testEmailBufferSimple avgt 20 91.427 +- 2.971 ns/op
StringBenchmark.testEmailBuilderConcat avgt 20 100.295 +- 1.985 ns/op
StringBenchmark.testEmailBuilderSimple avgt 20 90.884 +- 1.663 ns/op

Even though these calls do the same thing, the cost of having an embedded implicit String concatenation is enough to add a penalty of 10 to 15% on the time taken for the methods to return.

This shouldn’t be too surprising; the cost of doing the in-line concatenation means that it’s generating a new StringBuilder, appending the two String expressions, converting it to a new String with toString() and finally inserting that resulting String into the outer StringBuilder/StringBuffer.
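Written out by hand, the hidden work looks roughly like the second method below. This is a sketch of what a Java 8 era javac generates for the nested expression, not a decompilation of the benchmark; the class and method names are invented for the example, and newer compilers translate + differently.

```java
class NestedConcat {
    // What the source looks like: a + inside an append call.
    static String implicit(String from) {
        StringBuilder builder = new StringBuilder();
        builder.append("From" + from);
        return builder.toString();
    }

    // Roughly the same code with the implicit concatenation spelled out:
    // a temporary StringBuilder and a temporary String, both discarded immediately.
    static String expanded(String from) {
        StringBuilder builder = new StringBuilder();
        String temporary = new StringBuilder().append("From").append(from).toString();
        builder.append(temporary);
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(implicit("Alex"));
        System.out.println(expanded("Alex"));
    }
}
```

Both methods return the same text; the difference is only in how many short-lived objects are created along the way.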

This should probably be a warning in the future.

Chaining methods

Finally, what about chaining the methods instead of referring to a local variable? That can’t make any difference; after all, this is equivalent to the one before, right?

StringBenchmark.java
@Benchmark
public String testEmailBuilderChain() {
    return new StringBuilder()
        .append("From")
        .append(from)
        .append("To")
        .append(to)
        .append("Subject")
        .append(subject)
        .toString();
}

@Benchmark
public String testEmailBufferChain() {
    return new StringBuffer()
        .append("From")
        .append(from)
        .append("To")
        .append(to)
        .append("Subject")
        .append(subject)
        .toString();
}

What’s interesting is that you do see a significant difference:

$ java -jar target/benchmarks.jar Simple Concat Chain \
-wi 5 -tu ns -f 1 -bm avgt
...
Benchmark Mode Cnt Score Error Units
StringBenchmark.testEmailBufferChain avgt 20 38.950 +- 1.120 ns/op
StringBenchmark.testEmailBufferConcat avgt 20 103.151 +- 4.197 ns/op
StringBenchmark.testEmailBufferSimple avgt 20 89.685 +- 2.041 ns/op
StringBenchmark.testEmailBuilderChain avgt 20 38.113 +- 1.012 ns/op
StringBenchmark.testEmailBuilderConcat avgt 20 102.193 +- 2.829 ns/op
StringBenchmark.testEmailBuilderSimple avgt 20 89.117 +- 2.658 ns/op

In this case, chaining the calls together has more than halved the time taken by the method call after JIT. One possible reason this may occur is that the length of the method’s bytecode has been significantly reduced:

$ javap -c StringBenchmark.class | egrep "public|areturn"
public java.lang.String testEmailBuilder();
60: areturn
public java.lang.String testEmailBuffer();
60: areturn
public java.lang.String testEmailBuilderConcat();
84: areturn
public java.lang.String testEmailBufferConcat();
84: areturn
public java.lang.String testEmailBuilderChain();
46: areturn
public java.lang.String testEmailBufferChain();
46: areturn

Simply chaining the .append() methods together has resulted in a smaller method, and thus a faster call site when compiled to native code. The other advantage (though not demonstrated here) is that the size of the bytecode affects the caller’s ability to in-line the method; smaller than 35 bytes (-XX:MaxInlineSize) means the method can be trivially in-lined, and if it’s smaller than 325 bytes (-XX:FreqInlineSize) then it can be in-lined if it’s called enough times.
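Chaining works because append() returns the builder it was called on. In the statement form, each call has to load the local variable again and then discard the returned reference, whereas the chained form keeps using the reference that is already on the stack. The sketch below (with made-up class and method names) shows that the two forms are equivalent in result, and that append() really does return the same instance.

```java
class ChainedAppend {
    // One statement per append, going through a local variable each time.
    static String statements(String from, String to) {
        StringBuilder builder = new StringBuilder();
        builder.append("From");
        builder.append(from);
        builder.append("To");
        builder.append(to);
        return builder.toString();
    }

    // A single expression; each append is called on the value returned by the previous one.
    static String chained(String from, String to) {
        return new StringBuilder()
            .append("From")
            .append(from)
            .append("To")
            .append(to)
            .toString();
    }

    // append() returns the receiver itself, not a copy.
    static boolean appendReturnsSameInstance() {
        StringBuilder builder = new StringBuilder();
        return builder.append("x") == builder;
    }

    public static void main(String[] args) {
        System.out.println(statements("Alex", "Readers"));
        System.out.println(chained("Alex", "Readers"));
        System.out.println(appendReturnsSameInstance());
    }
}
```

Running javap -c on a class like this is an easy way to compare the bytecode length of the two methods for yourself.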

Finally, what about ordinary String concatenation? Well, as long as you don’t mix and match it, then you’re fine – it works out as being identical to the testEmailBuilderChain methods.

StringBenchmark.java
@Benchmark
public String testEmailLiteral() {
    return "From" + from + "To" + to + "Subject" + subject;
}

Running it shows:

$ java -jar target/benchmarks.jar EmailLiteral \
-wi 5 -tu ns -f 1 -bm avgt
...
Benchmark Mode Cnt Score Error Units
StringBenchmark.testEmailLiteral avgt 20 38.033 +- 0.588 ns/op

And for comparative purposes, running the lot with @CompilerControl(Mode.EXCLUDE) (simulating an infrequently used method) gives:

$ java -jar target/benchmarks.jar Email \
-wi 5 -tu ns -f 1 -bm avgt
...
Benchmark Mode Cnt Score Error Units
StringBenchmark.testEmailBufferChain avgt 20 416.745 +- 9.087 ns/op
StringBenchmark.testEmailBufferConcat avgt 20 764.726 +- 9.535 ns/op
StringBenchmark.testEmailBufferSimple avgt 20 462.361 +- 15.091 ns/op
StringBenchmark.testEmailBuilderChain avgt 20 384.936 +- 9.173 ns/op
StringBenchmark.testEmailBuilderConcat avgt 20 752.375 +- 19.544 ns/op
StringBenchmark.testEmailBuilderSimple avgt 20 414.372 +- 6.940 ns/op
StringBenchmark.testEmailLiteral avgt 20 417.772 +- 9.515 ns/op

What a lot of rubbish

The other aspect that affects the performance is how much garbage is created during the program’s execution. Allocating new data in Java is very, very fast these days, regardless of whether it’s interpreted or JIT compiled code. This is especially true of the new G1 collector (-XX:+UseG1GC), which is available in Java 8 and will become the default in Java 9. (Hopefully it will also become a part of the standard Eclipse packages in the future.) That being said, there are certainly cycles that get wasted, both by the CPU and by the GC, when using concatenation.

The StringBuffer and StringBuilder are implemented like an ArrayList (except dealing with an array of characters instead of an array of Object instances). When you add new content, if there’s capacity, then the content is added at the end; if not, a new array is created with double-plus-two size, the contents of the backing store are copied to the new array, and then the old array is thrown away. As a result an append costs between O(1) and O(n), depending on whether the current capacity is exceeded.

By default both classes start with a capacity of 16 characters (and thus the implicit String concatenation also uses that number); but there are overloaded constructors that take an explicit initial capacity.
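The growth can be watched directly through the capacity() method. The following sketch (class and method names are my own) appends the 45 characters of the e-mail example one at a time and records every distinct capacity the builder passes through, once starting from the default of 16 and once pre-sized to 48. The exact growth steps are an implementation detail of the JDK, so only the starting capacity is guaranteed by the documentation.

```java
import java.util.ArrayList;
import java.util.List;

class CapacityGrowth {
    // Appends text one character at a time and records each distinct capacity seen.
    static List<Integer> capacities(int initialCapacity, String text) {
        StringBuilder builder = new StringBuilder(initialCapacity);
        List<Integer> seen = new ArrayList<>();
        seen.add(builder.capacity());
        for (int i = 0; i < text.length(); i++) {
            builder.append(text.charAt(i));
            int capacity = builder.capacity();
            if (capacity != seen.get(seen.size() - 1)) {
                seen.add(capacity); // the backing array was replaced by a larger one
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // The concatenated e-mail text from the benchmark: 45 characters
        String email = "FromAlexToReadersSubjectBenchmarking with JMH";
        System.out.println("default: " + capacities(16, email));
        System.out.println("pre-sized: " + capacities(48, email));
    }
}
```

With a large enough initial capacity the list contains a single entry, which means no intermediate arrays were allocated and thrown away.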

JMH also comes with a garbage profiler that can provide (in my experience, fairly accurate) estimates of how much garbage is collected per operation. It does this by hooking into some of the serviceability APIs in the OpenJDK runtime (so it may not work on other JVMs) and then provides a normalised estimate for how much garbage is attributable per operation. Since garbage is a JVM wide construct, any other threads executing in the background will cause the numbers to be inaccurate.

By modifying the creation of the StringBuffer with a JMH parameter, it’s possible to provide different values at run-time for experimentation:

StringBenchmark.java
public class StringBenchmark {
    @Param({"16"})
    private int size;
    ...
    public void testEmail... {
        StringBuilder builder = new StringBuilder(size);
    }
}

It’s possible to specify multiple parameters; JMH will then iterate over each and give the results separately. Using @Param({"16","48"}) would run first with 16 and then 48 afterwards.

$ java -jar target/benchmarks.jar EmailBu \
-wi 5 -tu ns -f 1 -bm avgt -prof gc
...
Benchmark (size) Mode Cnt Score Error Units
StringBenchmark.testEmailBufferChain 16 avgt 20 37.593 +- 0.595 ns/op
StringBenchmark.testEmailBufferChain: gc.alloc.rate.norm 16 avgt 20 136.000 +- 0.001 B/op
StringBenchmark.testEmailBufferConcat 16 avgt 20 155.290 +- 2.206 ns/op
StringBenchmark.testEmailBufferConcat: gc.alloc.rate.norm 16 avgt 20 576.000 +- 0.001 B/op
StringBenchmark.testEmailBufferSimple 16 avgt 20 136.341 +- 3.960 ns/op
StringBenchmark.testEmailBufferSimple: gc.alloc.rate.norm 16 avgt 20 432.000 +- 0.001 B/op
StringBenchmark.testEmailBuilderChain 16 avgt 20 37.630 +- 0.847 ns/op
StringBenchmark.testEmailBuilderChain: gc.alloc.rate.norm 16 avgt 20 136.000 +- 0.001 B/op
StringBenchmark.testEmailBuilderConcat 16 avgt 20 153.879 +- 2.699 ns/op
StringBenchmark.testEmailBuilderConcat: gc.alloc.rate.norm 16 avgt 20 576.000 +- 0.001 B/op
StringBenchmark.testEmailBuilderSimple 16 avgt 20 136.587 +- 3.146 ns/op
StringBenchmark.testEmailBuilderSimple: gc.alloc.rate.norm 16 avgt 20 432.000 +- 0.001 B/op

Running this shows that the normalised allocation rate for the various methods (gc.alloc.rate.norm) varies between 136 bytes and 576 for both classes. This shouldn’t be a surprise; the implementation of the storage structure is the same between both classes. It’s more noteworthy to observe that there is a variation between using the chained implementation and the simple allocation (136 vs 432).

The 136 bytes is the smallest value we can expect to see; the resulting String in our test method works out at 45 characters, or 90 bytes. Considering a String instance takes 24 bytes and a character array has a 16 byte header, 90 + 24 + 16 = 130. However, objects are aligned on an 8 byte boundary, so the 106 byte character array is rounded up to 112 bytes, giving 112 + 24 = 136. In other words, the code for the *Chain methods has been JIT optimised to produce a single String with the exact data in place.

The *Simple methods have additional data generated by the increasing size of the internal character backing array. 136 of the bytes are the returned String value, so that can be taken out of the equation, leaving 296 bytes to account for. This turns out to be exactly the character arrays; a StringBuilder starts off with a capacity of 16 chars, then grows to 34 chars and then 70 chars, following a 2n+2 growth. Since each char[] has an overhead of 16 bytes (12 for the header, 4 for the length) and chars are stored as 16 bit entities, the arrays take 48, 88 and 160 bytes once aligned, which adds up to 296 bytes. That leaves nothing over for the 24 byte StringBuilder object itself, which suggests that the JIT has avoided allocating it. The growth is the same for both of the *Simple methods.

The larger values in the *Concat methods show the additional garbage caused by the temporary internal StringBuilder instances.

To test a different starting size of the buffer, passing the -p size=48 JMH argument will allow us to test the effect of initialising the buffers with 48 characters:

$ java -jar target/benchmarks.jar EmailBu \
-wi 5 -tu ns -f 1 -bm avgt -prof gc -p size=48
...
Benchmark (size) Mode Cnt Score Error Units
StringBenchmark.testEmailBufferChain 48 avgt 20 38.961 +- 1.732 ns/op
StringBenchmark.testEmailBufferChain: gc.alloc.rate.norm 48 avgt 20 136.000 +- 0.001 B/op
StringBenchmark.testEmailBufferConcat 48 avgt 20 106.726 +- 4.118 ns/op
StringBenchmark.testEmailBufferConcat: gc.alloc.rate.norm 48 avgt 20 392.000 +- 0.001 B/op
StringBenchmark.testEmailBufferSimple 48 avgt 20 93.455 +- 2.702 ns/op
StringBenchmark.testEmailBufferSimple: gc.alloc.rate.norm 48 avgt 20 248.000 +- 0.001 B/op
StringBenchmark.testEmailBuilderChain 48 avgt 20 39.056 +- 1.723 ns/op
StringBenchmark.testEmailBuilderChain: gc.alloc.rate.norm 48 avgt 20 136.000 +- 0.001 B/op
StringBenchmark.testEmailBuilderConcat 48 avgt 20 103.264 +- 2.404 ns/op
StringBenchmark.testEmailBuilderConcat: gc.alloc.rate.norm 48 avgt 20 392.000 +- 0.001 B/op
StringBenchmark.testEmailBuilderSimple 48 avgt 20 88.175 +- 2.442 ns/op
StringBenchmark.testEmailBuilderSimple: gc.alloc.rate.norm 48 avgt 20 248.000 +- 0.001 B/op

By tweaking the initial capacity of the StringBuffer/StringBuilder instances to 48 characters, we can reduce the amount of garbage generated as part of the concatenation process. The garbage from Java’s implicit String concatenation is outside our control, and is a result of the underlying character array resizing itself.

Here, the *Simple methods have dropped from 432 to 248 bytes, which represents the 136 byte String result and the 112 byte backing array (corresponding to the 48 character array with its 16 byte header). Presumably in this case the JIT has managed to avoid the creation of the StringBuilder instance in the *Simple methods, but the array copy has leaked through. Other than these two allocations, no additional garbage is created.
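The arithmetic in this section can be captured in a small model. It is only a sketch under the assumptions used above: a 16 byte array header, 2 bytes per char, 8 byte alignment, a 24 byte String object, 2n+2 growth, and no allocation for the builder object itself. The class and method names are invented for the example, and none of the constants are queried from the JVM.

```java
class AllocationModel {
    static final int ARRAY_HEADER_BYTES = 16; // assumed char[] header size
    static final int STRING_BYTES = 24;       // assumed size of a String object
    static final int BYTES_PER_CHAR = 2;      // chars stored as 16 bit values

    // Rounds a size up to the next multiple of 8 bytes.
    static int alignTo8(int bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Aligned size of a char[] holding the given number of chars.
    static int charArrayBytes(int chars) {
        return alignTo8(ARRAY_HEADER_BYTES + BYTES_PER_CHAR * chars);
    }

    // The returned String: the String object plus an exactly sized char[].
    static int resultBytes(int length) {
        return STRING_BYTES + charArrayBytes(length);
    }

    // Every backing array the builder goes through while growing by 2n+2
    // until it can hold `length` chars.
    static int backingArrayBytes(int initialCapacity, int length) {
        int capacity = initialCapacity;
        int total = charArrayBytes(capacity);
        while (capacity < length) {
            capacity = capacity * 2 + 2;
            total += charArrayBytes(capacity);
        }
        return total;
    }

    // Modelled allocation for one call of a *Simple style method.
    static int simpleBytes(int initialCapacity, int length) {
        return backingArrayBytes(initialCapacity, length) + resultBytes(length);
    }

    public static void main(String[] args) {
        System.out.println("result only:     " + resultBytes(45));
        System.out.println("default builder: " + simpleBytes(16, 45));
        System.out.println("pre-sized to 48: " + simpleBytes(48, 45));
    }
}
```

For the 45 character result the model gives 136 bytes for the String alone, 432 bytes with the default capacity and 248 bytes when pre-sized to 48, which matches the gc.alloc.rate.norm figures reported for the *Chain and *Simple methods.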

Conclusion

Running benchmarks is a good way of finding out what the cost of a particular operation is, and JMH makes it easy to generate such benchmarks. Ensuring that the benchmarks are correct is a little harder, as is accounting for the effect of other processes. Of course, different machines will give different results to these, and you’re encouraged to replicate this on your own setup.

Although the fully JIT compiled methods for StringBuffer and StringBuilder perform very similarly, there is an underlying trend for the StringBuilder to be at least as fast as its older cousin StringBuffer. In any case, implicit String concatenation (with +) creates a StringBuilder under the covers, so it’s likely that the StringBuilder methods will be compiled as hot code before the StringBuffer ones.

The most efficient way of concatenating strings is to have a single expression which uses either implicit String concatenation (+ + + +) or a series of chained calls (e.g. .append().append().append()) without any intermediate reference to a local variable. If you’ve got a lot of constants then using + will also have the advantage of using constant folding of the String literals ahead of time.

Mixing + and .append() is a bad idea though, because there will be extra pressure on the memory as the String instances are created and then immediately thrown away.

Finally, although using + + + + is easy, it doesn’t let you pre-size the StringBuilder array, which starts off with 16 characters by default. If the StringBuilder is used to create large Strings then avoiding multiple resizes is a relatively simple optimisation technique as far as reducing garbage is concerned. In addition, the cost of each array copy grows as the size of the data set increases.

Update 2020

I have uploaded this code to https://github.com/alblue/com.bandlem.jmh.microopts/ along with an updated version of the results, also committed to the repository.

One of the significant changes in the results was that the JVM has now learnt how to do indification of string concatenation, which has improved both the speed and also the garbage collection profile of the operations. However, the overall relative behaviour of the differences still holds.