
AlBlue’s Blog

Macs, Modularity and More

StringBuffer and StringBuilder performance with JMH

2016, eclipse, java, jmh, optimisation, performance

Last week, Doug Schaefer wished on Twitter that other Eclipse projects were getting the same kind of contribution love as Platform UI. Lars Vogel attributed that to the effort in cleaning up the codebase and the focus on new contributions and contributors.

I thought I’d spend some time helping out CDT in assisting with this effort, and over the past week or so have been sending a few patches that way. Fortunately Sergey Prigogin has been an excellent reviewer, turning around my patches in a matter of hours in some cases, and that in turn has meant that I’ve been able to make further and faster progress than on some of the other projects I’ve tried contributing improvements to.

Most recently I’ve been looking into optimising some of the StringBuffer code and thought I’d go into a little bit of detail about the performance aspects of these changes.

The TL;DR of this post is:

  • StringBuilder is better than StringBuffer
  • StringBuilder.append(a).append(b) is better than StringBuilder.append(a+b)
  • StringBuilder.append(a).append(b) is better than StringBuilder.append(a); StringBuilder.append(b);
  • StringBuilder.append() and + are only equivalent provided that they are not nested and you don’t need to pre-size the builder
  • Pre-sizing the StringBuilder is like pre-sizing an ArrayList; if you know the approximate size you can reduce the garbage by specifying a capacity up-front
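
As a quick illustration of the patterns being compared (a plain-JDK sketch, not the benchmark code itself), all three styles below produce the same String; the differences explored in this post are in speed and garbage, not in output:

```java
public class AppendStyles {
    static String from = "Alex";

    // Style 1: implicit concatenation inside append() -- this creates a hidden,
    // extra StringBuilder for the "From" + from expression
    static String concat() {
        StringBuilder b = new StringBuilder();
        b.append("From" + from);
        return b.toString();
    }

    // Style 2: separate append() calls on a local variable
    static String simple() {
        StringBuilder b = new StringBuilder();
        b.append("From");
        b.append(from);
        return b.toString();
    }

    // Style 3: chained append() calls -- the smallest bytecode of the three
    static String chained() {
        return new StringBuilder().append("From").append(from).toString();
    }

    public static void main(String[] args) {
        // All three produce "FromAlex"
        System.out.println(concat() + " " + simple() + " " + chained());
    }
}
```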

Most of this may be common knowledge but I hope that I can back this up with data using JMH.

Introduction to JMH

The Java Microbenchmark Harness, or JMH, is the tool to use for performance testing microbenchmarks. In the same way that JUnit is the de facto standard for testing, JMH is the de facto standard for performance measurement. There’s a great thread that goes into the details behind some of JMH’s evolution and the choices that were made; the fact that it has since edged out other benchmarking tools such as Caliper is a good indicator of its longevity.

JMH projects can be bootstrapped from mvn; building the project runs the annotation processors and generates a benchmarks.jar file, which contains the code under test as well as a copy of the JMH code in an uber JAR. It also helpfully sets up a command line interface that you can use to run your benchmarks, and is the simplest way to generate a project.

You can create a stub JMH project using the steps on the JMH homepage:

Generating a JMH project with mvn
$ mvn archetype:generate \
  -DinteractiveMode=false \
  -DarchetypeGroupId=org.openjdk.jmh \
  -DarchetypeArtifactId=jmh-java-benchmark-archetype \
  -DgroupId=org.sample \
  -DartifactId=test \
  -Dversion=1.0

From the command line, the sample project can be run by executing:

Compiling and Running the JMH benchmark
$ mvn clean package
$ java -jar target/benchmarks.jar

There’s a lot of flags that can be passed on the command line; passing -h will show the full list of flags that can be passed.

Using JMH in Eclipse

If you’re trying to run JMH in Eclipse, you will need to ensure that annotation processing is enabled. That’s because JMH not only uses annotations to mark the benchmarks, but also uses an annotation processing tool to transform the benchmarked code into executable units. If you don’t have annotation processing enabled and try to run it, you’ll see a cryptic message like Unable to read /META-INF/BenchmarkList.

If you’ve created a Maven project (and presumably, therefore, have m2e installed) the easiest way is to install JBoss' m2e-apt connector, which allows you to configure the project for JDT’s support for APT. This can be installed from Eclipse → Preferences → Discovery and choosing the m2e-apt connector. After a restart, the JDT support can be enabled automatically by going to Window → Preferences → Maven → Annotation Processing and then choosing the “Automatically configure JDT APT” option.

If you’re not using Maven then you can add the jmh-generator-annprocess JAR (along with its dependencies) to the project’s Java Compiler → Annotation Processing → Factory Path, and ensure that the annotation processing is switched on.

Tests can then be run by creating a launch configuration to run the main class org.openjdk.jmh.Main or by using the JMH APIs.

StringBuilder vs StringBuffer benchmark

So having got the basis for benchmarking set up, it’s time to look at the performance of the StringBuilder vs the StringBuffer. It’s a good idea to see what the performance is like of the empty buffers before we start adding content to it:

StringBenchmark.java
public class StringBenchmark {
  @Benchmark
  public String testEmptyBuffer() {
    StringBuffer buffer = new StringBuffer();
    return buffer.toString();
  }

  @Benchmark
  public String testEmptyBuilder() {
    StringBuilder builder = new StringBuilder();
    return builder.toString();
  }

  @Benchmark
  public String testEmptyLiteral() {
    return "";
  }
}

Two things are worth calling out: the first is that the resulting expression you’re using always has to be returned to the caller, otherwise the JIT will optimise the code away. The second is that it’s worth testing the empty case first of all so that it sets a baseline for measurement.

We can run it from the command line by doing:

$ mvn clean package
$ java -jar target/benchmarks.jar Empty \
   -wi 5 -tu ns -f 1 -bm avgt
...
Benchmark                         Mode  Cnt  Score   Error  Units
StringBenchmark.testEmptyBuffer   avgt   20  8.306 ± 0.497  ns/op
StringBenchmark.testEmptyBuilder  avgt   20  8.253 ± 0.416  ns/op
StringBenchmark.testEmptyLiteral  avgt   20  3.510 ± 0.139  ns/op

The flags used here are -wi (warmup iterations), -tu (time unit; nanoseconds), -f (number of forked JVMs) and -bm (benchmark mode; in this case, average time).

Somewhat unsurprisingly the values are relatively similar, with the return literal being the fastest.

What if we’re concatenating two strings? We can write a method to test that as well:

StringBenchmark.java
@Benchmark
public String testHelloWorldBuilder() {
  StringBuilder builder = new StringBuilder();
  builder.append("Hello");
  builder.append("World");
  return builder.toString();
}

@Benchmark
public String testHelloWorldBuffer() {
  StringBuffer buffer = new StringBuffer();
  buffer.append("Hello");
  buffer.append("World");
  return buffer.toString();
}

When run, it looks like:

$ mvn clean package
$ java -jar target/benchmarks.jar Hello \
   -wi 5 -tu ns -f 1 -bm avgt
...
Benchmark                              Mode  Cnt   Score   Error  Units
StringBenchmark.testHelloWorldBuffer   avgt   20  25.747 ± 1.188  ns/op
StringBenchmark.testHelloWorldBuilder  avgt   20  25.411 ± 1.015  ns/op

Not much difference there, although the Buffer is marginally slower than the Builder. That shouldn’t be too surprising; they are both subclasses of AbstractStringBuilder anyway, which has all the logic.
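
This shared ancestry is easy to verify from plain Java with a couple of lines of reflection:

```java
public class SharedParent {
    public static void main(String[] args) {
        // Both classes extend the package-private java.lang.AbstractStringBuilder,
        // which holds the backing character array and all of the append logic.
        System.out.println(StringBuilder.class.getSuperclass().getName());
        System.out.println(StringBuffer.class.getSuperclass().getName());
        // Both lines print: java.lang.AbstractStringBuilder
    }
}
```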

Job done?

Are we all done yet? Well, no, because there are other things at play.

Firstly, JMH is a benchmarking tool that measures the best possible performance of code under load. By default, HotSpot uses a tiered compilation model: it starts off interpreted, and then once a method has been executed a number of times it gets compiled. In fact, different levels of compilation kick in after different numbers of calls. You can see these thresholds in the various *Threshold* flags shown by -XX:+PrintFlagsFinal on an OpenJDK installation.

When a method has been called thousands of times, it will be compiled using the Tier 3 (client) or Tier 4 (server) compiler. This generally involves optimisations such as in-lining methods, dead code elimination and the like. This gives the best possible code performance for the application.

But what if the method is called infrequently, or the cost is dominated by pressure on the garbage collector instead? Infrequently called code won’t be JIT compiled, and so will take longer. We can see the effect of running in interpreted mode by running the generated benchmark code with -jvmArgs -Xint to force the forked JVM used to run the benchmarks to only use the interpreter:

Running benchmarks in interpreted mode
$ mvn clean package
$ java -jar target/benchmarks.jar Empty Hello \
   -wi 5 -tu ns -f 1 -bm avgt -jvmArgs -Xint
...
Benchmark                              Mode  Cnt     Score    Error  Units
StringBenchmark.testEmptyBuffer        avgt   20  1102.609 ± 66.596  ns/op
StringBenchmark.testEmptyBuilder       avgt   20   769.682 ± 27.962  ns/op
StringBenchmark.testEmptyLiteral       avgt   20   184.061 ± 13.587  ns/op
StringBenchmark.testHelloWorldBuffer   avgt   20  2299.749 ± 70.087  ns/op
StringBenchmark.testHelloWorldBuilder  avgt   20  2381.348 ± 38.726  ns/op

A better option is to use the JMH specific annotation @CompilerControl(Mode.EXCLUDE) which prevents benchmarking methods from being JIT compiled, while allowing the other Java classes to be JIT compiled as usual. This is akin to having other classes call the StringBuffer (so that it is sufficiently well exercised) while emulating code that isn’t called all that frequently. It can be added at the class level or at the method level.

$ grep -B2 class StringBenchmark.java
@State(Scope.Benchmark)
@CompilerControl(Mode.EXCLUDE)
public class StringBenchmark {
$ mvn clean package
$ java -jar target/benchmarks.jar Empty Hello \
   -wi 5 -tu ns -f 1 -bm avgt
...
Benchmark                              Mode  Cnt    Score   Error  Units
StringBenchmark.testEmptyBuffer        avgt   20  144.745 ± 4.561  ns/op
StringBenchmark.testEmptyBuilder       avgt   20  122.477 ± 3.273  ns/op
StringBenchmark.testEmptyLiteral       avgt   20   91.139 ± 1.685  ns/op
StringBenchmark.testHelloWorldBuffer   avgt   20  236.223 ± 7.679  ns/op
StringBenchmark.testHelloWorldBuilder  avgt   20  222.462 ± 5.733  ns/op

Either way, calling the code before JIT compilation has kicked in magnifies the difference between the two types of data structure by around 10%. So for methods that are called fewer than 1000 times – such as during start-up or when invoked from a user interface – the difference will be present.

Different calling patterns

What about different calling patterns? One example I came across was using an implicit String concatenation inside a StringBuilder or StringBuffer. This might be the case when generating a buffer to represent an e-mail, for example.

To test this, and to prevent Strings being concatenated by the javac compiler, we need to use non-final instance variables. However, to do that with the benchmark requires that the class be annotated with @State(Scope.Benchmark). (As with public static void main(String args[]) it’s best to just learn that this is necessary when you’re getting started, and then understand what it means later.)

StringBenchmark.java
@State(Scope.Benchmark)
public class StringBenchmark {
  private String from = "Alex";
  private String to = "Readers";
  private String subject = "Benchmarking with JMH";
  ...
  @Benchmark
  public String testEmailBuilderSimple() {
    StringBuilder builder = new StringBuilder();
    builder.append("From");
    builder.append(from);
    builder.append("To");
    builder.append(to);
    builder.append("Subject");
    builder.append(subject);
    return builder.toString();
  }

  @Benchmark
  public String testEmailBufferSimple() {
    StringBuffer buffer = new StringBuffer();
    buffer.append("From");
    buffer.append(from);
    buffer.append("To");
    buffer.append(to);
    buffer.append("Subject");
    buffer.append(subject);
    return buffer.toString();
  }
}

You can selectively run the benchmarks by putting one or more regular expressions on the command line:

$ mvn clean package
$ java -jar target/benchmarks.jar Simple \
   -wi 5 -tu ns -f 1 -bm avgt
...
Benchmark                               Mode  Cnt   Score   Error  Units
StringBenchmark.testEmailBufferSimple   avgt   20  88.149 ± 1.014  ns/op
StringBenchmark.testEmailBuilderSimple  avgt   20  88.277 ± 1.201  ns/op

These obviously take a lot longer to run. But what about other forms of the code? What if a developer has used + to concatenate the fields together in the append calls?

StringBenchmark.java
@Benchmark
public String testEmailBuilderConcat() {
  StringBuilder builder = new StringBuilder();
  builder.append("From" + from);
  builder.append("To" + to);
  builder.append("Subject" + subject);
  return builder.toString();
}

@Benchmark
public String testEmailBufferConcat() {
  StringBuffer buffer = new StringBuffer();
  buffer.append("From" + from);
  buffer.append("To" + to);
  buffer.append("Subject" + subject);
  return buffer.toString();
}

Running this again shows why this is a bad idea:

$ mvn clean package
$ java -jar target/benchmarks.jar Simple Concat \
   -wi 5 -tu ns -f 1 -bm avgt
...
Benchmark                               Mode  Cnt    Score   Error  Units
StringBenchmark.testEmailBufferConcat   avgt   20  105.424 ± 3.704  ns/op
StringBenchmark.testEmailBufferSimple   avgt   20   91.427 ± 2.971  ns/op
StringBenchmark.testEmailBuilderConcat  avgt   20  100.295 ± 1.985  ns/op
StringBenchmark.testEmailBuilderSimple  avgt   20   90.884 ± 1.663  ns/op

Even though these calls do the same thing, the cost of having an embedded implicit String concatenation is enough to add a 10% penalty on the time taken for the methods to return.

This shouldn’t be too surprising; the cost of doing the in-line concatenation means that it’s generating a new StringBuilder, appending the two String expressions, converting it to a new String with toString() and finally inserting that resulting String into the outer StringBuilder/StringBuffer.
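
As a sketch of that desugaring (hand-written to mirror what javac emits, not actual compiler output), the nested concatenation expands roughly as follows:

```java
public class ConcatDesugar {
    static String to = "Readers";

    // What the source says:
    static String source() {
        StringBuffer buffer = new StringBuffer();
        buffer.append("To" + to);
        return buffer.toString();
    }

    // Roughly what javac emits for the line above: an inner StringBuilder is
    // created and filled, converted to a temporary String, and that temporary
    // String is then appended to the outer StringBuffer -- all extra garbage.
    static String desugared() {
        StringBuffer buffer = new StringBuffer();
        String tmp = new StringBuilder().append("To").append(to).toString();
        buffer.append(tmp);
        return buffer.toString();
    }

    public static void main(String[] args) {
        // Both forms produce "ToReaders"
        System.out.println(source().equals(desugared()));
    }
}
```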

This should probably be a warning in the future.

Chaining methods

Finally, what about chaining the methods instead of referring to a local variable? That can’t make any difference; after all, this is equivalent to the one before, right?

StringBenchmark.java
@Benchmark
public String testEmailBuilderChain() {
  return new StringBuilder()
   .append("From")
   .append(from)
   .append("To")
   .append(to)
   .append("Subject")
   .append(subject)
   .toString();
}

@Benchmark
public String testEmailBufferChain() {
  return new StringBuffer()
   .append("From")
   .append(from)
   .append("To")
   .append(to)
   .append("Subject")
   .append(subject)
   .toString();
}

What’s interesting is that you do see a significant difference:

$ java -jar target/benchmarks.jar Simple Concat Chain \
   -wi 5 -tu ns -f 1 -bm avgt
...
Benchmark                               Mode  Cnt    Score   Error  Units
StringBenchmark.testEmailBufferChain    avgt   20   38.950 ± 1.120  ns/op
StringBenchmark.testEmailBufferConcat   avgt   20  103.151 ± 4.197  ns/op
StringBenchmark.testEmailBufferSimple   avgt   20   89.685 ± 2.041  ns/op
StringBenchmark.testEmailBuilderChain   avgt   20   38.113 ± 1.012  ns/op
StringBenchmark.testEmailBuilderConcat  avgt   20  102.193 ± 2.829  ns/op
StringBenchmark.testEmailBuilderSimple  avgt   20   89.117 ± 2.658  ns/op

In this case, the chaining together of arguments has resulted in a 50% speed up of the method call after JIT. One possible reason this may occur is that the length of the method’s bytecode has been significantly reduced:

$ javap -c StringBenchmark.class | egrep "public|areturn"
  public java.lang.String testEmailBuilder();
      60: areturn
  public java.lang.String testEmailBuffer();
      60: areturn
  public java.lang.String testEmailBuilderConcat();
      84: areturn
  public java.lang.String testEmailBufferConcat();
      84: areturn
  public java.lang.String testEmailBuilderChain();
      46: areturn
  public java.lang.String testEmailBufferChain();
      46: areturn

Simply chaining the .append() methods together has resulted in a smaller method, and thus a faster call site when compiled to native code. The other advantage (though not demonstrated here) is that the size of the bytecode affects the caller’s ability to in-line the method; smaller than 35 bytes (-XX:MaxInlineSize) means the method can be trivially in-lined, and if it’s smaller than 325 bytes then it can be in-lined if it’s called enough times (-XX:FreqInlineSize).

Finally, what about ordinary String concatenation? Well, as long as you don’t mix and match it, then you’re fine – it works out as being identical to the testEmailBuilderChain methods.

StringBenchmark.java
@Benchmark
public String testEmailLiteralConcat() {
  return "From" + from + "To" + to + "Subject" + subject;
}

Running it shows:

$ java -jar target/benchmarks.jar EmailLiteral \
   -wi 5 -tu ns -f 1 -bm avgt
...
Benchmark                         Mode  Cnt   Score   Error  Units
StringBenchmark.testEmailLiteral  avgt   20  38.033 ± 0.588  ns/op

And for comparative purposes, running the lot with @CompilerControl(Mode.EXCLUDE) (simulating an infrequently used method) gives:

$ java -jar target/benchmarks.jar Email \
   -wi 5 -tu ns -f 1 -bm avgt
...
Benchmark                               Mode  Cnt    Score    Error  Units
StringBenchmark.testEmailBufferChain    avgt   20  416.745 ±  9.087  ns/op
StringBenchmark.testEmailBufferConcat   avgt   20  764.726 ±  9.535  ns/op
StringBenchmark.testEmailBufferSimple   avgt   20  462.361 ± 15.091  ns/op
StringBenchmark.testEmailBuilderChain   avgt   20  384.936 ±  9.173  ns/op
StringBenchmark.testEmailBuilderConcat  avgt   20  752.375 ± 19.544  ns/op
StringBenchmark.testEmailBuilderSimple  avgt   20  414.372 ±  6.940  ns/op
StringBenchmark.testEmailLiteral        avgt   20  417.772 ±  9.515  ns/op

What a lot of rubbish

The other aspect that affects performance is how much garbage is created during the program’s execution. Allocating new data in Java is very, very fast these days, regardless of whether it’s interpreted or JIT compiled code. This is especially true with the new G1 collector (-XX:+UseG1GC), which is available in Java 8 and will become the default in Java 9. (Hopefully it will also become a part of the standard Eclipse packages in the future.) That said, there are certainly cycles that get wasted, both by the CPU and by the GC, when using concatenation.

StringBuffer and StringBuilder are implemented like an ArrayList (except dealing with an array of characters instead of an array of Object instances). When you add new content, if there’s capacity then the content is added at the end; if not, a new array is created with double-plus-two the size, the existing content is copied into the new array, and the old array is thrown away. As a result an individual append can take between O(1) and O(n), depending on whether the current capacity is exceeded.

By default both classes start with a capacity of 16 characters (and thus the implicit String concatenation also uses that number); an explicit starting capacity can be given to the constructor instead.
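
The default capacity and the double-plus-two growth can be observed directly via the public capacity() method (a plain-JDK sketch; the exact growth policy is an implementation detail of OpenJDK):

```java
public class CapacityGrowth {
    public static void main(String[] args) {
        StringBuilder b = new StringBuilder();          // default capacity
        System.out.println(b.capacity());               // 16
        b.append("12345678901234567");                  // 17 chars exceeds 16
        System.out.println(b.capacity());               // (16 * 2) + 2 = 34
        StringBuilder sized = new StringBuilder(48);    // pre-sized up-front
        sized.append("12345678901234567");              // no growth needed
        System.out.println(sized.capacity());           // still 48
    }
}
```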

JMH also comes with a garbage profiler that can provide (in my experience, fairly accurate) estimates of how much garbage is collected per operation. It does this by hooking into some of the serviceability APIs in the OpenJDK runtime (so this may not work on other JVMs) and then provides a normalised estimate of how much garbage is attributable to each operation. Since garbage is a JVM wide measure, any other threads executing in the background will cause the numbers to be inaccurate.

By modifying the creation of the StringBuffer with a JMH parameter, it’s possible to provide different values at run-time for experimentation:

StringBenchmark.java
public class StringBenchmark {
  @Param({"16"})
  private int size;
  ...
  public void testEmail... {
    StringBuilder builder = new StringBuilder(size);
  }
}

It’s possible to specify multiple parameters; JMH will then iterate over each and give the results separately. Using @Param({"16","48"}) would run first with 16 and then 48 afterwards.

$ java -jar target/benchmarks.jar EmailBu \
   -wi 5 -tu ns -f 1 -bm avgt -prof gc
...
Benchmark                                               (size)  Mode  Cnt     Score     Error   Units
StringBenchmark.testEmailBufferChain                        16  avgt   20    37.593 ±   0.595   ns/op
StringBenchmark.testEmailBufferChain:·gc.alloc.rate.norm    16  avgt   20   136.000 ±   0.001    B/op
StringBenchmark.testEmailBufferConcat                       16  avgt   20   155.290 ±   2.206   ns/op
StringBenchmark.testEmailBufferConcat:·gc.alloc.rate.norm   16  avgt   20   576.000 ±   0.001    B/op
StringBenchmark.testEmailBufferSimple                       16  avgt   20   136.341 ±   3.960   ns/op
StringBenchmark.testEmailBufferSimple:·gc.alloc.rate.norm   16  avgt   20   432.000 ±   0.001    B/op
StringBenchmark.testEmailBuilderChain                       16  avgt   20    37.630 ±   0.847   ns/op
StringBenchmark.testEmailBuilderChain:·gc.alloc.rate.norm   16  avgt   20   136.000 ±   0.001    B/op
StringBenchmark.testEmailBuilderConcat                      16  avgt   20   153.879 ±   2.699   ns/op
StringBenchmark.testEmailBuilderConcat:·gc.alloc.rate.norm  16  avgt   20   576.000 ±   0.001    B/op
StringBenchmark.testEmailBuilderSimple                      16  avgt   20   136.587 ±   3.146   ns/op
StringBenchmark.testEmailBuilderSimple:·gc.alloc.rate.norm  16  avgt   20   432.000 ±   0.001    B/op

Running this shows that the normalised allocation rate for the various methods (gc.alloc.rate.norm) varies between 136 bytes and 576 for both classes. This shouldn’t be a surprise; the implementation of the storage structure is the same between both classes. It’s more noteworthy to observe that there is a variation between using the chained implementation and the simple allocation (136 vs 432).

The 136 bytes is the smallest value we can expect to see; the resulting String in our test method works out at 45 characters, or 90 bytes. A String instance has a 24 byte header and a character array has a 16 byte header, giving 90 + 24 + 16 = 130. However, objects are aligned on an 8 byte boundary, so the 106 byte character array is rounded up to 112 bytes, for a total of 136. In other words, the code for the *Chain methods has been JIT optimised to produce a single String with the exact data in place.

The *Simple methods have additional garbage generated by the growth of the internal character backing array. 136 of the bytes are the returned String value, so that can be taken out of the equation, leaving 296 bytes to account for. This turns out to be the discarded character arrays; a StringBuilder starts off with a size of 16 chars, then grows to 34 chars and then 70 chars, following 2n+2 growth. Since each char[] has an overhead of 16 bytes (12 for the header, 4 for the length), chars are stored as 16 bit values, and arrays are aligned on 8 byte boundaries, this results in 48, 88 and 160 bytes respectively. Perhaps unsurprisingly, the discarded char[] arrays sum to exactly 296 bytes (the StringBuilder instance itself appears to have been eliminated by the JIT), so the growth of the two *Simple methods is equivalent here.
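
The arithmetic above can be checked with a few lines of plain Java, assuming (as in the calculations here) a 16 byte array header, 2 bytes per char and 8 byte object alignment:

```java
public class GarbageMath {
    // Size in bytes of a char[] of the given length: 16 byte header plus
    // 2 bytes per char, rounded up to an 8 byte boundary.
    static int charArrayBytes(int length) {
        int raw = 16 + 2 * length;
        return (raw + 7) / 8 * 8;
    }

    public static void main(String[] args) {
        // Growth sequence of the backing array: 16 chars, then 2n+2 each time.
        System.out.println(charArrayBytes(16)); // 48
        System.out.println(charArrayBytes(34)); // 88
        System.out.println(charArrayBytes(70)); // 160
        // The discarded arrays account for 48 + 88 + 160 = 296 bytes.
        // The final 45 character String: 24 byte String header plus its char[].
        System.out.println(24 + charArrayBytes(45)); // 136
    }
}
```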

The larger values in the *Concat methods show additional garbage growth caused due to the temporary internal StringBuilder elements.

To test a different starting size of the buffer, passing the -p size=48 JMH argument will allow us to test the effect of initialising the buffers with 48 characters:

$ java -jar target/benchmarks.jar EmailBu \
   -wi 5 -tu ns -f 1 -bm avgt -prof gc -p size=48
...
Benchmark                                               (size)  Mode  Cnt     Score     Error   Units
StringBenchmark.testEmailBufferChain                        48  avgt   20    38.961 ±   1.732   ns/op
StringBenchmark.testEmailBufferChain:·gc.alloc.rate.norm    48  avgt   20   136.000 ±   0.001    B/op
StringBenchmark.testEmailBufferConcat                       48  avgt   20   106.726 ±   4.118   ns/op
StringBenchmark.testEmailBufferConcat:·gc.alloc.rate.norm   48  avgt   20   392.000 ±   0.001    B/op
StringBenchmark.testEmailBufferSimple                       48  avgt   20    93.455 ±   2.702   ns/op
StringBenchmark.testEmailBufferSimple:·gc.alloc.rate.norm   48  avgt   20   248.000 ±   0.001    B/op
StringBenchmark.testEmailBuilderChain                       48  avgt   20    39.056 ±   1.723   ns/op
StringBenchmark.testEmailBuilderChain:·gc.alloc.rate.norm   48  avgt   20   136.000 ±   0.001    B/op
StringBenchmark.testEmailBuilderConcat                      48  avgt   20   103.264 ±   2.404   ns/op
StringBenchmark.testEmailBuilderConcat:·gc.alloc.rate.norm  48  avgt   20   392.000 ±   0.001    B/op
StringBenchmark.testEmailBuilderSimple                      48  avgt   20    88.175 ±   2.442   ns/op
StringBenchmark.testEmailBuilderSimple:·gc.alloc.rate.norm  48  avgt   20   248.000 ±   0.001    B/op

By initialising the StringBuffer/StringBuilder instances with a capacity of 48 characters, we can reduce the amount of garbage generated as part of the concatenation process. The implicit String concatenation inside the *Concat methods is outside our control, however, and still pays the cost of the underlying character array resizing itself.

Here, the *Simple methods have dropped from 432 to 248 bytes, which represents the 136 byte String result and a copy of the 112 byte array (corresponding to a 48 character array with its 16 byte header). Presumably in this case the JIT has managed to avoid the creation of the StringBuilder instance in the *Simple methods, but the array copy has leaked through. Other than these two values, no additional garbage is created.

Conclusion

Running benchmarks is a good way of finding out the cost of a particular operation, and JMH makes it easy to generate such benchmarks. Ensuring that the benchmarks are correct is a little harder, as is accounting for the effects of other processes. Of course, different machines will give different results from these, and you’re encouraged to replicate the findings on your own setup.

Although the fully JIT compiled methods for StringBuffer and StringBuilder perform very similarly, there is an underlying trend for the StringBuilder to be at least as fast as its older StringBuffer cousin. In any case, implicit String concatenation (with +) creates a StringBuilder under the covers, so StringBuilder is likely to hit the hot compilation threshold before StringBuffer does.

The most efficient way of concatenating strings is to have a single expression which either uses implicit String concatenation (a + b + c) or a series of chained calls (e.g. .append(a).append(b).append(c)) without any intermediate reference to a local variable. If you’ve got a lot of constants then using + also has the advantage that the String literals are constant folded ahead of time.
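
The constant folding point can be demonstrated directly: a concatenation of literals is computed by javac at compile time and interned, whereas a run-time concatenation produces a fresh String instance:

```java
public class ConstantFolding {
    public static void main(String[] args) {
        String folded = "Hello" + "World";   // folded by javac into the constant "HelloWorld"
        String runtime = "Hello";            // non-final local: not a constant expression
        String built = runtime + "World";    // built at run-time via a StringBuilder

        System.out.println(folded == "HelloWorld");     // true: same interned constant
        System.out.println(built == "HelloWorld");      // false: a fresh String instance
        System.out.println(built.equals("HelloWorld")); // true: same contents
    }
}
```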

Mixing + and .append() is a bad idea though, because there will be extra pressure on the memory as the String instances are created and then immediately thrown away.

Finally, although using + is easy, it doesn’t let you pre-size the StringBuilder array, which starts off at 16 characters by default. If the StringBuilder is used to create large Strings then pre-sizing it to avoid multiple resizes is a relatively simple optimisation as far as reducing garbage is concerned. In addition, the array copy operation grows more expensive as the size of the data increases.

The sample code for this blog post is available at https://gist.github.com/alblue/aa4453a5b1614ee1084570f32b8b5b95 if you’d like to replicate the findings.

Swift 2.2 Released

2016, book, swift

Today, Apple released Swift 2.2 for OSX and Linux, which is the first release since the project was open-sourced last year. It contains a number of contributions from non-Apple employees as well as a number of source-incompatible changes.

To celebrate the release, until the end of March, my Swift Essentials book has a 50% off discount for the eBook and a 30% off discount for the printed book. More information is available on the Swift Essentials website.

The changes for Swift 2.2 include:

  • The removal of the C-style for loop (SE-0007)
  • The removal of increment ++ and decrement -- operators (SE-0004)
  • The replacement of typealias with associatedtype as a keyword (SE-0011)
  • Using #selector(foo) instead of Selector("foo") (which is now compile checked) to fit in better with # designated operators in Swift (SE-0022)
  • The addition of the #if swift build-time version test (SE-0020)

Any code that has:

For loops - the old way
for(var i=0; i<10; i++) {
  ...
}

needs to be refactored to:

For loops - the new way
for i in 0..<10 {
  ...
}

Any references to:

Selectors - the old way
let s = Selector("doStuff")

needs to be refactored to:

Selectors - the new way
let s = #selector(doStuff)

Using the #if build configuration syntax allows for different paths to be taken by the compiler in the code:

Building platform-specific code
#if os(Linux)
import Glibc
#else
import Darwin
#endif

#if arch(i386)
let bits = 32
#elseif arch(x86_64)
let bits = 64
#endif

and can be used to selectively enable new features based on the release:

Building version-sensitive code
#if swift(>=3.0)
 print("Welcome to Swift 3")
#elseif swift(>=2.2)
 print("Welcome to Swift 2.2")
#else
 print("Please upgrade your version of Swift")
#endif

Fix-its built into Xcode 7.3 (which was released at the same time) will allow you to migrate code from one version to another.

Xcode 7.3 is available from Apple for OSX, and Swift 2.2 is available from the Swift website for OSX and Ubuntu.

Finding duplicate objects with Eclipse MAT

2016, eclipse

I’ve written before about optimising memory in Eclipse, previously looking at the preponderance of new Boolean() (because you can never have too many true or false values).

Recently I wondered what the state of other values would be like. There are two interesting types: Strings and Integers. Strings obviously take up a lot of space (so there’s more effect there) but what about Integers? Well, back when Java first got started there was only new Integer() if you wanted to obtain a primitive wrapper. However, Java 1.5 added Integer.valueOf(), which is defined (by JavaDoc) to cache values in the range -128…127. (This was because autoboxing was added at the same time as generics, and autoboxing uses Integer.valueOf() under the covers.)

There were other caches added for other types; the Byte type is fully cached, for example. Even Character instances are cached; in this case, the ASCII subset of characters. Long values are also cached (although it may be that the normal values stored in a Long fall outside of the cacheable range, particularly if they are timestamps).
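
The effect of these valueOf caches is easy to observe (this assumes the default cache bounds; the Integer upper bound can be raised with -XX:AutoBoxCacheMax):

```java
public class BoxingCache {
    public static void main(String[] args) {
        // Values in -128..127 come from a shared cache, so the same instance is returned.
        System.out.println(Integer.valueOf(127) == Integer.valueOf(127));   // true
        // 128 is outside the default cache, so two distinct instances are created.
        System.out.println(Integer.valueOf(128) == Integer.valueOf(128));   // false
        // The Character cache covers the 0..127 (ASCII) subset.
        System.out.println(Character.valueOf('a') == Character.valueOf('a')); // true
    }
}
```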

I thought it would be instructive to show how to use Eclipse MAT to identify these kinds of problems and how to fix them. This can have tangible benefits; last year (thanks to a kind reminder from Lars Vogel) I committed a fix that was the result of discovering the string www.eclipse.org over 6,000 times in Eclipse’s memory.

Once you install Eclipse MAT, it isn’t immediately obvious how to use it. What it provides is an editor that understands Java’s hprof memory dumps and can generate reports on them. So the first thing to do is generate a heap dump from a Java process in order to analyse it.

For this example, I downloaded Eclipse SDK version 4.5 and then imported an existing “Hello World” plug-in project, which I then ran as an Eclipse application and closed. The main reason for this was to exercise some of the paths involved in running Eclipse and generate more than just a minimal heap.

There are many ways to generate a heap; here, I’m using jcmd to perform a GC.heap_dump to the local filesystem.

Using jcmd to perform a heap dump
$ jcmd -l
83845
83870 sun.tools.jcmd.JCmd -l
$ jcmd 83845 GC.heap_dump /tmp/45.hprof
83845:
Heap dump file created

Normally the main class will be shown by JCmd; but for JVMs that are launched with an embedded JRE it may be empty. You can see what there is by using the VM.command_line command:

Using VM.command_line
$ jcmd 83845 VM.command_line
83845:
VM Arguments:
jvm_args: -Dosgi.requiredJavaVersion=1.7 -XstartOnFirstThread -Dorg.eclipse.swt.internal.carbon.smallFonts -XX:MaxPermSize=256m -Xms256m -Xmx1024m -Xdock:icon=../Resources/Eclipse.icns -XstartOnFirstThread -Dorg.eclipse.swt.internal.carbon.smallFonts -Dosgi.requiredJavaVersion=1.7 -XstartOnFirstThread -Dorg.eclipse.swt.internal.carbon.smallFonts -XX:MaxPermSize=256m -Xms256m -Xmx1024m -Xdock:icon=../Resources/Eclipse.icns -XstartOnFirstThread -Dorg.eclipse.swt.internal.carbon.smallFonts
java_command: <unknown>
java_class_path (initial): /Applications/Eclipse_4-5.app/Contents/MacOS//../Eclipse/plugins/org.eclipse.equinox.launcher_1.3.100.v20150511-1540.jar
Launcher Type: generic

Opening the heap dump

Provided that Eclipse MAT is installed, and the heap dump ends in .hprof, the dump can be opened by going to “File → Open File…” and then selecting the heap dump. After a brief wizard (you can cancel this) you’ll be presented with a heap dump overview:

This shows a pie chart of which classes contribute the most heap space; moving the mouse over each section shows the class name. In the above image, the 3.8Mb slice of the pie is selected, and it is due to the org.eclipse.core.internal.registry.ExtensionRegistry class.

To find out where duplicate objects exist, we can open the Group By Value report. This is under the blue icon to the right of the profile, under the Java Basics menu:

When this menu is selected, a dialog will be presented, which allows one (or more) classes to be selected. It’s also possible to enter more specific searches, such as an OQL query, but this isn’t necessary at first.

To find out what the duplicated strings are, enter java.lang.String as the class type:

This then shows a result of all of the objects whose .toString() value is the same, including the number of objects and the shallow heap (the amount of memory taken up by the objects themselves but not by the data they reference):

The results are sorted by the number of objects and then by shallow heap. In this case, there are 2,338 String instances with the value true taking up 56k, and 1,051 instances with the value false. You can never be too sure about the truth. (I want to know what JEDgpPXhjM4QTCmiytQcTsw3bLOeXXziZSSx0CGKRPA= is and why we need 300 of them …)

The impact of duplicate strings can be mitigated with Java 8’s -XX:+UseStringDeduplication feature. This will keep all 2,338 instances of String but repoint all of the char elements to the same backing character array. Not a bad tune-up, and for platforms that require Java 8 as a minimum it may make sense to enable that flag by default. Of course tooling (such as Eclipse MAT) can’t tell when this is in use or not so you may still see the duplicate data referenced in reports.

What about Integer instances? Well, running new Integer() is guaranteed to create a new instance while Integer.valueOf() uses the integer cache. Let’s see how many integers we really have, by running the same Group By Value report with java.lang.Integer as the type:

Quite a few, though obviously not as memory-hungry as the Strings were; in fact, we have 11k’s worth of Integer instances on heap. This shows that small numbers, like 0, 1, 2, and 8 are seen a lot, as are MAX_VALUE and MIN_VALUE. Were we to fix it we’d likely get around 10k back on heap – not a huge amount, to be sure.

The number 100 seems suspicious; we’ve got a few of them kicking around, and unlike the other values, which are mostly small or powers of two, it sticks out. So how do we find where it comes from?

One feature of Eclipse MAT is the ability to step into a set of objects and then show their references; either incoming (objects that point to this one) or outgoing (objects that it points to). Let’s see where the references come from, by right-clicking on 100 and then choosing “List Objects → Incoming References”. A new tab will be opened within the editor showing the list of Integer values, which can then be expanded to see where they come from:

This shows 5 instances and their reference graph. The last one is the one created by the built-in Integer cache, but the others all seem to come from the org.eclipse.e4.ui.css.core.dom.properties.Gradient class, via the GradientBackground class. We can open up the code to see a List of Integer objects, but no allocation:

Gradient.java
import java.util.ArrayList;
import java.util.List;

public class Gradient {
  private final List<Integer> percents = new ArrayList<>();

  public void addPercent(Integer percent) {
      percents.add(percent);
  }
}

Searching for references in the codebase for the addPercent() method call leads to the org.eclipse.e4.ui.css.swt.helpers.CSSSWTColorHelper class:

CSSSWTColorHelper.java
import org.w3c.dom.css.CSSPrimitiveValue;

public class CSSSWTColorHelper {
  public static Integer getPercent(CSSPrimitiveValue value) {
      int percent = 0;
      switch (value.getPrimitiveType()) {
      case CSSPrimitiveValue.CSS_PERCENTAGE:
          percent = (int) value.getFloatValue(CSSPrimitiveValue.CSS_PERCENTAGE);
      }
      return new Integer(percent);
  }
}

And here we find both the cause of the duplicate integers and also the meaning. Presumably there are many references in the CSS files to 100% in the gradient, and each time we come across that we’re instantiating a new Integer instance whether we need to or not.

Ironically if the method had just been:

CSSSWTColorHelper.java
import org.w3c.dom.css.CSSPrimitiveValue;

public class CSSSWTColorHelper {
  public static Integer getPercent(CSSPrimitiveValue value) {
      int percent = 0;
      switch (value.getPrimitiveType()) {
      case CSSPrimitiveValue.CSS_PERCENTAGE:
          percent = (int) value.getFloatValue(CSSPrimitiveValue.CSS_PERCENTAGE);
      }
      return percent;
  }
}

then autoboxing would have kicked in, which uses Integer.valueOf() under the covers, and it would have been fine. Really, using new Integer() is a code smell and should be a warning; and yes, there’s a bug for that.
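The difference is easy to demonstrate; this standalone sketch compares the constructor against autoboxing:

```java
public class BoxingDemo {
    public static void main(String[] args) {
        Integer boxed = 100;               // autoboxing: compiles to Integer.valueOf(100)
        System.out.println(boxed == Integer.valueOf(100)); // true: same cached instance
        Integer fresh = new Integer(100);  // always allocates a new instance
        System.out.println(fresh == Integer.valueOf(100)); // false
    }
}
```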

And as is usual in many cases, Lars Vogel has already been and fixed bug 489234:

Bug 489234
 - return new Integer(percent);
 + return Integer.valueOf(percent);

Conclusion

Fixing individual Integer allocations isn’t especially important in itself; the educational point is realising that new Integer() (and doubly so, new Boolean()) is an anti-pattern. Generally speaking, if you have new Integer(x).intValue() then replace it with Integer.parseInt(x); otherwise replace it with Integer.valueOf(x) instead.
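Those replacements can be sketched as follows (a standalone illustration, not the actual CDT or Platform code):

```java
public class ParseDemo {
    public static void main(String[] args) {
        String s = "42";
        // anti-pattern: new Integer(s).intValue() allocates a wrapper only to unbox it
        int primitive = Integer.parseInt(s);  // no wrapper allocated at all
        Integer boxed = Integer.valueOf(s);   // boxes via the -128..127 cache
        System.out.println(primitive);        // 42
        System.out.println(boxed);            // 42
    }
}
```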

In fact, if you’re returning from a method that is declared to return Integer, or assigning to a field of type Integer, then you can just use the literal value, and autoboxing (which uses Integer.valueOf() under the hood) will create the right type. However, if you’re inserting values into a collection, then calling Integer.valueOf() explicitly makes the boxing obvious.
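A small sketch of the method-return case (the class and method names here are invented for illustration):

```java
public class AutoboxReturnDemo {
    // Declared to return Integer, but the int literal is autoboxed
    // via Integer.valueOf(), so the cached instance is reused.
    static Integer half() {
        return 50;
    }

    public static void main(String[] args) {
        System.out.println(half() == Integer.valueOf(50)); // true
    }
}
```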

If you know your value is outside of the cached range, then calling new Integer() has exactly the same effect as calling Integer.valueOf(): both allocate a new instance. Under JIT optimisation for hot methods you’d expect them to perform the same. However, note that Integer.valueOf() could change its caching over time (for example, to cache MAX_VALUE), which you won’t be able to take advantage of if you use the constructor. Plus, there’s also a run-time switch, -Djava.lang.Integer.IntegerCache.high=1024, if you want to extend the cache to more values. This is currently respected only for Integer, though; the other primitive wrappers don’t have an equivalent configuration property.

In addition, being able to look for duplicate objects in memory and discover where heap memory goes is an important tool in understanding where Eclipse’s memory is used and what can be done to resolve some of those issues. For example, digging into a stray object reference resulted in discovering that P2 has its own Integer cache, despite having a minimum dependency of Java 1.5, which added Integer.valueOf(). Hopefully we can remediate this.

Oh, and that JEDgpPXhjM4QTCmiytQcTsw3bLOeXXziZSSx0CGKRPA= string? It turns out that it’s a value in META-INF/MANIFEST.MF for the SHA-256-Digest of a bunch of (presumably empty) resource files in the com.ibm.icu bundle:

com.ibm.icu.jar!META-INF/MANIFEST.MF
Name: com/ibm/icu/impl/data/icudt54b/cy_GB.res
SHA-256-Digest: JEDgpPXhjM4QTCmiytQcTsw3bLOeXXziZSSx0CGKRPA=

Name: com/ibm/icu/impl/data/icudt54b/ksb_TZ.res
SHA-256-Digest: JEDgpPXhjM4QTCmiytQcTsw3bLOeXXziZSSx0CGKRPA=

In fact, you might not be surprised to know that there are 300 of them :)

$ grep SHA-256-Digest MANIFEST.MF | sort | uniq -c | sort -nr
 300 SHA-256-Digest: JEDgpPXhjM4QTCmiytQcTsw3bLOeXXziZSSx0CGKRPA=
  65 SHA-256-Digest: Ku5LOaQNbYRE7OFCreIc9LWXXQBUHrrl1IhxJy4QRkA=
  61 SHA-256-Digest: TFNUA5jTkKhhjE/8DQXKUtrvohd99m5Q3LrEIz5Bj4I=
  53 SHA-256-Digest: p7PURP2WmyEtwG26wCbOYyN+8v3SjhinC5uUomd5uJA=
  53 SHA-256-Digest: fTZLTXXbc5Z45DJFKvOwo6f5yATqT8GsD709psc90lo=
  49 SHA-256-Digest: SiArmu+IqlRtLpSQb6d2F5/rIu6CU3lnBgyY5j2r7s0=
  49 SHA-256-Digest: A5xl6s5MaIPeiyNblw/SCEWgA0wRdjzo7e7tXf3Sscs=

It turns out that while investigating one optimisation you often find another. The manifest parser stores the manifest for the bundles, which has both the main section (where the interesting parts of the manifest live) and all of the other sections (including their signatures). I’m not sure that it’s really needed; it was introduced in 865896 and the only place it’s used is to attempt to capture a per-directory Specification-Title.

Since this data is largely unused by OSGi, modifying the runtime not to store the hash data saves ½Mb or so of redundant strings, and together with the other savings this can amount to a couple of megabytes:

The discussion of whether this optimisation can be applied is at bug 490008.

Update: thanks to Thomas Watson, the bug fix was discussed and integrated into 4.5M7, which should reduce the memory utilisation of future Eclipse runtimes by a couple of megabytes.