AlBlue’s Blog

Macs, Modularity and More

Eclipse memory optimisation

2015 Eclipse

Apologies for the dearth of posts recently – 2015 will go down as one of my lowest blog posting years, from a high of a hundred+ per year a few years ago. Partly that’s because I’ve been writing books, changing job and hosting the Docklands LJC. But never mind that now, onto Eclipse …

I’ve been looking at the performance of Eclipse over the last few months, specifically the start-up time and a micro-memory optimisation. I’ve been promising myself to blog about this for a while, but haven’t had time until now.

The first thing I’ll talk about is the use of new Boolean in the codebase. This turned up by accident when I was looking at memory usage and whether String de-duplication would be beneficial to Eclipse (there are a lot of Strings in a runtime Eclipse instance). Side note: Eclipse Memory Analyser Tool (MAT) is excellent, and it’s part of the Eclipse Mars release, so you should install it right away. Go on, I’ll wait.

String de-duplication (available from Java 8u20, and only when the G1 collector is in use) can be turned on with -XX:+UseStringDeduplication in an eclipse.ini file, or with an option -vmargs -XX:+UseStringDeduplication on the command line. This works by comparing Strings to a prior list of values when performing garbage collection, and if the data of two Strings is the same, then the backing array of one is replaced with the backing array of the other. You still have two independent String instances (so a == b is false) but the underlying char array is the same (so a.value == b.value).
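To make the identity point concrete, here is a minimal sketch (the class and method names are my own, purely for illustration). It only shows the part you can observe from ordinary code: two equal Strings remain separate objects. Whether their backing arrays are shared is a JVM-internal detail that this snippet does not try to inspect.

```java
final class StringIdentityDemo {
    /** Builds a fresh String object on every call, bypassing the literal pool. */
    static String fresh(String text) {
        return new String(text.toCharArray());
    }

    public static void main(String[] args) {
        String a = fresh("www.eclipse.org");
        String b = fresh("www.eclipse.org");

        // Two separate String objects, with or without de-duplication.
        System.out.println(a == b);      // false
        // The character data is identical, which is what makes the
        // backing arrays candidates for sharing during garbage collection.
        System.out.println(a.equals(b)); // true
    }
}
```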

It turned out that there were a heck of a lot of references to www.eclipse.org (several tens of thousands, if I recall). Now Eclipse doesn’t need to be that vain; all of these references were created by new URI calls that were indirectly being driven by references in P2 files like content.xml and artifacts.xml (or their compressed counterparts). If you cache these in a Map based on the hostname, you get an efficient way of acquiring these objects that avoids excessive memory usage/recycling. This change was merged in for Eclipse Mars.
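As a rough illustration of the caching idea, here is a sketch. It is not the code that went into P2: the UriCache name is made up, and for simplicity it keys on the whole location string rather than just the hostname. Since java.net.URI is immutable, handing out the same instance to every caller is safe.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class UriCache {
    // Hypothetical cache: one shared URI instance per distinct location string.
    private static final Map<String, URI> CACHE = new ConcurrentHashMap<>();

    /** Returns a shared URI for the location, creating it on first use only. */
    static URI get(String location) {
        return CACHE.computeIfAbsent(location, URI::create);
    }

    public static void main(String[] args) {
        URI first = get("http://www.eclipse.org/");
        URI second = get("http://www.eclipse.org/");
        // Both lookups yield the very same object, so nothing new is allocated.
        System.out.println(first == second); // true
    }
}
```

An unbounded map like this only makes sense when the set of distinct locations is small, which is the situation described above.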

Anyway, whilst in the MAT view I ran the ‘Boolean instances’ check, and it showed that there were around ten instances of Boolean in the heap, and this was from a relatively empty Eclipse instance. Now, a boolean only has two possible values, so finding ten instances was a little confusing. It turns out that most of these come from code that looks like new Boolean(value), where value is either a String (e.g. "true" or "false") or a plain boolean to be wrapped. The former is used widely for representing options and preferences in Eclipse (e.g. use tabs or use spaces), and so the code used new Boolean() to do the parsing. In some cases, booleanValue() was then being called to convert the object wrapper into its boolean counterpart, for use in an if statement or a local boolean variable.
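Here is a small sketch of why the instance count creeps up (the helper name is hypothetical). Every constructor call allocates a new wrapper, even when the value is the same. The constructor has been deprecated since Java 9, hence the suppressed warnings.

```java
final class BooleanConstructorDemo {
    /** Parses the way much older code does: via the Boolean constructor. */
    @SuppressWarnings({"deprecation", "removal"})
    static Boolean viaConstructor(String value) {
        return new Boolean(value);
    }

    public static void main(String[] args) {
        Boolean first = viaConstructor("true");
        Boolean second = viaConstructor("TRUE"); // parsing ignores case

        System.out.println(first.equals(second)); // true: same truth value
        System.out.println(first == second);      // false: two heap objects

        // Anything other than "true" (ignoring case) parses as false.
        System.out.println(viaConstructor("yes").booleanValue()); // false
    }
}
```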

The main use, then, of new Boolean was to perform parsing on the string value, so that it could be used in a test or, in some cases, stored as a value in another collections class. (There are a few places where Boolean is being used as a tri-state: true, false and null.) When Java was originally created, it didn’t have a separate parse method that returned a primitive boolean, and that was still the case when Eclipse was written, so the constructor was the usual way of parsing truth values.

Fortunately Java 1.5 added Boolean.parseBoolean(), which does the same parsing as the constructor and returns a boolean value from a String. (In fact the constructor now delegates to that static method to do its work.) However, by that time large quantities of Eclipse code had been written using the constructor, and with no warnings raised by Eclipse itself these went undetected for a long time. There is also Boolean.valueOf(), which accepts the same arguments as the constructor, either a String or a boolean value (the boolean overload dates from Java 1.4), but returns one of the canonical Boolean.TRUE or Boolean.FALSE instances instead of a new object. In fact, several of the changes turned up things like Boolean.valueOf(true), which could trivially be replaced with Boolean.TRUE, and Boolean.valueOf(something).booleanValue(), which could be replaced with Boolean.parseBoolean(something); that has exactly the same effect, but without going through a wrapper object.
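The replacements can be sketched like this. The wrapper methods and class name are only there for illustration; the calls they make are the standard java.lang.Boolean ones.

```java
final class BooleanReplacements {
    /** Replacement for new Boolean(value) when a Boolean object is needed. */
    static Boolean wrap(String value) {
        return Boolean.valueOf(value); // one of the two canonical instances
    }

    /** Replacement for new Boolean(value).booleanValue(). */
    static boolean parse(String value) {
        return Boolean.parseBoolean(value); // no wrapper object involved
    }

    public static void main(String[] args) {
        System.out.println(wrap("true") == Boolean.TRUE);   // true
        System.out.println(wrap("false") == Boolean.FALSE); // true
        System.out.println(parse("TRUE"));                  // true
        System.out.println(parse(null));                    // false
    }
}
```

Because valueOf() always hands back Boolean.TRUE or Boolean.FALSE, the heap never holds more than those two instances however many times it is called.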

It’s primarily because of Eclipse’s age and size that these changes were needed. Eclipse 3.0 was released in the same year (2004) that Java 1.5 came out, and already had almost two million lines of code in place by then. It wasn’t until Eclipse 3.3 was released in 2007 that support for Java 1.4 was dropped and Java 1.5 became the minimum, so that was the earliest such a change could have taken place, by which time there were 17 million lines of code.

In any case, thanks to a number of successful code reviews, many of the places where new Boolean is called have now been weeded out.

Many of these were found by running Eclipse and doing a search for references to the new Boolean constructor (Cmd+Shift+G) when importing projects, but once the set of repos was known I did a git grep "new Boolean(" to find the locations, followed by a sed -i~ "s/new Boolean(/Boolean.valueOf(/g" to do the rewrites. Places where true and false were seen in the diffs were then replaced with Boolean.TRUE and Boolean.FALSE, and combinations of Boolean.valueOf().booleanValue() were replaced with Boolean.parseBoolean() by inspection.
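For illustration, the same textual rewrite can be sketched in Java. This is not what I actually ran (that was git grep and sed), and the BooleanRewriter name is made up. It folds in the follow-up literal replacements that I did by hand from the diffs, and leaves the booleanValue() cases alone since those needed inspection.

```java
final class BooleanRewriter {
    /** Applies the purely textual replacements to a chunk of Java source. */
    static String rewrite(String source) {
        return source
                // The bulk rewrite, equivalent to the sed expression.
                .replace("new Boolean(", "Boolean.valueOf(")
                // Literal arguments can use the constants directly.
                .replace("Boolean.valueOf(true)", "Boolean.TRUE")
                .replace("Boolean.valueOf(false)", "Boolean.FALSE");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("Object tabs = new Boolean(true);"));
        // prints: Object tabs = Boolean.TRUE;
        System.out.println(rewrite("Boolean b = new Boolean(pref);"));
        // prints: Boolean b = Boolean.valueOf(pref);
    }
}
```

Like the sed command, this knows nothing about Java syntax, so it would also rewrite matches inside comments or string literals. The resulting diffs still need reviewing.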

It turns out that Sonar is good at spotting these things as well, with its Boolean Instantiation rule; a list of all projects (that are covered by Sonar) that have new Boolean() calls can be found by running a search and putting Boolean Instantiation as a More Criteria field; apparently there are some 244 references that are still present in the Eclipse codebase – though this won’t contain any in-flight reviews or some of the recent changes. It looks like I need to submit a patch for CDT next …

Thanks to Mickael Istria for pointing out the Sonar results to me (see his blog post for more details), and of course to everyone who has been reviewing patch-bombs from me. Hopefully, using this, we’ll be able to banish new Boolean from Eclipse completely.

PS Micro-optimisations should not be done 99% of the time, but code cleanups may be worth it for their own sake.