
AlBlue’s Blog

Macs, Modularity and More

Naming and indentation conventions

2005, java

I recently read a discussion on naming interfaces in Java. The suggestion was to use the I prefix for interfaces, much like Eclipse does, and to evolve the interface with I*2 (like the LayoutManager2 interface in the Java libraries). It raised an interesting (and ongoing) point: developers, whatever their inclination, tend to think that suggestion X is dumb if they've never used it before.
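As a minimal sketch of the I*2 evolution pattern, modelled loosely on the way java.awt.LayoutManager2 extends LayoutManager: the names ILayout, ILayout2, SimpleLayout, BoundedLayout and LayoutClient are hypothetical. The evolved interface extends the original rather than changing it, so existing implementations keep compiling, while newer callers can check for the richer interface.

```java
// Hypothetical original interface, already published and implemented by clients.
interface ILayout {
    String describe();
}

// The evolved interface extends the original instead of changing it,
// so classes that only implement ILayout keep compiling.
interface ILayout2 extends ILayout {
    int maximumWidth();
}

// An older implementation that only knows about ILayout.
class SimpleLayout implements ILayout {
    public String describe() {
        return "simple";
    }
}

// A newer implementation that opts into the evolved interface.
class BoundedLayout implements ILayout2 {
    public String describe() {
        return "bounded";
    }

    public int maximumWidth() {
        return 640;
    }
}

class LayoutClient {
    // Callers test for the newer interface at run time, much as AWT's
    // Container checks whether its LayoutManager is also a LayoutManager2.
    static String summarise(ILayout layout) {
        if (layout instanceof ILayout2) {
            return layout.describe() + " (max " + ((ILayout2) layout).maximumWidth() + ")";
        }
        return layout.describe();
    }

    public static void main(String[] args) {
        System.out.println(summarise(new SimpleLayout()));  // simple
        System.out.println(summarise(new BoundedLayout())); // bounded (max 640)
    }
}
```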

Firstly, there's no such thing as a standard convention. A convention is something that all members of a team, group or community agree on, and there will always be different opinions about the minutiae if the language allows such variation. Whilst some languages (like Java) have a lot of flexibility with whitespace, leading to many different styles of formatted code, others (like Python) all tend to look the same because whitespace is more rigid. It's arguable that this is one of the main benefits of Python: all Python code looks like Python code, whereas Java can look like C or C++ depending on the formatting conventions used.

On a side note, it really bugs me that people learning a language from scratch don't have the decency to learn how its code should be formatted, especially when there are published guidelines on how code should be formatted (also available in other formats).

The choice of names is more open, since names can't be regulated by a language's syntax as easily. Admittedly, there are some common conventions that permeate the whole language (such as types beginning with an UpperCaseLetter and methods starting with a lowerCaseLetter). But what to call them is another matter. Historically, languages like C, C++ and CORBA (actually, IDL) have resulted in a number of coding conventions that are still in use today. For example, an accessor method in C++ is often written as value(), and since C++ can't have a field and a method with the same name, developers tended to name the field value_. You can still find Java code like this today, even though in Java it's not necessary, because a method can have the same name as a field; though for the most part, Java coders know to use the getValue() convention instead.
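A small sketch contrasting the accessor styles just described. The class names are hypothetical; the point is that the C++-style trailing underscore compiles in Java but isn't needed, because Java lets a field and a method share a name, while the getValue() style is the JavaBeans getter convention most Java code follows.

```java
// Hypothetical classes contrasting the accessor styles described above.

// C++-influenced style: the field gets a trailing underscore so that
// the accessor can be called value().
class CxxStyleCounter {
    private int value_ = 1;

    int value() {
        return value_;
    }
}

// Also legal Java: a field and a method may share the name "value",
// because fields and methods live in separate namespaces.
class SameNameCounter {
    private int value = 42;

    int value() {
        return value;
    }
}

// The getter convention (from JavaBeans) that most Java code uses.
class BeanCounter {
    private int value = 7;

    public int getValue() {
        return value;
    }
}
```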

Since there's no right or wrong convention, what should you choose? Well, beware of anyone who says "we should do X because it's more readable". There's no such thing as 'more' or 'less' readable when it comes to formatting conventions; it's just what you're used to. I could be a wizard with C code, for example, and write everything on one long line with word-wrap turned on. Just because that's more readable to me doesn't mean it's more readable to you. Obviously, if you're working in a team of similar people, your idea of what's readable is likely to be close to theirs, but that's a statement about familiarity, not an objective measure of quality.

Like others before me, I've arrived at my own set of preferred coding conventions. Similarly, some of my choices depend on the editor I use to write code. (A lot of C code has the return type on a separate line from the function name; that's so that each function name starts in column 1, and you can use a regular expression such as ^fu to search for 'the first occurrence of fu at the start of a line', which finds the definition rather than a call. In the days before IDEs this was a sensible coding convention, but IDEs have changed things somewhat in this regard.)
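To illustrate why the column-1 convention made ^fu useful, here is a sketch in Java that runs the same anchored search over a small C snippet held in a string. The C snippet, the function name fu and the helper linesStartingWith are hypothetical. Pattern.MULTILINE makes ^ match at the start of every line, much like a line-oriented editor search.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ColumnOneSearch {
    // C source with the return type on its own line, so the
    // function name in the definition starts in column 1.
    static final String C_SOURCE = String.join("\n",
        "int",
        "fu(int x)",
        "{",
        "    return x + 1;",
        "}",
        "",
        "int",
        "main(void)",
        "{",
        "    return fu(41);",
        "}");

    // Returns the 1-based line numbers where ^name matches.
    static List<Integer> linesStartingWith(String source, String name) {
        Pattern pattern = Pattern.compile("^" + Pattern.quote(name), Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(source);
        List<Integer> lines = new ArrayList<>();
        while (matcher.find()) {
            // Count the newlines before the match to get its line number.
            int line = 1;
            for (int i = 0; i < matcher.start(); i++) {
                if (source.charAt(i) == '\n') {
                    line++;
                }
            }
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Only the definition matches; the indented call site does not.
        System.out.println(linesStartingWith(C_SOURCE, "fu")); // [2]
    }
}
```

With the return type on the same line ("int fu(int x)"), the same search finds nothing, which is exactly why the convention existed.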

So without further ado, this is how I format and name my code. I've also tried to give the reasons why I chose them (note: I'm not justifying why I use them; I'm explaining how I chose them in the first place). I should also point out that whilst Java is my main current language, I've also written in C, Objective-C, Python, Prolog and (shudder) LISP. I try not to use the same set of conventions everywhere, but rather learn each language's default naming conventions first.

Spaces or tabs

I use tabs to format my source code. Everyone has their own preferred indentation level (personally, mine's about 2cm), and there's no right or wrong way of indenting code, just different preferences. However, almost every text editor lets you choose how wide a tab appears, so ->  <- might be your favourite and ->      <- might be someone else's. With a tab (U+0009) character, the width can be set on an editor-by-editor basis regardless of who wrote the code.

The arguments against tabs (they default to 8 spaces in editors such as vi) don't really hold much sway. Firstly, not much development is done in vi these days when IDEs such as Eclipse are around; and secondly, in more recent versions of vi (specifically, VIM) you can set the tab stop to whatever width you like (with :set tabstop=3 in VIM). Note that I'm not knocking vi as an editor; I still use it for a lot of remote work on server machines. I am suggesting, though, that it's not used significantly for development.
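A minimal sketch of why a stored tab adapts to each reader: the same line, indented once with two tab characters, is displayed with different indentation simply by changing the tab stop. The expandTabs method and the tab stops of 2 and 8 are illustrative assumptions, not any particular editor's implementation.

```java
class TabRendering {
    // Expands each tab to the next multiple of tabStop columns,
    // roughly what an editor does when it displays a tab character.
    // Assumes a single line of plain characters (no newlines, no wide glyphs).
    static String expandTabs(String line, int tabStop) {
        StringBuilder out = new StringBuilder();
        for (char c : line.toCharArray()) {
            if (c == '\t') {
                // Pad with spaces up to the next tab stop.
                do {
                    out.append(' ');
                } while (out.length() % tabStop != 0);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "\t\treturn x;"; // two levels of indentation, stored once
        System.out.println("[" + expandTabs(source, 2) + "]"); // indented by 4 columns
        System.out.println("[" + expandTabs(source, 8) + "]"); // indented by 16 columns
    }
}
```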

Blank lines and single-line comments

I get rid of all blank lines in my source code, and don't write in-line comments (either those beginning with // or /* */ comments on a single line). Code is supposed to be about code, not surrounding whitespace. Furthermore, having no blank lines or in-line comments gives the most compact representation, which is a directly measurable objective. On the other hand, there's no 'right' way of enforcing a blank-line convention. You might start making arbitrary decisions such as 'an empty line should precede loop statements', but that's not easily measurable and therefore not enforceable. However, it's trivial to have a post-processor that removes all single-line comments and blank lines.
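A post-processor of the kind mentioned above could be sketched as follows. This is a hypothetical, line-based filter written for illustration: it drops blank lines, lines consisting only of a // comment, and lines consisting only of a /* */ comment that opens and closes on that line. It keeps Javadoc, trailing comments and multi-line comments, and it does not understand string literals or text blocks, so it is a sketch rather than a robust tool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CommentStripper {
    // True if the trimmed line is nothing but a // comment, or nothing but
    // one non-Javadoc /* ... */ comment that opens and closes on this line.
    static boolean isCommentOnly(String trimmed) {
        if (trimmed.startsWith("//")) {
            return true;
        }
        // Javadoc (/** ... */) is kept, since methods should be documented.
        if (trimmed.startsWith("/*") && !trimmed.startsWith("/**")) {
            // The first "*/" after the opening "/*" must end the line;
            // otherwise code follows the comment, or the comment spans lines.
            return trimmed.indexOf("*/", 2) == trimmed.length() - 2;
        }
        return false;
    }

    // Removes blank lines and comment-only lines; other lines are kept verbatim.
    static List<String> strip(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || isCommentOnly(trimmed)) {
                continue;
            }
            result.add(line);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
            "int max(int[] values) {",
            "",
            "\t// find the largest value",
            "\tint best = values[0];",
            "\t/* loop over the rest */",
            "\tfor (int v : values) { if (v > best) { best = v; } }",
            "\treturn best;",
            "}");
        // Prints the five remaining code lines.
        strip(source).forEach(System.out::println);
    }
}
```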

I ought to point out that I'm not against writing comments, nor in favour of writing obfuscated code. But there are many, many uses of single-line comments that are pointless; for example, a comment above a for loop explaining that the next five lines loop through the values to find the largest one. It should be obvious that that's what the code is doing, and if it's not, refactor it. As a direct observation: if you've got a method of 20 or more lines and you're starting to become confused about what it does, refactor it into individual methods, and give those methods JavaDoc comments. If you really need to explain what a large method does, breaking it out into individually named and commented methods is the right way to explain it.
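As a hypothetical illustration of that refactoring, the loop that might otherwise sit under a "find the largest value" comment moves into its own method, whose name and Javadoc carry the explanation. The names Scores, largestOf and report are invented for the example.

```java
class Scores {
    /**
     * Returns the largest value in the given array.
     *
     * @param values a non-empty array of values
     * @return the maximum element of {@code values}
     * @throws IllegalArgumentException if {@code values} is empty
     */
    static int largestOf(int[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        int largest = values[0];
        for (int i = 1; i < values.length; i++) {
            if (values[i] > largest) {
                largest = values[i];
            }
        }
        return largest;
    }

    /**
     * Builds a report line; the call to largestOf needs no comment of its own.
     */
    static String report(int[] values) {
        return "Highest score: " + largestOf(values);
    }

    public static void main(String[] args) {
        System.out.println(report(new int[] {3, 9, 4})); // Highest score: 9
    }
}
```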

Personally, I think the single-line comment is only useful for commenting out blocks of code whilst testing, since Java doesn't allow nested /* */ comments. We need a language whose lexer understands nested comments; it's not as if we lack the CPU power to handle them, yet we're still stuck in the LALR(1) days.
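A small sketch of the nesting problem, using a hypothetical method: the two disabled lines below already contain a /* */ comment, so wrapping them in another /* */ would fail, because the first */ closes the outer comment. Prefixing each line with // works, since /* and */ have no special meaning inside a // comment.

```java
class CommentingOut {
    static int compute(int x) {
        int result = x;
        // Disabled while testing: each line is prefixed with //, so the
        // embedded /* ... */ comment causes no trouble.
        // result = result * 2; /* double it */
        // result = result + 1;
        //
        // Wrapping those two lines in /* ... */ instead would not compile:
        // the */ after "double it" would close the outer comment, leaving
        // "result = result + 1;" as live code followed by a stray */.
        return result;
    }

    public static void main(String[] args) {
        System.out.println(compute(5)); // 5, because the arithmetic is commented out
    }
}
```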

Bracketing

I put the brackets { } on the same line as the statements, and I always use them on looping and conditional constructs. Frankly, there are two extremes you can have: either everything goes on the same line, or everything goes on its own line:

Most compact:

public class Example {
  public static void main(String[] args) {
    double goodMood = Math.random();
    if (goodMood > 0.5) {
      System.out.println("Hello World");
    } else if (goodMood < 0.3) {
      System.out.println("Goodbye World");
    } else {
      System.out.println("Go back to sleep");
    }
  }
}

Most expanded:

public class Example
{
  public static void main(String[] args)
  {
    double goodMood = Math.random();
    if (goodMood > 0.5)
    {
      System.out.println("Hello World");
    }
    else
    if (goodMood < 0.3)
    {
      System.out.println("Goodbye World");
    }
    else
    {
      System.out.println("Go back to sleep");
    }
  }
}

Whatever your preference in indentation, the most expanded form clearly takes the most lines, and the most condensed form is clearly the most compact. It's a matter of personal preference which you find 'more readable', but as highlighted above, that's a preference, not an absolute requirement one way or the other. I personally find that the more compact the code is, the more I can see on one screen, which is no bad thing in the days of IDEs, where screen real estate is cluttered with (albeit useful) things like error messages, packages and so forth. Perhaps it's no wonder that people who prefer the 'expanded' view of code tend to like editing windows that are tall and thin, with information down the left side of the screen, whereas those who prefer compact code make use of the full width of the screen.

On a related note, it really surprises me how many people continue to use the 'Java' perspective in Eclipse when there's a 'Java Browsing' perspective available. I'd like to postulate that Mac OS X owners (or those from NeXT days) will prefer the superior column format for viewing code, whereas people who've only known Windows Explorer default to a tree view because it's the only thing they've known. It might work for 'hello world' applications, but for browsing large amounts of code it simply doesn't scale.

[Screenshots: Eclipse Column (Java Browsing) View; Eclipse Tree (Java Navigator) View]

Digression aside, why did I choose the compact form over the expanded one? Well, the reason boils down to one ability: with the compact form, you can put a System.out.println() on any line within a method without affecting the behaviour of the code. If you use the expanded form and put a debug line in the wrong place (say, between an if condition and its opening brace), there's a possibility that you'll break the code. As a corollary, you get the same behaviour with cut-and-paste, say, for reordering nested if/else statements, or for commenting them out. The else if block in the compact example (the } else if line and the println that follows it) can be commented out completely with just two lines, and you can be sure (because there's one close and one open on the same line) that you're not going to accidentally comment out just the start or the end of a block. OK, so grammatically you're commenting out the end of the previous block and the beginning of the next one (in the past I've seen people try to cut and paste starting from just the 'else', while the closing '}' sits on a different line), but being able to comment out, delete or reorder whole lines with no fear of bracketing errors is the real reason why I chose the condensed form.
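To make the risk concrete, here is a sketch comparing the two forms with debug lines added; the class and method names are hypothetical. In the compact method, the println lines can go on any line without changing which statements run conditionally. In the expanded method, a println placed between the if condition and its opening brace becomes the entire body of the if, and the braced block below it then runs unconditionally. The code still compiles, which is what makes the mistake easy to miss.

```java
class DebugLineDemo {
    // Compact form: a debug line can go on any line of the method body.
    static String compact(double goodMood) {
        String message = "Go back to sleep";
        System.out.println("debug: mood = " + goodMood);
        if (goodMood > 0.5) {
            System.out.println("debug: in the good-mood branch");
            message = "Hello World";
        }
        return message;
    }

    // Expanded form with a debug line added in the wrong place.
    static String expanded(double goodMood) {
        String message = "Go back to sleep";
        if (goodMood > 0.5)
        System.out.println("debug: checking mood"); // now the whole body of the if
        {
            message = "Hello World"; // this block is no longer conditional
        }
        return message;
    }

    public static void main(String[] args) {
        System.out.println(compact(0.1));  // returns "Go back to sleep"
        System.out.println(expanded(0.1)); // returns "Hello World" (the bug)
    }
}
```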

Naming conventions

I use prefixes rather than suffixes, because IDEs complete names from a prefix, not a suffix. As a result, I don't use XxxImplementation or XxxImpl for names. (Why do people still insist on using the Impl abbreviation? Modern languages, compilers and systems don't limit the length of names.) The main advantage is that if a name starts with a character that identifies the kind of type you're dealing with (like I for interfaces), you can type I then Control+Space and get a list of all the interfaces. Additionally, in anything that sorts alphabetically (tree view, column view, whatever), all of the same kind of thing will be grouped together.
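A rough sketch of why prefixes help: both completion and alphabetical sorting work from the start of a name. The type names below are hypothetical, and filtering a sorted list by prefix only approximates what an IDE's content assist does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PrefixGrouping {
    // Returns the names starting with the given prefix, in sorted order,
    // roughly what typing the prefix and pressing Control+Space would list.
    static List<String> complete(List<String> names, String prefix) {
        List<String> sorted = new ArrayList<>(names);
        Collections.sort(sorted);
        List<String> matches = new ArrayList<>();
        for (String name : sorted) {
            if (name.startsWith(prefix)) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> types = Arrays.asList(
            "Vehicle", "VehicleImpl", "IVehicle", "Engine", "EngineImpl", "IEngine");
        // Prefix convention: the interfaces come back together.
        System.out.println(complete(types, "I"));
        // [IEngine, IVehicle]
        // Suffix convention: sorted, the *Impl classes end up apart.
        System.out.println(complete(types, ""));
        // [Engine, EngineImpl, IEngine, IVehicle, Vehicle, VehicleImpl]
    }
}
```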

Here are the more common ones that I use:

_field

Putting a _ on the front of fields makes them easily distinguishable from local variable names. Not only does typing _ then Control+Space give a list of fields, it also means I don't have to write this.field = field in setter methods; there's never a chance that an instance field and a local variable are confused. It also encourages me to use the getter instead of direct field access. It's similar to the idea of putting an f on the front; but whilst an f prefix might clash with a name like furl, _url doesn't.

_$staticField

As with _field, except that the extra $ indicates that it is a static field.

IInterfaceName

Identifies an interface. It's used in a number of places in Eclipse's own code, and seems like a good idea. It has the potential to clash with other types prefixed with I, such as InternetAddress, which is a downside; but given a type, it does let you determine whether it is an interface or a class from its name alone.

AbstractTypeName

Indicates the base of an abstract hierarchy, often one that is designed to be subclassed to provide functionality, and often implementing an interface as well. For example, you might have an IVehicle interface, with a bunch of generic implementation in AbstractVehicle as well as template methods whose individual steps are implemented in concrete subclasses such as CarVehicle.

CONSTANT_NAME

Represents a static final constant, rather than using the _$ prefix. This is mostly because it's been a convention for ages, and the _ prefix is for private fields, whereas constants are usually public.
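Putting these conventions together, a sketch of the IVehicle, AbstractVehicle and CarVehicle example might look like the following. Everything beyond those three names (getName, wheels, describe, the _$created counter, DEFAULT_NAME) is a hypothetical addition for illustration. Java accepts _ and $ in identifiers, so _name and _$created are legal field names.

```java
// Interface: I prefix.
interface IVehicle {
    String getName();
    int wheels();
}

// Base of the abstract hierarchy: Abstract prefix, holds the generic implementation.
abstract class AbstractVehicle implements IVehicle {
    // Constant: CONSTANT_NAME style, public static final.
    public static final String DEFAULT_NAME = "vehicle";
    // Static field: _$ prefix.
    private static int _$created;
    // Instance field: _ prefix.
    private String _name;

    protected AbstractVehicle(String name) {
        _name = (name == null) ? DEFAULT_NAME : name;
        _$created++;
    }

    // Template method: the wording is generic, the wheel count comes from subclasses.
    public String describe() {
        return getName() + " with " + wheels() + " wheels";
    }

    static int getCreated() {
        return _$created;
    }

    public String getName() {
        return _name;
    }

    // No need for this._name = name: the field and parameter can't be confused.
    public void setName(String name) {
        _name = name;
    }
}

// Concrete subclass supplies the step the template method needs.
class CarVehicle extends AbstractVehicle {
    CarVehicle(String name) {
        super(name);
    }

    public int wheels() {
        return 4;
    }
}
```

For example, new CarVehicle("Mini").describe() returns "Mini with 4 wheels".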

Sorting members

Sorting members is the only defensible arrangement of fields and methods. Once you start making arbitrary decisions such as associating getters with setters, or grouping the methods of an interface, things get much more complex. Such groupings sound good when you only have one interface, or only a single get/set pair; but what happens when a get method is in an interface but the set method isn't? What happens if you're implementing two interfaces and one method is in both? Where do you put methods that aren't in any interface? The only grouping you can sensibly do is to group by kind, so fields tend to be sorted before (or after) constructors and methods. You can also sort by sub-kind as well (for example, static methods before instance methods). Keeping a class sorted results in less churn when new methods are added, because there's always a 'right place' to put a method. And if you've got two related methods that should appear together, naming them with the same prefix gives you that for free, as well as signalling that the methods are related.
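A hypothetical sketch of "sort by kind, then by name": each member is given a kind whose declaration order drives the sort, and ties are broken alphabetically, so a new member always has exactly one place to go. The Member class and Kind values are illustrative only and are not how Eclipse implements its member sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class MemberSorter {
    // Kinds in the order they should appear; enum order drives the sort.
    enum Kind { STATIC_FIELD, INSTANCE_FIELD, CONSTRUCTOR, METHOD }

    static final class Member {
        final Kind kind;
        final String name;

        Member(Kind kind, String name) {
            this.kind = kind;
            this.name = name;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    // Group by kind first, then alphabetically by name within each kind.
    static final Comparator<Member> ORDER =
        Comparator.comparing((Member m) -> m.kind).thenComparing(m -> m.name);

    static List<Member> sorted(List<Member> members) {
        List<Member> copy = new ArrayList<>(members);
        copy.sort(ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Member> members = Arrays.asList(
            new Member(Kind.METHOD, "setName"),
            new Member(Kind.INSTANCE_FIELD, "_name"),
            new Member(Kind.METHOD, "getName"),
            new Member(Kind.CONSTRUCTOR, "Vehicle"),
            new Member(Kind.STATIC_FIELD, "_$count"));
        // getName and setName end up adjacent because they share a prefix.
        System.out.println(sorted(members)); // [_$count, _name, Vehicle, getName, setName]
    }
}
```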

Although Java doesn't officially enforce it, putting fields before methods seems to be the norm. Eclipse has a defined member sort order that sorts fields, constructors and methods, as well as pulling the static equivalents to the front. Note that there are a couple of bugs in Eclipse that get this wrong: the sort order should put fields before initialisers, and it currently doesn't. Because an initialiser can't read a field that is declared after it, the sort order should be:

  1. Static Fields
  2. Static Initialisers
  3. Instance Fields
  4. Instance Initialisers

for fields, and:

  1. Constructors
  2. Instance Methods

for the methods. Static methods are a bit of an oddity to place; some people put them with the static fields (in which case, they should be after the static field definitions), some after fields but before constructors. I'd say that if you're going to sort the static elements separately, it makes sense to group all static items together (i.e. before instance fields/initialisers/constructors); otherwise, put them with the normal methods.
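A sketch of a class laid out in the order above, recording the order in which each part runs; the class InitOrder and its LOG are hypothetical. Because each initialiser follows the fields it reads, the class compiles; moving the static block above the LOG declaration would be rejected as an illegal forward reference.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrder {
    // 1. Static fields
    static final List<String> LOG = new ArrayList<>();
    static int _$instances;
    // 2. Static initialisers: may read the static fields declared above.
    static {
        LOG.add("static initialiser");
    }
    // 3. Instance fields
    private final int _id = ++_$instances;
    // 4. Instance initialisers: may read the instance fields declared above.
    {
        LOG.add("instance initialiser for #" + _id);
    }
    // Constructors run after the field initialisers and instance initialisers.
    InitOrder() {
        LOG.add("constructor for #" + _id);
    }
    // Instance methods
    int getId() {
        return _id;
    }
    public static void main(String[] args) {
        new InitOrder();
        System.out.println(LOG);
        // [static initialiser, instance initialiser for #1, constructor for #1]
    }
}
```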

Whatever coding convention you choose, I believe it's important to be able to justify why you use it. "Because it's more readable" is not an argument; if you're used to something, you will find it more readable than something you're not used to. A German keyboard is no better or worse than an English keyboard just because the Y and Z are in different places; even an experienced touch typist only takes a few minutes to get used to it (and a few more hours to stop making the same mistake!). Similarly, left-hand and right-hand drive cars are equally drivable; the other one isn't "less usable" just because you're not used to it. I chose my coding conventions based on their properties, not because I was used to them from a previous language. I hope you found the decision process interesting, even if you don't come to the same conclusions I did. If you like, you can download these as an Eclipse 3.x preference file.