AlBlue’s Blog

Macs, Modularity and More

Equality for all

2004 Java

I recently had a brief discussion with Elliotte Rusty Harold about the use of symbols for equality, prompted by Groovy's choice of equality symbols. It turns out that there are some standard types of equality (in an object oriented language), and that the = symbol is perhaps overused to the point of causing errors.

As anyone who has grown up with a C-family programming language will know, having = and == as separate symbols (for assignment and equality respectively) causes no end of bugs in code like:

if (a=b) {
  // oops; I meant to compare 'a' and 'b', not assign one to the other...
}

Indeed, Java 'fixed' this problem by requiring that the condition of an if statement be of type boolean; the compiler can therefore detect these invalid uses of assignment and reject them as compile-time errors.

It didn't solve the problem, though; if 'a' and 'b' are boolean variables, it will let the assignment go through; yet another glitch in the somewhat tired Java language.
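
To make the glitch concrete, here is a minimal sketch; the class and method names are made up for illustration. The condition uses a single = yet compiles, because the assignment itself has type boolean:

```java
class BooleanAssignment {
    // Reports which branch ran. Note the single '=' in the condition:
    // it assigns b to a, and the value of the assignment decides the branch.
    static String branchTaken(boolean a, boolean b) {
        if (a = b) {
            return "then";
        }
        return "else";
    }

    public static void main(String[] args) {
        // A comparison of false and true would take the else branch;
        // the assignment takes the then branch because b is true.
        System.out.println(branchTaken(false, true));
        System.out.println(branchTaken(true, false));
    }
}
```

With an int 'a' and 'b', the same condition would be rejected by the compiler, which is exactly the protection that boolean variables miss out on.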

Why do we have these errors, anyway? Isn't Java supposed to be the main language? Well, yes; but it was based on C, which in turn had picked up all sorts of oddities through its life. (No-one, outside of chmod commands, uses octal any more; the fact that it's still supported in the Java language is one more piece of its C-based heritage.)

The big problem is that we have three uses of the = symbol and equality in general:

  1. Assignment
  2. Equality
  3. Identity
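
As a rough illustration of the three uses in Java (the class and helper names here are invented for the example), two distinct String objects with the same contents are equal but not identical:

```java
class ThreeMeanings {
    // Equality: asks the object whether it considers the other one equivalent.
    static boolean equal(Object x, Object y) {
        return x.equals(y);
    }

    // Identity: true only when both references point at the same object.
    static boolean identical(Object x, Object y) {
        return x == y;
    }

    public static void main(String[] args) {
        String a = new String("hello");
        String b = new String("hello");
        String c;
        c = a;                                   // 1. assignment
        System.out.println(equal(a, b));         // 2. equality
        System.out.println(identical(a, b));     // 3. identity, distinct objects
        System.out.println(identical(a, c));     // 3. identity, same object
    }
}
```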

But each language uses different symbols to mean these things; let's see how languages stack up:

  1. Only for strings; C does not support objects directly
  2. Must be overridden for subclasses; defaults to identity comparison

We can see how easy it is to assume that the operator you've known all your life means the same thing everywhere, only to find that it behaves differently in another language; this was part of Elliotte's objection to Groovy's use of ===.

In particular, it seems to me that people who have come through a C route (and on into Java) insist that equality should be a method, not an operator; whereas people who have come from Python (or other script-esque languages like VB and JavaScript) much prefer the simplicity of an operator for comparison.

Further, there's a distinction between comparisons depending on whether you are using value or reference semantics. In C, structs use value semantics (assignment copies the value, although == is not actually defined for structs), whereas a pointer to a struct uses reference semantics (so == compares addresses). This distinction, coupled with Java's use of both (primitives use value semantics; reference types use reference semantics), accounts for the difference in behaviour of == when applied to primitive and reference types.
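
The Java half of that distinction can be sketched as follows. The Box class is a made-up example that deliberately does not override equals(), so it also shows the inherited default of identity comparison:

```java
class ValueVsReference {
    // Hypothetical wrapper with no equals() override.
    static class Box {
        final int value;

        Box(int value) {
            this.value = value;
        }
    }

    // Primitives: == compares the values themselves.
    static boolean primitivesMatch(int x, int y) {
        return x == y;
    }

    // Reference types: == compares which object is referred to.
    static boolean referencesMatch(Box x, Box y) {
        return x == y;
    }

    public static void main(String[] args) {
        Box p = new Box(1000);
        Box q = new Box(1000);
        System.out.println(primitivesMatch(1000, 1000)); // same value
        System.out.println(referencesMatch(p, q));       // different objects
        System.out.println(referencesMatch(p, p));       // same object
        System.out.println(p.equals(q));                 // Object.equals() falls back to identity
    }
}
```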

This all brings me to my main point: a high-level language should make it easy for a programmer to write a program. If I want to compare things, I want to be able to do it easily. If I want to test for identity, I want to do that easily too. Defining a specific method (e.g. .equals()) is fine once you get to know a language, but it's unnecessary cruft that doesn't need to be there. Scripting languages have popularised the use of == for comparing objects, and = for assignment. The only 'new' one is that of comparing identity, and again, it should be easy to use a symbol (as opposed to a method) for this. === seems like a very good compromise; it's unlikely that anyone would type === when they meant ==, since it has the extra character. Most of the time, equality is what you want for comparison; identity then only gets used in certain specialised circumstances, such as manipulating hashtables.

But a high-level language should also prevent mistakes; and the most obvious mistake is using an assignment instead of equality. It can easily be avoided if a programming language makes assignment a statement, and not an expression; if it did, then you couldn't write if (a=b) at all. Similarly, the expression a==4 would not compile as a statement.
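
Java already enforces the second half of this but not the first, which a short sketch can show (the class and method names are illustrative only). Chained assignment works precisely because assignment is an expression, while a bare comparison is rejected as a statement:

```java
class AssignmentAsExpression {
    static int chained() {
        int a;
        int b;
        // Legal in Java: 'b = 4' is an expression whose value (4) is then assigned to 'a'.
        a = b = 4;
        // The following line, if uncommented, is rejected by javac as "not a statement":
        // a == 4;
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(chained());
    }
}
```

A language that made assignment a statement would lose the chained form above, but would also make if (a=b) impossible to write.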

Lastly, a good high-level language will provide a default implementation of equality rather than defaulting to identity. There's no reason a compiler can't churn out a method that compares the non-transient, non-static fields as it compiles a class; like Java's constructors, if a class doesn't have one, it should be generated automatically -- but if it does exist, no such auto generation occurs. That way, people don't have to learn how to write equals() properly.
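
As a rough sketch of what such a generated method might do, the hypothetical fieldsEqual helper below uses reflection to compare the non-static, non-transient fields declared on a class. It stands in for compiler output rather than describing any real compiler feature, it assumes a JDK recent enough to have java.util.Objects, and it ignores inherited fields and compares arrays by reference:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Objects;

class GeneratedEquals {
    // Hypothetical helper: field-by-field comparison of two objects of the same class.
    static boolean fieldsEqual(Object x, Object y) {
        if (x == y) {
            return true;
        }
        if (x == null || y == null || x.getClass() != y.getClass()) {
            return false;
        }
        try {
            for (Field f : x.getClass().getDeclaredFields()) {
                int mods = f.getModifiers();
                if (Modifier.isStatic(mods) || Modifier.isTransient(mods)) {
                    continue; // skip fields that are not part of the object's value
                }
                f.setAccessible(true);
                if (!Objects.equals(f.get(x), f.get(y))) {
                    return false;
                }
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // Example class whose equals() simply delegates to the helper.
    static class Point {
        static int created = 0;   // static: ignored by fieldsEqual
        final int x;
        final int y;
        transient String label;   // transient: ignored by fieldsEqual

        Point(int x, int y, String label) {
            this.x = x;
            this.y = y;
            this.label = label;
            created++;
        }

        @Override
        public boolean equals(Object other) {
            return fieldsEqual(this, other);
        }

        @Override
        public int hashCode() {
            return 31 * x + y; // consistent with equals(): uses only x and y
        }
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2, "first");
        Point q = new Point(1, 2, "second");
        System.out.println(p.equals(q));
        System.out.println(p.equals(new Point(1, 3, "first")));
    }
}
```

A real compiler would more plausibly emit direct field comparisons than reflection, and would need to generate a matching hashCode() as well; the reflective version is used here only because it fits in one file.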

I believe that Groovy is definitely on the right track in allowing programmers to use operators; it makes coding easier, and if it adopted the two fixes of assignment-as-statement and auto-generation of .equals() methods, it would move further ahead of the field.