Object Comparison: the equals and hashCode methods

I wrote this back in March, 2012 for my Object-Oriented Programming class at Texas Wesleyan University in Fort Worth. I got this out of the "vault" because my son, who is in High School, has a friend who had a question on this very topic. I thought I had posted this to my blog and was shocked when I learned that was not the case. So, here we go...

The equals method

In my opinion, this is probably the most overlooked topic taught in Java. I have worked (and still work) with developers who seem to struggle with the concept of object comparison in Java. In Data Structures, you may have learned this concept. I will attempt to illustrate and simplify this concept using a "real life" example. For my illustration, I will use the famous Russian Dolls.

A "Russian Doll," as it is typically known, is a set of wooden dolls of decreasing size placed one inside the other. All dolls except the smallest one are hollow so that each can hold the next smaller doll inside of it. The smallest one of the set need not be hollow since it does not need to store another doll inside. To illustrate the difference between shallow and deep comparisons, we will use two Russian Dolls.

Suppose the dolls are all stored inside one another so that only the biggest doll of each set is visible. It is easy to assume, based on superficial observation, that the two sets are identical. However, if you were purchasing one of these, wouldn't you be curious to "crack open" the dolls to make sure that it is indeed a matching set or that the set is complete? Superficial observation alone cannot tell you this. You need to open that outer shell and analyze what is inside.

Understanding the process of comparing objects

Have you ever heard the expression "comparing apples to oranges?" This basically implies that to make a valid comparison, you need to start by making sure you have "similar" things. When we apply this to comparing objects, first of all, we need to make sure that the objects at the very least are of the same data type.

You could say that comparing objects involves multiple tiers of comparison:

  1. Check for referential equality
  2. Check data type equality
  3. Check the value of all relevant data members (data fields or attributes)

Referential equality means that both references point to the same address in memory. If they do, the two are the same object and there is no need to make further (deeper) comparisons. However, different memory locations could contain the same data. So, if the first-tier check fails, we need to dig deeper. The second tier of comparison checks the data type of the objects (are we comparing apples to apples?). If the data types ARE NOT the same, we are comparing apples to oranges. At this point we can stop because, without looking inside the "Russian Dolls," we already know the objects are different. However, if the objects are of the same data type, then we must examine the data within and check whether ALL pertinent fields hold the same values. For each data member that is itself an object, the same process applies.
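
The three tiers described above can be sketched in Java roughly as follows. This is only an illustration: the Doll class and its fields are hypothetical names invented here to mirror the Russian Dolls analogy, with each doll optionally holding a smaller doll.

```java
import java.util.Objects;

// Hypothetical class for illustration: a doll that may contain a smaller doll.
final class Doll {
    private final int size;    // primitive field, compared with ==
    private final Doll inner;  // the nested doll, or null for the smallest one

    Doll(int size, Doll inner) {
        this.size = size;
        this.inner = inner;
    }

    @Override
    public boolean equals(Object obj) {
        // Tier 1: referential equality
        if (this == obj) { return true; }
        // Tier 2: data type equality (this check also rejects null)
        if (obj == null || obj.getClass() != getClass()) { return false; }
        // Tier 3: relevant fields. Objects.equals is null-safe and calls
        // inner.equals(...), so the same process repeats for each nested doll.
        Doll other = (Doll) obj;
        return size == other.size && Objects.equals(inner, other.inner);
    }

    // hashCode is overridden together with equals so equal dolls hash alike.
    @Override
    public int hashCode() {
        return Objects.hash(size, inner);
    }
}

class DollDemo {
    public static void main(String[] args) {
        Doll setA = new Doll(3, new Doll(2, new Doll(1, null)));
        Doll setB = new Doll(3, new Doll(2, new Doll(1, null)));
        Doll incomplete = new Doll(3, new Doll(2, null));

        System.out.println(setA.equals(setB));       // true
        System.out.println(setA.equals(incomplete)); // false
    }
}
```

From the outside, setA and incomplete look alike (same outer size); only opening the dolls reveals the difference.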

The equals method in Java was designed for the purpose of comparing two objects. From the java.lang.Object API we learn that the equals method indicates whether the object passed as an argument is "equal" to the one invoking the method. Notice the use of quotes around the word equal. They are there because the equals method, as implemented in the Object class alone, is not good enough to make that determination: the Object.equals(Object o) method only looks superficially at the objects being compared. The following is a direct copy from the Java API:

The equals method implements an equivalence relation on non-null object references:

  • It is reflexive: for any non-null reference value x, x.equals(x) should return true.
  • It is symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
  • It is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
  • It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
  • For any non-null reference value x, x.equals(null) should return false.
The equals method for class Object implements the most discriminating possible equivalence relation on objects; that is, for any non-null reference values x and y, this method returns true if and only if x and y refer to the same object (x == y has the value true). Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

It is the last paragraph of the method description that seems to be always ignored. It states that the hashCode method generally needs to be overridden whenever the equals method is overridden. A second problem is that developers assume that the equals method is good enough as implemented in the Object class. Referring back to the Russian Dolls example, the only way to know for sure whether two distinct sets of Russian Dolls are identical to each other is by opening the dolls and examining their contents. How does this relate to code? As a general rule, when an object contains PRIMITIVE DATA TYPES ONLY, a shallow field-by-field comparison using == is enough, although equals must still be overridden to perform it because Object.equals compares references only. Objects containing references to other objects (aggregation and/or composition) additionally need those referenced classes to have their own implementation (method overriding) of the equals and hashCode methods.
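
To see why the hashCode requirement matters in practice, here is a small sketch using java.util.HashSet. The BrokenPoint and FixedPoint classes are hypothetical names made up for this illustration; the only difference between them is that FixedPoint also overrides hashCode.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class: overrides equals but NOT hashCode (the common mistake).
final class BrokenPoint {
    final int x;
    final int y;

    BrokenPoint(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) { return true; }
        if (obj == null || obj.getClass() != getClass()) { return false; }
        BrokenPoint other = (BrokenPoint) obj;
        return x == other.x && y == other.y;
    }
}

// Hypothetical class: same equals logic, plus a matching hashCode.
final class FixedPoint {
    final int x;
    final int y;

    FixedPoint(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) { return true; }
        if (obj == null || obj.getClass() != getClass()) { return false; }
        FixedPoint other = (FixedPoint) obj;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return 31 * (31 + x) + y;
    }
}

class HashContractDemo {
    public static void main(String[] args) {
        Set<BrokenPoint> broken = new HashSet<>();
        broken.add(new BrokenPoint(1, 2));
        // Usually prints false: the lookup is done by hash code first, and two
        // distinct BrokenPoint objects almost never share the default hash code.
        System.out.println(broken.contains(new BrokenPoint(1, 2)));

        Set<FixedPoint> fixed = new HashSet<>();
        fixed.add(new FixedPoint(1, 2));
        System.out.println(fixed.contains(new FixedPoint(1, 2))); // true
    }
}
```

The two BrokenPoint objects are equal according to equals, yet hash-based collections such as HashSet and HashMap will generally fail to treat them as the same element.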

The code example

Let's look at a simple code example. Suppose you want to draw two identical bullseyes on a canvas. Since a bullseye is a representation of concentric circles with alternating colors, we will create two classes: a BullsEye class to represent the finished collection of concentric circles and a Circle class, which is the data container for all the relevant information pertaining to each individual circle. For this example, that information is the (x, y) coordinates of its center point, its radius, and its color. The following is the code representing these two real-life objects:

Circle.java:


import java.awt.Color;
import java.awt.Point;

public class Circle {
 private Point center;
 private Color color;
 private double radius;
 
 public Circle(double radius, Point center) {
  this.radius = radius;
  this.center = center;
 }
 
 public Circle(double radius, Point center, Color color) {
  this(radius, center);
  this.color = color;
 }

 public Point getCenter() { return center; }

 public void setCenter(Point center) { this.center = center; }

 public double getRadius() { return radius; }

 public void setRadius(double radius) { this.radius = radius; }

 public Color getColor() { return color; }

 public void setColor(Color color) { this.color = color; }

 // Combines the same fields that equals compares.
 @Override
 public int hashCode() {
  final int prime = 31;
  int result = 1;
  result = prime * result + ((center == null) ? 0 : center.hashCode());
  result = prime * result + ((color == null) ? 0 : color.hashCode());
  long bits = Double.doubleToLongBits(radius);
  result = prime * result + (int) (bits ^ (bits >>> 32));
  return result;
 }

 @Override
 public boolean equals(Object obj) {
  if (this == obj) { return true; }
  
//  if (obj == null || !(obj instanceof Circle)) { return false; }

  if (obj == null || (obj.getClass() != this.getClass())) { return false; }

  Circle other = (Circle) obj;
  if (center == null) {
   if (other.center != null) { return false; }
  } else if (!center.equals(other.center)) { return false; }
  
  if (color == null) {
   if (other.color != null) { return false; }
  } else if (!color.equals(other.color)) { return false; }
  
  if (Double.doubleToLongBits(radius) != Double
    .doubleToLongBits(other.radius)) { return false; }
  
  return true;
 }
}

BullsEye.java:


import java.util.Arrays;

public class BullsEye {
 private Circle[] circles;
 
 public BullsEye(Circle[] circles) {
  this.circles = circles;
 }
 
 public Circle[] getCircles() { return circles; }

 public void setCircles(Circle[] circles) { this.circles = circles; }

 // Arrays.hashCode calls hashCode on each Circle in the array.
 @Override
 public int hashCode() {
  final int prime = 31;
  int result = 1;
  result = prime * result + Arrays.hashCode(circles);
  return result;
 }

 @Override
 public boolean equals(Object obj) {
  if (this == obj) { return true; }

//  if (obj == null || !(obj instanceof BullsEye)) { return false; }
  
  if (obj == null || (obj.getClass() != this.getClass())) { return false; }

  BullsEye other = (BullsEye) obj;
  if (!Arrays.equals(circles, other.circles)) { return false; }

  return true;
 }
}

To test effectively whether the two bullseyes are identical, the Circle class must override the equals method as well. Failure to do so will produce an inaccurate comparison result.

Using obj.getClass() rather than the instanceof operator

Notice that in the code example above the check using the instanceof operator is commented out. The reason is quite simple. Suppose two developers each extend the Circle class to create their own custom circle. Since inheritance enforces an "is-a" relationship, a MyCircle class and a YourCircle class, both inheriting from Circle, are also instances of Circle. Although in some cases this might be OK, an instanceof-based equals could report a MyCircle and a YourCircle as identical even though they are different classes that may carry additional, differing state, and a subclass that overrides equals can easily break symmetry. The safest way to determine that two distinct objects are indeed instances of the SAME CLASS, and not just of classes in the same hierarchy, is by using the getClass() method. To emphasize this point even more, every Java object is an instanceof java.lang.Object.
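
The symmetry problem can be reproduced with a short sketch. PlainCircle and LabeledCircle are hypothetical classes written only for this demonstration (they are not the Circle class from this article), and the label field is assumed to be non-null. Both use instanceof in equals.

```java
// Hypothetical parent class whose equals relies on instanceof.
class PlainCircle {
    private final double radius;

    PlainCircle(double radius) { this.radius = radius; }

    @Override
    public boolean equals(Object obj) {
        // instanceof accepts PlainCircle AND every subclass of it.
        if (!(obj instanceof PlainCircle)) { return false; }
        PlainCircle other = (PlainCircle) obj;
        return Double.doubleToLongBits(radius) == Double.doubleToLongBits(other.radius);
    }

    @Override
    public int hashCode() {
        long bits = Double.doubleToLongBits(radius);
        return (int) (bits ^ (bits >>> 32));
    }
}

// Hypothetical subclass that adds state and also relies on instanceof.
class LabeledCircle extends PlainCircle {
    private final String label; // assumed non-null in this sketch

    LabeledCircle(double radius, String label) {
        super(radius);
        this.label = label;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof LabeledCircle)) { return false; }
        LabeledCircle other = (LabeledCircle) obj;
        return super.equals(obj) && label.equals(other.label);
    }

    @Override
    public int hashCode() {
        return 31 * super.hashCode() + label.hashCode();
    }
}

class SymmetryDemo {
    public static void main(String[] args) {
        PlainCircle plain = new PlainCircle(2.0);
        LabeledCircle labeled = new LabeledCircle(2.0, "mine");

        System.out.println(plain.equals(labeled)); // true
        System.out.println(labeled.equals(plain)); // false
    }
}
```

The two calls disagree, which violates the symmetric rule of the equals contract. With a getClass() check in both classes, both calls would return false.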

Testing the code

The code below is the test class for the example above. It runs three test cases (there could be more). Test Case 1 compares two distinct BullsEye instances whose concentric circles have the same sizes and the same center point, but different colors. Test Case 2 compares two distinct BullsEye instances whose concentric circles have the same sizes and colors, but one circle has a different center point. Test Case 3 compares two distinct BullsEye instances whose concentric circles are created using the same criteria. Notice that in all cases, the Circle instances used to create each array are different objects. Another thing to notice is that both the BullsEye and Circle classes must override the equals method. If the equals method of the Circle class is commented out, the comparison in Test Case 3 will result in false. This is because the default implementation of the equals method is basically the same as object1 == object2. Since the objects are different instances (they reside in different memory locations), the comparison will report false even though the two objects hold the same data.

BullsEyeTest.java:


import java.awt.Color;
import java.awt.Point;

public class BullsEyeTest
{

 public static void main(String[] args)
 {
  int i = 1;
  Point center1 = new Point(100, 100);
  Point center2 = new Point(50, 50);
  
  Circle[] circles1 = { new Circle(2.0, center1, Color.RED),
    new Circle(4.0, center1, Color.WHITE),
    new Circle(6.0, center1, Color.RED) };

  Circle[] circles2 = { new Circle(2.0, center1, Color.WHITE),
    new Circle(4.0, center1, Color.RED),
    new Circle(6.0, center1, Color.WHITE) };

  //Test Case 1: Bullseyes should be different
  BullsEye b1 = new BullsEye(circles1);
  BullsEye b2 = new BullsEye(circles2);
  
  System.out.println("TEST CASE " + i++);
  System.out.println("b1.equals(b2)? " + b1.equals(b2));//false
  System.out.println();
  //End Test Case 1

  //Test Case 2: Bullseyes should be different
  Circle[] circles3 = { new Circle(2.0, center1, Color.RED),
   new Circle(4.0, center2, Color.WHITE),
   new Circle(6.0, center1, Color.RED) };
  b2 = new BullsEye(circles3);
  
  System.out.println("TEST CASE " + i++);
  System.out.println("b1.equals(b2)? " + b1.equals(b2));//false
  System.out.println();
  //End Test Case 2

  //Test Case 3: Bullseyes should be the same
  Circle[] circles4 = { new Circle(2.0, center1, Color.RED),
   new Circle(4.0, center1, Color.WHITE),
   new Circle(6.0, center1, Color.RED) };
  b2 = new BullsEye(circles4);
  
  System.out.println("TEST CASE " + i++);
  System.out.println("b1.equals(b2)? " + b1.equals(b2));//TRUE
  System.out.println();
  //End Test Case 3

 }
}

The hashCode method

The value returned by Object.hashCode() is an int whose derivation is implementation-specific (it may, but need not, be based on the object's memory address). The hexadecimal number shown by the default toString() output is this hash code. By default, only references to the same object are equal, and the default hash code is consistent with that. If the criteria for establishing equality change, then the criteria for computing the hash code must also change. Since two equal objects stored in different memory locations cannot be expected to share an identity-based hash code, we must compute the hash code from the same data that equals compares. The following is the recommended pattern to override the hashCode method:
  1. Create an arbitrary non-zero CONSTANT integer. The example shown in this article uses 31.
  2. Create a primitive integer (int) variable with an initial value of 1.
  3. Involve only significant variables of your object in the calculation of the hash code. Typically, all the variables that are part of the equals comparison should be considered for this. You may ask, what is a significant variable? Let us use a Book object as an example. A Book has many attributes: author name, title, number of pages, publisher, etc. But one attribute is unique: the ISBN. Two books could have the same title, but different authors. The same author could have written several books on the same topic, and even the same book may have different editions, each of which has its own ISBN. Therefore, in the case of a Book class, the ISBN might be the only significant variable contained in a Book object.
  4. For each significant variable (sigVar), compute hash code result as follows:
    1. If the significant variable is of primitive type byte, char, short or int, then:
      result = prime * result + sigVar;
    2. If the significant variable is of primitive type long, then:
      result = prime * result + (int)(sigVar ^ (sigVar >>> 32));
    3. If the significant variable is primitive type float, then:
      result = prime * result + Float.floatToIntBits(sigVar);
    4. If the significant variable is primitive type double, then:
      long bits = Double.doubleToLongBits(sigVar);
      result = prime * result + (int) (bits ^ (bits >>> 32));
    5. If the significant variable is primitive type boolean, then:
      result = prime * result + (sigVar ? 1231 : 1237);
      where 1231 and 1237 are two arbitrary distinct numbers (the same values that java.lang.Boolean.hashCode() uses).
    6. If the significant variable is an object reference, then check if it is null first. If it is null, it contributes 0 to the result. Otherwise, invoke the hashCode method on this object reference to get its hash code. This can be simplified as:
      result = prime * result + ((sigVar == null) ? 0 : sigVar.hashCode());
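
Putting the steps above together, the following sketch applies each rule once. The Measurement class and its field names are hypothetical, chosen only so that every primitive case and the object-reference case appear in one place.

```java
// Hypothetical class with one field per case of the pattern above.
final class Measurement {
    private final int count;
    private final long timestamp;
    private final float ratio;
    private final double value;
    private final boolean valid;
    private final String label; // object reference, may be null

    Measurement(int count, long timestamp, float ratio,
                double value, boolean valid, String label) {
        this.count = count;
        this.timestamp = timestamp;
        this.ratio = ratio;
        this.value = value;
        this.valid = valid;
        this.label = label;
    }

    @Override
    public int hashCode() {
        final int prime = 31;                                              // step 1
        int result = 1;                                                    // step 2
        result = prime * result + count;                                   // int
        result = prime * result + (int) (timestamp ^ (timestamp >>> 32));  // long
        result = prime * result + Float.floatToIntBits(ratio);             // float
        long bits = Double.doubleToLongBits(value);                        // double
        result = prime * result + (int) (bits ^ (bits >>> 32));
        result = prime * result + (valid ? 1231 : 1237);                   // boolean
        result = prime * result + ((label == null) ? 0 : label.hashCode()); // reference
        return result;
    }

    // equals compares exactly the fields that hashCode uses.
    @Override
    public boolean equals(Object obj) {
        if (this == obj) { return true; }
        if (obj == null || obj.getClass() != getClass()) { return false; }
        Measurement other = (Measurement) obj;
        return count == other.count
            && timestamp == other.timestamp
            && Float.floatToIntBits(ratio) == Float.floatToIntBits(other.ratio)
            && Double.doubleToLongBits(value) == Double.doubleToLongBits(other.value)
            && valid == other.valid
            && (label == null ? other.label == null : label.equals(other.label));
    }
}
```

On Java 7 and later, java.util.Objects.hash(count, timestamp, ratio, value, valid, label) performs the same kind of 31-based accumulation over the boxed values, at the cost of allocating an array on each call.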

Conclusion

  • Use == to compare primitive type values or to determine whether two objects are actually the same object reference.
  • Use the equals method to determine whether two different objects contain the same data (structure and value), which requires method overriding. An implementation of equals should perform the three tiers of comparison listed above.
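
As a quick illustration of both points, the sketch below uses String, which already overrides equals. The EqualityDemo class and its helper methods are names invented for this example.

```java
import java.util.Objects;

class EqualityDemo {
    // True only when both references point to the very same object.
    static boolean sameReference(Object a, Object b) {
        return a == b;
    }

    // True when the objects hold equivalent data according to equals (null-safe).
    static boolean sameContent(Object a, Object b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        int x = 5;
        int y = 5;
        System.out.println(x == y); // true: primitives are compared by value

        // new String(...) always creates a distinct object.
        String s1 = new String("doll");
        String s2 = new String("doll");

        System.out.println(sameReference(s1, s2)); // false: two different objects
        System.out.println(sameContent(s1, s2));   // true: same characters
        System.out.println(sameReference(s1, s1)); // true: same object
    }
}
```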
