views:

112

answers:

2

Hello,

I am trying to optimize a piece of code which compares elements of list.

Eg.

public void compare(Set<Record> firstSet, Set<Record> secondSet){
    for(Record firstRecord : firstSet){
        for(Record secondRecord : secondSet){
            // comparing logic
        }
    }
}

Please take into account that the number of records in sets will be high.

Thanks

Shekhar

+5  A: 
firstSet.equals(secondSet)

It really depends on what you want to do in the comparison logic... ie what happens if you find an element in one set not in the other? Your method has a void return type so I assume you'll do the necessary work in this method.

More fine-grained control if you need it:

if (!firstSet.containsAll(secondSet)) {
  // do something if needs be
}
if (!secondSet.containsAll(firstSet)) {
  // do something if needs be
}

If you need to get the elements that are in one set and not the other:

Set one = firstSet.removeAll(secondSet);
Set two = secondSet.removeAll(firstSet);

If the contents of one and two are both empty, then you know that the two sets were equal. If not, then you've got the elements that made the sets unequal.

You mentioned that the number of records might be high. If the underlying implementation is a HashSet then the fetching of each record is done in O(1) time, so you can't really get much better than that. TreeSet is O(log n).

Noel M
+1 for adding containsAll
stacker
The implementation of equals() and hashcode() for the Record class is equally important, when invoking equals() on the Set.
Vineet Reynolds
+1  A: 

If you simply want to know if the sets are equal, the equals method on AbstractSet is implemented roughly as below:

    public boolean equals(Object o) {
        if (o == this)
            return true;
        if (!(o instanceof Set))
            return false;
        Collection c = (Collection) o;
        if (c.size() != size())
            return false;
        return containsAll(c);
    }

Note how it optimizes the common cases where:

  • the two objects are the same
  • the other object is not a set at all, and
  • the two sets' sizes are different.

After that, containsAll(...) will return false as soon as it finds an element in the other set that is not also in this set. But if all elements are present in both sets, it will need to test all of them.

The worst case performance therefore occurs when the two sets are equal but not the same objects. The cost is O(N) or O(NlogN) depending on the implementation of this.

Stephen C