views:

1852

answers:

6

I have a list/collection of objects that may or may not have the same property values. What's the easiest way to get a distinct list of the objects with equal properties? Is one collection type best suited for this purpose? For example, in C# I could do something like the following with LINQ.

var recipients = (from recipient in recipientList
                 select recipient).Distinct();

My initial thought was to use lambdaj (link text), but it doesn't appear to support this.

+10  A: 

Use an implementation of the interface Set<T> (class T may need a custom .equals() method, and you may have to implement that .equals() yourself). Typically a HashSet does it out of the box : it uses Object.hashCode() and Object.equals() method to compare objects. That should be unique enough for simple objects. If not, you'll have to implement T.equals() and T.hashCode() accordingly.

See Gaurav Saini's comment below for libraries helping to implement equals and hashcode.

subtenante
HashSet also uses equals if there's a hash collision.
Steve Kuo
This is incorrect - Object.hashCode() checks for identity , not meaningful equality. For 2 objects - different references - which are meaningfully equal Object.hashCode() will return false. Always implement hashCode() and equals() for objects which will be used in Sets or as keys in Maps.
Robert Munteanu
Not quite correct. Let me give you an example: If hashCode() returns 1 (perfectly legal), then this results in a hash collision and thus equals will then be called by HashSet (actually it's backed by a HashMap).
Steve Kuo
@Robert : No need to downvote me for wrong reasons, read the javacode from Object.hashCode() : If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
subtenante
@Steve : Ok thanks. Added mention to .equals in the description of HashSet's comparison.
subtenante
Alternatively, use standard way of overriding hashCode() and equals() method using HashCodeBuilder and EqualsBuilder provided by Apache's Commons.HashCodeBuilder: http://commons.apache.org/lang/api-2.3/org/apache/commons/lang/builder/HashCodeBuilder.htmlEqualsBuilder: http://commons.apache.org/lang//api-2.4/org/apache/commons/lang/builder/EqualsBuilder.html
Gaurav Saini
@subtenante: My point was about hashCode _and_ equals - which should both be implemented if you plan to add objects in Sets/Maps. And let's not talk about public int hashCode() { return 1; } ;-) I'm sure there are plenty of places where this is villified.
Robert Munteanu
+4  A: 

Place them in a TreeSet which holds a custom Comparator, which checks the properties you need:

SortedSet<MyObject> set = new TreeSet<MyObject>(new Compararator<MyObject>{

    public int compare(MyObject o1, MyObject o2) {
         // return 0 if objects are equal in terms of your properties
    }
});

set.addAll(myList); // eliminate duplicates
Robert Munteanu
Only works if the comparator is consistent with equals(), at which point you're better off with a HashSet anyway.
Michael Myers
Why should it be consistent with equals() ? That it the usually the case but at the moment we need to remove some duplicates based on a custom condition. A Comparator is the most unobstrusive way I can think of.
Robert Munteanu
Interestingly, the TreeSet docs indicate that there will be problems if the Comparator is inconsistent with equals(), while the docs for TreeMap (which TreeSet is based on) merely say that equals() will be ignored in that case. I now think this is the best answer. +1
Michael Myers
The problem with inconsistent comparator is when you use a TreeSet, pass it around as a Set - which the consumer expects to obey hashCode() and equals(). Then all hell breaks loose :-)
Robert Munteanu
+2  A: 

You can use a Set. There's couple of implementations:

  • HashSet uses an object's hashCode and equals.
  • TreeSet uses compareTo (defined by Comparable) or compare (defined by Comparator). Keep in mind that the comparison must be consistent with equals. See TreeSet JavaDocs for more info.

Also keep in mind that if you override equals you must override hashCode such that two equals objects has the same hash code.

Steve Kuo
+2  A: 

The ordinary way of doing this would be to convert to a Set, then back to a List. But you can get fancy with Functional Java. If you liked Lamdaj, you'll love FJ.

recipients = recipients
             .sort(recipientOrd)
             .group(recipientOrd.equal())
             .map(List.<Recipient>head_());

You'll need to have defined an ordering for recipients, recipientOrd. Something like:

Ord<Recipient> recipientOrd = ord(new F2<Recipient, Recipient, Ordering>() {
  public Ordering f(Recipient r1, Recipient r2) {
    return stringOrd.compare(r1.getEmailAddress(), r2.getEmailAddress());
  }
});

Works even if you don't have control of equals() and hashCode() on the Recipient class.

Apocalisp
Why is it necessary to add the ordering for the objects?
javacavaj
You need the ordering so that the sort method knows how to order them.
Apocalisp
A: 
return new ArrayList(new HashSet(recipients));
Chase Seibert
A: 

Actually lambdaj implements this feature through the selectDistinctArgument method

http://lambdaj.googlecode.com/svn/trunk/html/apidocs/ch/lambdaj/Lambda.html#selectDistinctArgument(java.lang.Object,%20A)

Mario Fusco