views:

79

answers:

2

Given a List of MyClass objects (and a custom Comparitor myComparitor if needed), what good options are there for checking if the List contains two "equal" objects?

Edit: if there are duplicates, return a reference to one or more of the duplicates.

Overriding MyClass.equals(MyClass) in this case is not an option.

My initial thought is to create a hash table of sorts, but I suspect that there's a non-hack way to accomplish the same thing:

SortedSet mySet = new TreeSet(myComparitor);
mySet.addAll(myList);
// Find duplicates in a sorted set in O(N) time

P.S. Is there a good reference on Markdown?

+2  A: 

If the element's equals(Object) method does not give you the semantic that you require, then HashMap or HashSet are not options. Your choices are:

  • Use a TreeMap for de-duping. This is O(NlogN).
  • Sort the ArrayList or a copy, then iterate over looking for element i equals element i + 1. This is O(NlogN).
  • Find an alternative implementation of hash sets that allows you to provide a separate object to implement equality and hashing. (Neither Apache or Google collections support this, so you'll need to look further afield.)
  • Create a wrapper class for your element type that overrides equals(Object) and hashCode(), and de-dup using a HashSet of wrapped objects. This is O(N), but the constant of proportionality will be larger than a simple HashSet due to creation of wrapper objects.

When de-duping with a Set it is probably better to use a loop rather than addAll. This is necessary if you need to know what all of the duplicates are. If you don't need to know that, then using a loop allows you to stop when you find the first duplicate. The only case where addAll is likely to perform better is when there are likely to be no duplicates.

Stephen C
Thanks, that's a good point - I can create a copy of the List and simply sort it. I'll probably go with this approach.(And another good point - I can create a TreeMap keyed by a manually generated hash value.)Thanks for the performance tip about `Set.addAll()`. I'm re-writing code which performs in `O(N^2)` and I think `O(NlogN)` should be acceptable (if the constant of proportionality is low).
Daniel
A: 

if you already have a sorted list, you could just look at any element and the next element, and if they're the same you have dups.

in your question you are using a TreeSet, which would have culled out duplicates already, so if you just need to know if you have duplicates, just check the size of mySet vs the size of myList. if they aren't the same you have dups.

John Gardner
Thanks, I've edited the post above to clarify the question. (You're right that looking for duplicates in a sorted list is simple. The TreeSet would automatically de-dupify if I create a wrapper class overriding Object.equals(), but there's overhead involved in doing that.)
Daniel