views:

400

answers:

6

Hey all,

This is a two-part question:

First, I am interested to know what the best way to remove repeating elements from a collection is. The way I have been doing it up until now is to simply convert the collection into a set. I know sets cannot have repeating elements so it just handles it for me.

Is this an efficient solution? Would it be better/more idiomatic/faster to loop and remove repeats? Does it matter?

My second (related) question is: What is the best way to convert an array to a Set? Assuming an array arr The way I have been doing it is the following:

Set x = new HashSet(Arrays.asList(arr));

This converts the array into a list, and then into a set. Seems to be kinda roundabout. Is there a better/more idiomatic/more efficient way to do this than the double conversion way?

Thanks!

+6  A: 
  1. Do you have any information about the collection, like say it is already sorted, or it contains mostly duplicates or mostly unique items? With just an arbitrary collection I think converting it to a Set is fine.

  2. Arrays.asList() doesn't create a brand new list. It actually just returns a List which uses the array as its backing store, so it's a cheap operation. So your way of making a Set from an array is how I'd do it, too.

John Kugelman
+1  A: 

Assuming you really want set semantics, creating a new Set from the duplicate-containing collection is a great approach. It's very clear what the intent is, it's more compact than doing the loop yourself, and it leaves the source collection intact.

For creating a Set from an array, creating an intermediate List is a common approach. The wrapper returned by Arrays.asList() is lightweight and efficient. There's not a more direct API in core Java to do this, unfortunately.

erickson
+4  A: 

Use HashSet's standard Collection conversion constructor. According to The Java Tutorials:

Here's a simple but useful Set idiom. Suppose you have a Collection, c, and you want to create another Collection containing the same elements but with all duplicates eliminated. The following one-liner does the trick.

Collection<Type> noDups = new HashSet<Type>(c);

It works by creating a Set (which, by definition, cannot contain a duplicate), initially containing all the elements in c. It uses the standard conversion constructor described in the The Collection Interface section.

Here is a minor variant of this idiom that preserves the order of the original collection while removing duplicate element.

Collection<Type> noDups = new LinkedHashSet<Type>(c);

The following is a generic method that encapsulates the preceding idiom, returning a Set of the same generic type as the one passed.

public static <E> Set<E> removeDups(Collection<E> c) {
    return new LinkedHashSet<E>(c);
}
William Brendel
+1  A: 

I think your approach of putting items into a set to produce the collection of unique items is the best one. It's clear, efficient, and correct.

If you're uncomfortable using Arrays.asList() on the way into the set, you could simply run a foreach loop over the array to add items to the set, but I don't see any harm (for non-primitive arrays) in your approach. Arrays.asList() returns a list that is "backed by" the source array, so it doesn't have significant cost in time or space.

Carl Manaster
+1  A: 

1. Duplicates

Concurring other answers: Using Set should be the most efficient way to remove duplicates. HashSet should run in O(n) time on average. Looping and removing repeats would run in the order of O(n^2). So using Set is recommended in most cases. There are some cases (e.g. limited memory) where iterating might make sense.

2. Arrays.asList() is a cheap operation that doesn't copy the array, with minimal memory overhead. You can manually add elements by iterating through the array.


public static  Set arrayToSet(T[] array) {
  Set set = new HashSet(array.length / 2);
  for (T item : array)
    set.add(item);
  return set;
}
notnoop
+1  A: 

Barring any specific performance bottlenecks that you know of (say a collection of tens of thousands of items) converting to a set is a perfectly reasonable solution and should be (IMO) the first way you solve this problem, and only look for something fancier if there is a specific problem to solve.

Yishai