views:

355

answers:

6

If I have a Map like this:

HashMap<Integer, ComparableObject> map;

and I want to obtain a collection of values sorted using natural ordering, which method is fastest?

(A)

Create an instance of a sortable collection like ArrayList, add the values, then sort it:

List<ComparableObject> sortedCollection = new ArrayList<ComparableObject>(map.values());
Collections.sort(sortedCollection);

(B)

Create an instance of an ordered collection like TreeSet, then add the values:

Set<ComparableObject> sortedCollection = new TreeSet<ComparableObject>(map.values());

Note that the resulting collection is never modified, so the sorting only needs to take place once.

+3  A: 

Theoretically, sorting at the end should be faster. Maintaining sorted state through the process could involve additional CPU time.

From the CS points of view, both operations are NlogN, but 1 sort should have lower constant.

BarsMonster
+16  A: 

TreeSet has a log(n) time complexity guarantuee for add()/remove()/contains() elements. Sorting an ArrayList takes n*log(n) operations, but adding/retrieving data from the list takes only 1 operation.

So if youre mainly retrieving, and dont sort often, ArrayList is the better choice. If you sort often but dont retrieve that much TreeSet would be a better choice.

hope that helped...

smeg4brains
In my case we only need to iterate through the resulting collection, it never gets modified. So based on your answer `ArrayList` is the better choice here.
gutch
A: 

Using a well designed data structure will be the most efficient when the dataset grows large.

Thorbjørn Ravn Andersen
+2  A: 

Be sure to read my comment about TreeSet at the bottom if you choose to implement B)

If your app only does occasional sorts but iterates through it a lot, I'd say you're best off using a straightforward unsorted list. Sort it the once and then benefit from faster iteration. Iteration is especially fast on an array list.

However if you want sort order to be guaranteed all of the time or you are possibly adding / removing elements frequently then use a sorted collection and take the hit on iteration.

So in your case I would say A) is the better option. The list is sorted once, doesn't change and therefore benefits from being an array. Iteration should be very fast, especially if you know its an ArrayList and can directly use the ArrayList.get() instead of an Iterator.

I'd also add that TreeSet by definition is a Set which means objects are unique. A TreeSet determines equality by using compareTo on your Comparator / Comparable. You could easily find yourself missing data if you try to add two objects whose compareTo returns a value of 0. e.g. adding "C", "A", "B", "A" to a TreeSet will return "A", "B", "C"

locka
Good point about `TreeSet` potentially missing data if compareTo returns 0. I have determined that in this particular case the compareTo implementation will never return 0, so both `TreeSet` and `ArrayList` will behave the same. However I have been caught out by that problem before so thanks for the reminder!
gutch
A PriorityQueue is probably better for sorting a list than a TreeSet.
locka
@locka yup, in my benchmark (see my answer) PriorityQueue outperforms TreeSet by 600 to 700 %.
seanizer
`PriorityQueue` does indeed perform faster, but when I tried it the values were not actually sorted — obviously why it was so fast!Maybe I misinterpreted how to use PriorityQueue... an example of it actually working would be useful.
gutch
A PriorityQueue is just a queue with a comparator / comparable test. When you add() items to the queue, the insert compares the new item to the ones already there to determine the position to insert at. When you poll() the queue, or iterate it, the contents are already sorted. I expect insertion is done via some kind of recursive algorithm, i.e. split list in two and determine which half to insert it in, split in two again and so on so performance is going to be O(log N) which is theoretically the same as TreeSet / TreeMap, but the implementation may make it faster.
locka
A: 

Why not use the best of both worlds? If you are never using it again, sort using a TreeSet and initialize an ArrayList with the contents

List<ComparableObject> sortedCollection = 
    new ArrayList<ComparableObject>( 
          new TreeSet<ComparableObject>(map.values()));

EDIT:

I have created a benchmark (you can access it at pastebin.com/5pyPMJav) to test the three approaches (ArrayList + Collections.sort, TreeSet and my best of both worlds approach) and mine always wins. The test file creates a map with 10000 elements, the values of which have an intentionally awful comparator, and then each of the three strategies get a chance to a) sort the data and b) iterate over it. Here is some sample output (you can test it yourselves):

EDIT: I have added an aspect that logs calls to Thingy.compareTo(Thingy) and I have also added a new Strategy based on PriorityQueues that is much faster than either of the previous solutions (at least in sorting).

compareTo() calls:123490
Transformer ArrayListTransformer
    Creation: 255885873 ns (0.255885873 seconds) 
    Iteration: 2582591 ns (0.002582591 seconds) 
    Item count: 10000

compareTo() calls:121665
Transformer TreeSetTransformer
    Creation: 199893004 ns (0.199893004 seconds) 
    Iteration: 4848242 ns (0.004848242 seconds) 
    Item count: 10000

compareTo() calls:121665
Transformer BestOfBothWorldsTransformer
    Creation: 216952504 ns (0.216952504 seconds) 
    Iteration: 1604604 ns (0.001604604 seconds) 
    Item count: 10000

compareTo() calls:18819
Transformer PriorityQueueTransformer
    Creation: 35119198 ns (0.035119198 seconds) 
    Iteration: 2803639 ns (0.002803639 seconds) 
    Item count: 10000

Strangely, my approach performs best in iteration (I would have thought there would be no differences to the ArrayList approach in iteration, do I have a bug in my benchmark?)

Disclaimer: I know this is probably an awful benchmark, but it helps get the point across to you and I certainly did not manipulate it to make my approach win.

(The code has a dependency to apache commons / lang for the equals / hashcode / compareTo builders, but it should be easy to refactor it out)

seanizer
Wouldn't that actually be the worst of both worlds? All I need is a collection in natural order, which is what `new TreeSet<ComparableObject>(map.values())` returns. Wrapping that in an `ArrayList` is just going to add unnecessary operations.
gutch
The end goal was a sorted `Collection`... which `TreeSet` is. I see no value is converting the set to a list here.
Gunslinger47
it's not wrapping, it's initializing. and and arraylist is better at retrieving while the treeset is better at sorting
seanizer
see my comment with the benchmark. you'll be surprised (I was)
seanizer
I appeciate the effort you've gone to in writing the benchmark! However I think there is a flaw in it. It appears that the JVM runs `Transformer` instances that are later in the list faster than earlier ones: put `BestOfBothWorldsTransformer` first and it suddenly runs much slower. So I have rewritten your benchmark to randomly select a transformer and average the results. In my test `TreeSetTransformer` consistently beats `BestOfBothWorldsTransformer`, which consistently beats `ArrayListTransformer` — not what I expected at all! The difference is tiny though. See http://pastebin.com/L0t5QDV9
gutch
I know what your next question is: what about PriorityQueueTransformer? Isn't it massively faster than the others? Well yes it is, too bad though that it doesn't get the order correct! Have a look at the lists generated by each transformer in my code above, and you'll see that PriorityQueueTransformer is not actually in order! Maybe I am using `PriorityQueue` incorrectly? Do you have an example of it actually sorting correctly?
gutch
@gutch: yes, I thought the benchmark was flawed. Also, PriorityQueue was an idea I picked up from a comment somewhere, I don't really know the class. Thanks for your efforts, anyway.
seanizer
A: 

Collections.sort uses mergeSort which has O(nlog n).

TreeSet has Red-Black tree underlying, basic operations has O(logn). Hence n elements has also O(nlog n).

So both are same big O algorithm.

卢声远 Shengyuan Lu