views:

661

answers:

5

Hi all, is there a way to optimize the speed of the insertions in a java.util.Collection by specifying the order of the items ?

For example

java.util.Set<String> set = java.util.TreeSet<String>();

will this solution:

set.add("A");
set.add("B");
set.add("C");
set.add("D");
set.add("E");

be faster than this one (random order) ?

set.add("E");
set.add("D");
set.add("C");
set.add("A");
set.add("B");

(and the same question for the other collections: HashMap, hastable...)

Thanks

+3  A: 

Insertion time for a red-black tree (which is used to implement Java's TreeSet/TreeMap) is guaranteed worst case to be O(log n). It could be faster if the items are in a particular order, but I'm unsure what that would be (probably pre-sorted would be fastest?).

Insertion into a hashtable is a O(1) (constant time) operation. The main thing done for insertion is calculation of the hashcode.


Edit: Starblue suggests pre-sorted may yield the worst-case performance so you could try randomized order.

Mark Renouf
Pre-sorted usually leads to a lot of imbalance, so it is most likely the worst case.
starblue
I agree, if you were trying to speed it up, it would be best, to sort the list, find the median, then insert going out in both directions from the median. No subtree reordering would be necessary at that point.
Nick
But the sorting would take more time than is gained later. In the end this is all useless micro-optimization.
starblue
+9  A: 

The easy answer is "time it and see".

The other answer is "it won't matter". This seems to be a micro-optimization that is hardly worth the effort. I think it falls into the category of "The Sad Tragedy of Micro-Optimization Theater".

duffymo
I store a *lot* of objects in a BerkeleyDB. Those objects contain a Map and reading/writing this map to an array of bytes may be an imortant factor.
Pierre
@Pierre: If you already have a BerkleyDB, you will gain much more performance by directly using the DB and tuning it correctly vs. any micro-optimisations you can make when inserting into a redundant data structure.
David Schmitt
@David thanks for the suggestion
Pierre
@Pierre ordering insertions into BDB actually can have huge impact: at least with native BDB, inserting in key order is order of magnitude faster than in random (yes, we have tested this). In fact, our processing is done by writing to disk, merge sorting, inserting, and combination is 5x faster than doing straight inserts.But trying to optimize hash/tree map is less likely to matter, for multiple reasons ((a) it's not the bottleneck, bdb will be; (b) there may not be any optimization)
StaxMan
+2  A: 

There is naturally a huge difference between hash-based collections and tree-based ones.

Tree based ones benefit from element ordering for insertion (e.g., comparisons between strings), so when you have comparable objects (like string) it is better to use them. The TreeSet/TreeMap/etc. in the standard collection is supposed to be balanced (red-black tree) so insertion order doesn't matter that much. If it was not balanced, then insertion order would matter since you could end up with a chain instead of a tree.

In hash tables, the loading factor and hashing function decide everything, but if you're dealing with strings, you may be better of not even bothering with hashing.

If you need a set of strings for many strings with overlaps, a Trie may be more memory efficient, but I don't think that there is one in the library.

Uri
+5  A: 

No for java.util.Map and java.util.Set, because these are interfaces, and there are different impementations.

For concrete implementations it is not a worthwhile optimization. If you have problems with performance chose a better suited implementation, or rethink what and how you need to store.

Inserting 5000 random numbers into a HashSet takes about a millisecond on a run-of-the-mill laptop, so how many millions of elements do you want to insert to make this kind of optimization worthwhile?

starblue
+1  A: 

Be careful to consider the characteristics of your data structure when taking optimization measures. For one extreme example, inserting elements into a binary tree in sorted order would result in a linked list.

Bill the Lizard
Assuming tree is not re-balanced, which I believe is generally done (at least for BDBs etc).
StaxMan