views:

877

answers:

1

This is a continuation of questions like this one.

Is there known guidlines for tweek the performance. I don't mean gains in big-O, just saving some linear time.

For example, how much does pre-sorting save on either SortedList or SortedDictionary?

Say I have a person-class with 3 properties to sort on, one of them is age in years. Should I bucket the objects on age first?

Should I first sort on one property, then use the resulting list/dictionary to sort on two properties and so on?

Any other optimazations that spring to mind?

+5  A: 

Well, it's an easy win on SortedList. Inserting an item requires a binary search (O(log(n)) to find the insertion point, then a List.Insert (O(n)) to insert the item. The Insert() dominates, populating the list requires O(n^2). If the input items are already sorted then the Insert collapses to O(1) but doesn't affect the search. Populating is now O(nlog(n)). You don't worry how big the Oh is, sorting first is always more efficient. Assuming you can afford the doubled storage requirement.

SortedDictionary is different, it uses a red-black tree. Finding the insertion point requires O(log(n)). Rebalancing the tree might be required afterwards, that also takes O(log(n)). Populating the dictionary thus takes O(nlog(n)). Using sorted input does not change the effort to find the insertion point or rebalancing, it is still O(nlog(n)). Now the Oh matters though, inserting sorted input requires the tree to constant rebalance itself. It works better if the input is random, you don't want sorted input.

So populating SortedList with sorted input and populating SortedDictionary with unsorted input is both O(nlog(n)). Ignoring the cost of providing sorted input, the Oh of SortedList is smaller than the Oh of SortedDictionary. That's an implementation detail due to the way List allocates memory. It only has to do so O(log(n)) times, a red-black tree has to allocate O(n) times. Very small Oh btw.

Notable is that neither one compares favorably over simply populating a List, then calling Sort(). That's also O(nlog(n)). In fact, if input is already accidentally sorted you can bypass the Sort() call, this collapses to O(n). The cost analysis now needs to move to the effort it takes to get the input sorted. It is hard to bypass the fundamental complexity of Sort(), O(nlog(n)). It might not be readily visible, you might get the input sorted by, say, a SQL query. It will just take longer to complete.

The point of using either SortedList or SortedDictonary is to keep the collection sorted after inserts. If you only worry about populating but not mutating then you shouldn't use those collections.

Hans Passant