views:

728

answers:

4

Are there any resources about the asymptotic complexity (big-O and the rest) of methods of .NET collection classes (Dictionary<K,V>, List<T> etc...)?

I know that the C5 library's documentation includes some information about it (example), but I'm interested in standard .NET collections too... (and PowerCollections' information would also be nice).

+10  A: 

MSDN Lists these:

etc. For example:

The SortedList(TKey, TValue) generic class is a binary search tree with O(log n) retrieval, where n is the number of elements in the dictionary. In this, it is similar to the SortedDictionary(TKey, TValue) generic class. The two classes have similar object models, and both have O(log n) retrieval. Where the two classes differ is in memory use and speed of insertion and removal:

SortedList(TKey, TValue) uses less memory than SortedDictionary(TKey, TValue).

SortedDictionary(TKey, TValue) has faster insertion and removal operations for unsorted data, O(log n) as opposed to O(n) for SortedList(TKey, TValue).

If the list is populated all at once from sorted data, SortedList(TKey, TValue) is faster than SortedDictionary(TKey, TValue).

Marc Gravell
-1 totally wrong. SortedList is an array-based collection, while SortedDictionary is a binary tree based collection. Both are unsuitable for large number of elements.
Stephan Eggermont
@Stephan: That sounds like an implementation detail, while Msdn list the characteristics of the type. While SortedList does use two arrays (keys and values) to store values, it still uses Array.BinarySearch (within the IndexOfKey method) to retrieve values.
Simon Svensson
Oh, I see... my direct quote from MSDN is "totally wrong"... interesting. If you believe MSDN is wrong/misleading, then fine: leave a comment on MSDN.
Marc Gravell
I don't think MSDN is wrong, just read what they wrote. SortedList is no binary search tree. You know the difference between a tree and an array, do you?
Stephan Eggermont
Stop being patronising. The text is *from* MSDN; so either you agree with it or you don't. But you can't claim "I don't think MSDN is wrong" *and* state "SortedList is no binary search tree". Now fine, the word "tree" might be misleading *but it isn't my word!*. The key point here is that is uses binary search and hence has O(log n) retreival.
Marc Gravell
Start reading instead of making stupid comments. Binary search tree is a very specific structure http://en.wikipedia.org/wiki/Binary_search_tree. SortedList is not a BST.
Stephan Eggermont
The generic version is wrong, I've complained at MSDN
Stephan Eggermont
Regardless of whether it is a Binary Search "Tree" or just a "Binary Search" (which, in fact, it is...reflector shows that Array.BinarySearch(..) is used in SortedList), the performance factor doesn't change. Average case for searches with Binary Search and Binary Search Tree are O(log n). Worst case for a binary search is O(log n) where as its O(n) for a binary search tree. You nitpicking a point that doesn't matter. Don't underestimate the value of a binary search for presorted data.
jrista
Started writing a comment, but - he simply isn't worth it.
Marc Gravell
Oh, of course nobody cares if you use three times as much memory. Making it three times slower for a search.
Stephan Eggermont
3 times memory != 3 times slower. Indeed, part of the nature of indexing is that you generally trade memory (i.e. use more) for speed (i.e. quicker).
Marc Gravell
You seem to be totally unaware of modern computer architecture. The key is 32 or 64 bit. Comparing it is essentially free. The only thing that counts is how much memory you touch. Measure it (for large collection size).
Stephan Eggermont
And of course touching memory that's in the same cache line is 10 times or so faster.
Stephan Eggermont
Your comments are unrelated to anything I've said; you're just trolling.
Marc Gravell
MSDN made the corrections
Stephan Eggermont
A: 

There's no such thing as "complexity of collection classes". Rather, different operations on these collections have different complexities. For instance, adding an element to a Dictionary<K, V>...

...approaches an O(1) operation. If the capacity must be increased to accommodate the new element, this method becomes an O(n) operation, where n is Count.

Whereas retrieving an element from a Dictionary<K, V>...

...approaches an O(1) operation.

Anton Gogolev
I meant their operations, I've edited the question to make it clearer.
Igor Brejc
+4  A: 

This page summarises some of the time comlplexities for various collection types with Java, though they should be exactly the same for .NET.

I've taken the tables from that page and altered/expanded them for the .NET framework. See also the MSDN pages for SortedDictionary and SortedList, which detail the time complexities required for various operations.


Searching

Type of Search/Collection Types           Complexity  Comments
Linear search Array/ArrayList/LinkedList  O(N)        Unsorted data.
Binary search sorted Array/ArrayList/     O(log N)    Requires sorted data.
Search Hashtable/Dictionary<T>            O(1)        Uses hash function.
Binary search SortedDictionary/SortedKey  O(log N)    Sorting is automated.

Retrieval and Insertion

Operation         Array/ArrayList  LinkedList  SortedDictionary  SortedList
Access back       O(1)             O(1)        O(log N)          O(log N)
Access front      O(1)             O(1)        N.A.              N.A.
Access middle     O(1)             O(N)        N.A.              N.A.
Insert at back    O(1)             O(1)        O(log N)          O(N)
Insert at front   O(N)             O(1)        N.A.              N.A.
Insert in middle  O(N)             O(1)        N.A.              N.A.

Deletion should have the same complexity as insertion for the associated collection.

SortedList has a few notable peculiarities for insertion and retrieval.

Insertion (Add method):

This method is an O(n) operation for unsorted data, where n is Count. It is an O(log n) operation if the new element is added at the end of the list. If insertion causes a resize, the operation is O(n).

Retrieval (Item property):

Retrieving the value of this property is an O(log n) operation, where n is Count. Setting the property is an O(log n) operation if the key is already in the SortedList<(Of <(TKey, TValue>)>). If the key is not in the list, setting the property is an O(n) operation for unsorted data, or O(log n) if the new element is added at the end of the list. If insertion causes a resize, the operation is O(n).

Note that ArrayList is equivalent to List<T> in terms of the complexity of all operations.


Noldorin
Are you sure that the complexities should be the same for .NET? I would think it's more subtle than that - for example, there's a difference between SortedDictionary, SortedList and Hashtable in .NET.
Igor Brejc
Yeah, there's no fundamental difference - the basic algorithms and data structures are going to be virtually identical. I haven't detailed SortedDictionary/SortedList, but I'll add them in now. Hashtable ought to have the same complexities as Dictionary, I believe (it's pretty much a non-generic version of it).
Noldorin
A: 

The basic collection classes are all unusable in a 64-bit environment. They all either use too much memory or move too much memory when used with larger collection sizes. You need something like a BTree or Judy tree, working with multiple blocks of memory.

[edit] Voting down without making clear what you don't like doesn't make sense. I'm pretty sure everyone who has used >4G has noticed that the basic collections don't perform.

The array based ones don't because they move too much memory, and the binary tree based ones take too much memory per element and have bad cache behaviour

[edit2] Ok, so it is simply what you don't understand and have no practical experience with.

64-bit is relevant because I want to be able to do a time-space tradeoff. The basic collections don't allow me to do that. And Big-O is irrelevant when you don't know the constant, as memory is limited.

Stephan Eggermont
I don't think your answer is relative to the question asked. The question was about the complexity of .NET collections, not about their memory usage. I don't see how 64bit has anything to do with the question either.
jrista
(replied to your comment)
Marc Gravell
(replied to your second comment)
Marc Gravell
If you do not understand the relation between memory usage and time complexity, start reading instead of voting.
Stephan Eggermont