views:

1564

answers:

10

Does anyone know a good resource to concisely explain the different types of lists available in C# and when their usage is appropriate?

For example, List, Hashtable, Dictionaries etc.

I'm never quite sure when I should be using what.

+1  A: 

If you start at the MSDN doco for System.Collections, you can drill down into the individual collection types for more details about those "lists" and how to use them. For example, the doco for Hashtable says, "Represents a collection of key/value pairs that are organized based on the hash code of the key."

There's also a nice discussion of System.Collections.Generic in Understanding Generics.

Robert S.
A: 

Intellisense will show you a short description of each if you just type System.collections.Generic. into a code window. Don't forget the trailing period. Oh, and there's also System.Collections.ObjectModel.. From there you should be able to get more info on anything that looks promising from MSDN.

Joel Coehoorn
A: 

These are examples of various types of general data structures. These data structures are used all over the place in software engineering.

Justice
+2  A: 

List<T> is sortable, but not recommended to be exposed publicly.

Collection<T> is a a basic, no frills collection.

Dictionary<T> is a collection of key-value pairs (much like the old hashtable, but now generic).

KeyedCollection<T> is a dictionary where the key can be determined from the value (this is an abstract, so you must inherit from it and support the GetKey function)

ReadOnlyCollection<T> is a special collection where the contents cannot be modified.

ArrayList and HashTable are basically obsolete starting with .NET 2.0.

Jon B
+4  A: 

Hash maps

  • Dictionary
  • Hashtable (Non-generic)

This is a data structure that allows you to keep key-value pairs. Given a Key that has some way of being ordered, you can insert a Value. A simple example could be a list of students where the Key is the student ID, and the value the student name.

Random access lists

  • List
  • ArrayList (Non-generic)

Random access lists are used to store a long list of objects that are to be accessed randomly (i.e. you want to access the n'th element in O(1) time). It is not good if you want to insert/delete elements in the middle of the list, since that would require the entire list to be shuffled around which could take some time.

Linked lists and similar

  • LinkedList
  • Queue
  • Stack

Linked lists are great if you don't want to access elements in the middle, since that would take O(N) time. It's great if you want to insert/remove elements in the middle since it only involves changing a few pointers.

Queues and Stacks are slightly specialised, in that they are optimised for FIFO and FILO behaviour (First-In-First-Out and First-In-First-Out respectively).

Mats Fredriksson
+5  A: 

You should pick up a book about basic data structures. It's the same theory regardless of language.

A short explanation:

  • Array: (as in e.g. int[] myArray) - static array that can be used when the collection never changes (you can't add or remove items in it, but you can change the values of individual items)
  • ArrayList: general purpose array/list that allows relatively fast enumeration aswell as direct access. This list can grow automatically as you add items, but since it only stores Object, you should seldom use it due to performance and type-safety issues.
  • List<T>: Generic version of the above ArrayList. It provides a nice balance between performance and flexibility and should be used almost always when you have a dynamic flat list of items. (New in .NET 2.0)
  • Hashtable: Works like a flat list but instead of indexing it with integers, it can be indexed using any object. Worth noting is that there is no "order" in a hash table.
  • Dictionary<T>: Generic version of the Hashtable. Use this in .NET 2.0 and higher instead of the Hashtable for the same reasons as with ArrayList vs List above.
  • Stack<T>: Provides a first-in last-out type of list. The item you added last will be the item you receive first when you pick something out.
  • Queue<T>: Provides a first-in first-out list. Think of it as a tube, where you insert items at one end and pick them out in the other end. Typically used to pass messages between e.g. threads.

In general, you should use the generic collections for almost everything you do in .NET 2.0 and higher. You will get full type-safety (compared to e.g. ArrayList and HashTable) and they are much faster for value types (integers, structs, floats etc.) compared to non generic onces.

If you have a list of items that will never change, or you don't need/want the flexibility of List<T> you can of course use an array since it has the least amount of overhead of them all.

A recommendation when you return a collection from a public method or property is to cast it to a less flexible interface. So if you have a List that you return, you could cast it to an IEnumerable<int> which means that your consumer cannot add items to it (unless of course it casts it back, but its still an indication to the users). Casting it will also give you the flexibility to change the underlying datastructure later, while maintaining the API stability. You could also pick ICollection<int> or IList<int> to expose a bit more functionality but keeping the actual data structure hidden.

Isak Savo
+29  A: 

These aren't all lists, although they're all collections. Here's a quick summary.

Non-generic collections (API is in terms of object. Values types are boxed.

These are mostly in the System.Collections namespace:

  • ArrayList: A list of items, backed by an array. Fast random read/write access. Fast add to the tail end, if the buffer doesn't need resizing.
  • Hashtable: Map from key to value. Keys are unique, values don't have to be. Uses the GetHashCode method to achieve near O(1) read/write access (aside from nasty cases where all items have the same hash, or the backing store needs rebuilding). Iterating over the key/value pairs gives an unpredictable order. (Well, effectively unpredictable.)
  • SortedList: Like a Hashtable, but the entries are always returned in sorted-by-key order. Stored as a list of key/value pairs.
  • Stack: Last-in-first-out collection
  • Queue: First-in-first-out collection
  • Array: Fixed-size O(1) random-access; non-generic, but has strongly typed forms as well

Generic collections. (Strongly-typed API, will not box value types (assuming suitable T).

These are mostly in the System.Collections.Generic namespace:

Possibly the most important collection interface is IEnumerable (and IEnumerable<T>). This represents a sequence of items much like a Stream represents a sequence of bytes. There is no random access, just forward-reading. LINQ to Objects is based on this, and pretty much all collection types implement it.

Jon Skeet
There's also System.Collections.Specialized with more goodies - like StringDictionary and NameValueCollection
Hafthor
@Hafthor: Yes, although most of those have become somewhat obsolete with generics.
Jon Skeet
Very nice List<Collection Types> you have assembled. Everything I've ever used is on there.
stephenbayer
+1  A: 

In addition to the great answers so far, there are some more Collections available via the C5 Generic Collection Library. The documentation (also on their site) may help when deciding what to use depending on your requirments.

tobsen
+4  A: 

To expound on tobsen's earlier answer, the C5 Generic Collection Library has a large number of, well, collections. I'll describe some of them here:

Queue/Stack

  • CircularQueue<T>: This class provides strictly Queue and Stack functionality. As well, efficient O(1) access to any item in the Stack/Queue is available using the indexer: cq[0] (where 0 is the oldest item, next to be dequeued, last to be popped).

Lists

Note: ArrayList and LinkedList can also function as Queue/Stacks

  • ArrayList<T>: Similar to its counterpart in System.Collections.Generic (SCG), List<T>, this is backed by an array, guaranteeing O(1) indexing, but worst-case O(n) insertion.O(n) to find an item.
  • LinkedList<T>: Similar to its counterpart SCG.LinkedList<T>. Using a doubly-linked list, guarantees O(1) insertion, but worst-case O(n) indexing (in practice, is proportional to distance from either head or tail of the list). Also O(n) to find an item. Sorting uses a stable Merge Sort.
  • HashedArrayList<T>: Similar to the ArrayList<T> above, but does not allow duplicates. The benefit you get in return is that the time to find an item and its index is reduced to O(1).
  • HashedLinkedList<T>: Similar to the LinkedList<T> above, but does not allow duplicates. As before, the time to find an item is reduced to O(1), but time to find its index remains O(n).
  • WrappedArray<T>: Fairly similar to the ArrayList<T>, this acts as a wrapper around an array that implements C5.IList<T>, but throws exceptions if an attempt is made to modify the collection (IsFixedSize is true and Add, Remove, Insert don't work; Sort, Shuffle, and Reverse do, however, as they are in-place operations).

Lists also provide a "View" functionality which represents a segment of the underlying list, allowing local operations to be performed. Using patterns offered in the C5 book, operations can be performed using views that are efficient on both array and linked lists. Any list operation can also be performed on a view, restricting their effect to a subset of the underlying list.

Sorted Collections

  • SortedArray<T>: Similar to an ArrayList<T> except that it keeps its items sorted and does not allow duplicates. Note that random insertions and deletions on this collection are slow. This collection is best if the number of items is small or rarely modified but often accessed by item index or value.
  • TreeSet<T>: Uses a red-black tree structure to keep items sorted. As a set, it does not allow duplicates. Access by index or item value and insertion/deletion take O(log n).
  • TreeBag<T>: Uses a red-black tree, keeping items sorted. As a bag, it does allow duplicates, but does not store duplicates in the tree, rather keeping duplicates by counting.

Both TreeSet<T> and TreeBag<T> provide the ability to efficiently make "snapshots" or persistent copies of the tree in O(1), allowing iteration over the snapshot while modifying the underlying tree. Note that each snapshot on a tree adds a performance penalty to updates to the tree, but these effects go away when the snapshot is disposed.

Hash Collections

  • HashSet<T>: A collection using a simple hash table for storage. Access by item value takes O(1). As a set, it does not allow duplicates. Provides a function BucketCostDistribution() that can help tell you determine the efficiency of the items' hashcode function.
  • HashBag<T>: Similar to the HashSet<T>, but has bag semantics, meaning that duplicates are allowed, but duplicates are only stored by counting.

Priority Queue

  • IntervalHeap<T>: Provides a priority queue. Finding the maximum and minimum are O(1) operations, deleting the maximum, minimum, adding, and updating are O(log n) operations. Allows duplicates by storing them explicitly (not by counting).

Dictionaries

  • HashDictionary<H,K>: Similar to the SCG.Dictionary<H,K>, provides entry access, insertion, and deletion in O(1). Also provides a BucketCostDistribution() function as HashSet<T> above. Does not guarantee any particular enumeration order.
  • TreeDictionary<H,K>: Similar to the SCG.SortedDictionary<H,K>, provides a persistently sorted dictionary using a red-black tree. Entry access, insertion, and deletion take O(log n). Guarantees that enumeration of the dictionary follows the order specified by the key comparer.

Guarded Collections

As well, C5 also offers "guarded" collections, which effectively acts as a read-only wrapper, preventing the collection from being modified. Items in the collection still may be modified, but items can't be added, deleted, or inserted into the collection.

A long answer, but thorough on the C5 libraries various collections at your disposal. I have found the C5 library to be great and often use it in my own code, replacing the common C# header with:

using C5;
using SCG = System.Collections.Generic;
Marcus Griep
+1  A: 

MSDN has an article called Selecting a Collection Class that I find very useful when trying to figure out what kind of collection to use in a given situation.

dthrasher