views:

387

answers:

7

Most of my programming experience is in a language where there is one collection data structure -- an array. Now that I'm working primarily in .NET, I've come to appreciate the vast number of tools available to me, but I also find it difficult to determine which tools is best suited for each problem. I find this to be the case often with collections.

I'm sure I'll be able to spot the right tool for the job quicker with time/experience, but can anyone offer some guidance on which collection classes are good for which jobs? Any good rules of thumb to follow?

EDIT: I find that I use List(T) almost always, which is sort of what prompted this question. I know there are very specific reasons to use the other classes. Although List(T) works most times, I want to avoid jamming something into a generic list when another structure is better suited. I have to be able to spot these cases.

Thanks!

+15  A: 

You didn't say what language you used before, but I feel pretty confident in saying that if you believe that array was the only thing available, then you were probably mistaken.

C++ for example only supports array "collections" natively ("collections" used very loosely here), but with the addition of pointers you can implement an equivalent for any collections data structure available in .Net. In fact, if you look in the C++ standard template library you will find stock implementations for most of the common structures.

The reason for the additional structures is that an array is not always, or even often, the most appropriate structure to use for a collection of data. It has a number of limitations that can be solved by one collection or another, and using those different collections you can often get much greater performance out of much less code, and reduce the chance there's a bug in your data structure implementation as well.

When deciding what collection type to use, you need to look at how it will be used most ofen. For example, are all the objects in the collection expected to be of the same type, inherited from the same type, or any type? Are you going to be frequently adding and removing items? If so, will you always push/pop, queue/dequeue items or do you need to add items to specific locations? Will you lookup specific items by key, by index, or both? If by key, how is the key determined?

Some of the more common collections:

  • List<T> should probably be used in most of the situations where you're used to using an array. It supports lookup by index using the same syntax as an array with performance approaching that of an array, is strongly-typed, and makes it very easy to add or remove items and very fast to append or pop items (inserting to a specific position is much slower).

  • LinkedList<T> should sound familiar if you've done any formal computer science training. It uses syntax similar to List, but is optimized differently: lookups are slower because they require traversing the list, while adding or removing an item to a specific position can be much faster.

  • Dictionary<TKey, TValue> uses syntax similar to a List<T>, but instead of an array index you put a key value in the brackets. Dictionaries are great because lookups of specific items by key are considered to be very fast, in that no matter how many items are in the Dictionary it will always take about the same amount of time to find the one you need.

  • SortedList<TKey, TValue> works much a like a Dictionary, with the exception that when you iterate over it items are returned sorted by key. However, you can't lookup the nth item without first iterating all the items before it.

  • KeyedCollection is often overlooked because it's hidden in a different namespace from some of the other collections and you have to implement a (very easy) function to use it. It also works much like a dictionary, with the addition that it supports easy lookup by index. It is normally used when the key for an item is a simple property of the item itself.

  • Don't forget the old standbys: Stack and Queue. Again, if you have any formal computer science education at all you should already have a pretty good idea how those work based on their names.

Finally, most of these collections (array included!) implement a set of common interfaces. These interfaces are very useful, in that you can write a program against an interface rather than a specific collection, and then your function can accept any collection that implements that interface. For example, the following code will work whether you pass in a string array, a List<string>, or any other IEnumerable<string>:

void WriteToConsole(IEnumerable<string> items)
{
    foreach (string item in items)
    {
       Console.WriteLine(item);
    }
}

Other interfaces worth looking at include IList<T>, ICollection<T>, and IQueryable<T>.

Joel Coehoorn
Some things that you may want to add to your otherwise excellent reply: adding elements to List<T> is only fast if you add them at the end; and mention LinkedList<T>, which has very fast insertions and deletions anywhere, but does not support indexing elements directly.
Thomas
+1 concise answer.
Jon Tackabury
A: 

The collections like Stacks, Queues, SortedList, Dictionary, HashTable are all standard data structures which come in handy in various situations.

Queue enables FIFO implementation without you having to do it yourself. Stacks give you LIFO. SortedLists gives you a presorted list and so on.

There are many others in the collections namespace and there are all discussed here.

achinda99
+3  A: 

Generic Lists (List) are good for common use. They don't perform boxing and unboxing. so no performans problems.

List<string> items = new List<string>();
items.Add("abc");
items.Add("dfg");

ArrayLists accepts any object as item. so they are good for storing multiple typed situations. For example if you need to store an int and a string in same collection arraylist is good for this.

ArrayList items = new ArrayList();
items.Add("abc");
items.Add(1);
items.Add(DateTime.Now);

SortedLists and Hashtables are store key-value pairs. you can define a key for your items. this helps you to find them quickly. SortedLists are automatically sorted Hastables.

Hashtable items1 = new Hashtable();
items1.Add("item1", "abc");
items1.Add("item2", "dfg");

SortedList items2 = new SortedList();
items2.Add("Second", "dfg");
items2.Add("First", "abc");

Hope this helps !

Canavar
A: 

Two tips I can offer: 1. Use Generic collections as much as possible. 2. When deciding between a HashSet and a List generic collection, really look at what you are going to be using them for. Hashsets may be faster at searching, but the also slow down with inserts (I have found).

Mike_G
A: 

Algorithms and Data Structures. Each one has its advantages and disadvantages, and each one has its purpose.

Justice
A: 

there are lots of posts related to this issue, you must think WHAT do you really need to do. do you need a string based key(¿) how data is goint to be populated, do you need a native method to find if any key exist, or if any value exist(?)

Generics are the most used by me, but there is a reason for the others ;)

http://discuss.fogcreek.com/dotnetquestions/default.asp?cmd=show&amp;ixPost=5119

gonxalo
+1  A: 

Like so many other things in computer science, when there are multiple choices, it usually means there are multiple ways of doing something. As others have said, there are various advantages and disadvantages of each collection. Regardless of whether you're using the generic versions of the collections or not, ultimately all collections provide these operations:

  • insert
  • update
  • delete
  • lookup
  • enumeration

The different collections have different performance characteristics for each of these operations. For example, an array is quick to update an item, but takes longer to insert or delete an item. Lookup is very fast.

Compare that with a List. The List is very fast to insert. Lookup takes longer. Update and delete operations require that you have the item already and is pretty fast. Enumeration for both an array and a List are about the same.

All collections also have certain behaviors, for example, does the collection maintain sorted. If so, then the insert/update/delete operations will take longer but will speed up lookup.

So depending on what your program is doing most of the time will determine which collection to use.

Tommy Hui