views: 872 | answers: 15

I'm interested in finding out what people would consider the most useful data structures to know in programming. What data structure do you find yourself using all the time?

Answers to this post should help new programmers find a useful data structure for their problem. Answers should probably include the data structure, information about it or a relevant link, the situation it is being used in, and why it is a good choice for that problem (e.g. ideal computational complexity, simplicity, ease of understanding, etc.).

Each answer should be about one data structure only.

Thanks for any pearls of wisdom and experience people can share.

+1  A: 

This post is way too vague. There are countless data structures: arrays, dictionaries, etc. Each data structure can be used to solve different problems.

It would be much more productive to ask which data structure suits a specific problem.

aku
+1  A: 

This is a bit like asking which tools in a carpenter's toolbox are best to learn to use. Each of them is good for a certain type of job, and you need to learn the basic ones (maps, lists, bags, sets, etc.) equally.

skaffman
um... what's a 'bag'!?!!
kronoz
I think a bag is like a set except you can have multiple entries with the same value
Scottm
Also known as a multiset.
Greg Sexton
From Commons Collections: "[a bag] defines a collection that counts the number of times an object appears in the collection." http://commons.apache.org/collections/api-release/org/apache/commons/collections/Bag.html
skaffman
A: 

I don't think there is one data structure one must know. Each data structure has its own properties and is thus suited to a specific problem.

Ikke
+1  A: 

I find myself using associative arrays quite a lot: basically arrays with a string as the index.
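
A minimal sketch of that idea in C++ (the names here are my own, illustrative choices), using std::map keyed by strings:

```cpp
#include <iostream>
#include <map>
#include <string>

int main() {
    // An associative array: values indexed by strings instead of integers.
    std::map<std::string, int> ageByName;
    ageByName["alice"] = 30;
    ageByName["bob"] = 25;

    // Look up by key; operator[] default-constructs a value for missing keys,
    // so use find() when you only want to read.
    auto it = ageByName.find("alice");
    if (it != ageByName.end()) {
        std::cout << it->first << " is " << it->second << "\n";
    }
    return 0;
}
```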

paxdiablo
Associative array is more of an abstract data type, of which a hash table is one possible implementation.
Miles
+3  A: 

Linked lists / doubly-linked lists / other variants. Everyone should know the pros and cons of a linked list, yet judging by how rarely they are used, they seem to be something many people forget.

The advantage of linked lists is that adding/removing nodes is very cheap. Unlike arrays, or data structures that use an array at their core, they do not require reallocating more memory upon expanding.

The disadvantage is that they do not perform well at all for searching: what would be an O(1) look-up in an array is O(n) for a linked list.

Like all structures, linked lists are ideal only under certain conditions. But used at the right time, they are very powerful.
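
For illustration, a minimal hand-rolled singly linked list in C++ (hypothetical names, just a sketch) showing the cheap O(1) insert at the head versus the O(n) search mentioned above:

```cpp
#include <iostream>

struct Node {
    int value;
    Node* next;
};

// O(1): no reallocation or shifting, just a pointer update.
Node* pushFront(Node* head, int value) {
    return new Node{value, head};
}

// O(n): must walk the chain node by node.
Node* find(Node* head, int value) {
    for (Node* cur = head; cur != nullptr; cur = cur->next)
        if (cur->value == value) return cur;
    return nullptr;
}

int main() {
    Node* head = nullptr;
    for (int i = 0; i < 5; ++i) head = pushFront(head, i);

    std::cout << (find(head, 3) ? "found 3" : "not found") << "\n";

    // Free the list.
    while (head) { Node* next = head->next; delete head; head = next; }
    return 0;
}
```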

Spodi
By searching do you mean random access?
Corey
they also require 2-3 times the memory of an array-backed list, and the speed benefit for adding/removing is negligible until you've hit a huge number (100,000+) of nodes ... but they do have very predictable growth patterns, so are useful in marginal memory situations
kdgregory
A: 

I've always found a myriad of uses for stacks, although less so in object oriented programming. Really, all data structures have their uses, and they're not complex. Learn all you can.
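
One classic use of a stack, sketched in C++ with std::stack (the bracket-matching example is my own, not from the answer):

```cpp
#include <iostream>
#include <stack>
#include <string>

// Returns true if every '(' has a matching ')'.
bool balanced(const std::string& s) {
    std::stack<char> st;
    for (char c : s) {
        if (c == '(') {
            st.push(c);
        } else if (c == ')') {
            if (st.empty()) return false;
            st.pop();
        }
    }
    return st.empty();
}

int main() {
    std::cout << std::boolalpha
              << balanced("(a + b) * (c)") << "\n"   // true
              << balanced("(a + b))") << "\n";       // false
    return 0;
}
```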

Nerdfest
A: 

I don't think there is a general answer here; it depends on the use case. For example, in my more-than-10-year career as a programmer/manager I have never used binary trees. I doubt that means binary trees are not useful, only that in the kernel and embedded world the linked list probably fits better.
Actually, when I think about it, with a few exceptions I have used only simple linked lists.
And even in embedded it's probably not the only structure used. I live in the world of low-level hardware protocols; further "up the hill" more data structures are probably used...

Ilya
+1  A: 

I like binary trees. Especially the splay tree variant. It's somewhat similar to a self-balancing binary tree, but it also adapts to the usage pattern of the application. You almost never run into worst-case O(n) behaviour.

A nice bonus is that they are also easier to write and need less code than other self-balancing binary trees. It's one of my favorite data structures because it performs so incredibly well in practice.

http://en.wikipedia.org/wiki/Splay_tree
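
For a feel of why splay trees need comparatively little code, here is a sketch of the core top-down splay step in C++ (in the style of the well-known Sleator demo code; a full container would also need insert/delete built around it):

```cpp
#include <iostream>

// Moves the searched key (or its nearest neighbour) to the root by
// restructuring the tree along the search path.
struct Node {
    int key;
    Node* left;
    Node* right;
};

Node* splay(int key, Node* t) {
    if (t == nullptr) return nullptr;
    Node N{0, nullptr, nullptr};              // temporary header node
    Node* l = &N;
    Node* r = &N;

    for (;;) {
        if (key < t->key) {
            if (t->left == nullptr) break;
            if (key < t->left->key) {         // rotate right
                Node* y = t->left;
                t->left = y->right;
                y->right = t;
                t = y;
                if (t->left == nullptr) break;
            }
            r->left = t;                      // link right
            r = t;
            t = t->left;
        } else if (key > t->key) {
            if (t->right == nullptr) break;
            if (key > t->right->key) {        // rotate left
                Node* y = t->right;
                t->right = y->left;
                y->left = t;
                t = y;
                if (t->right == nullptr) break;
            }
            l->right = t;                     // link left
            l = t;
            t = t->right;
        } else {
            break;
        }
    }
    l->right = t->left;                       // reassemble
    r->left = t->right;
    t->left = N.right;
    t->right = N.left;
    return t;
}

int main() {
    // Build a small right-leaning tree by hand: 1 -> 2 -> 3.
    Node n3{3, nullptr, nullptr};
    Node n2{2, nullptr, &n3};
    Node n1{1, nullptr, &n2};

    Node* root = splay(3, &n1);               // 3 is splayed to the root
    std::cout << "new root: " << root->key << "\n";
    return 0;
}
```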

Nils Pipenbrinck
Does "almost never" mean "no more than log(n)/n of the time"? ;-p
Steve Jessop
It depends on the usage pattern. I haven't done an O-analysis on that, as it has never occurred in practice so far.
Nils Pipenbrinck
A: 

For a basic appreciation, you should know of a few abstract data types (set, dictionary, ordered list, queue, stack etc.) and several ways of implementing each with their relative trade-offs.

This will probably require you to understand arrays, linked lists (singly and doubly linked), hash tables, binary search trees (with some understanding of simple balancing heuristics) and binary heaps. Know these inside out and you'll be a long way towards understanding more complex and interesting data structures. Plus, if you've implemented all of them you'll have a ready-made library that you understand for programming projects (although obviously more battle-hardened libraries like Boost or whatever are more appropriate for production code).

This gives a very useful vocabulary of data structures, which might make a significant difference to the way you write your programs. You might find you've been solving problems with many partial implementations of a queue, for example, that you can now replace with a canonical implementation.

HenryR
+4  A: 

One of the data structures I use the most (beyond vectors, of course) is the hash table. It's about the only choice if you need to be able to search large quantities of data in O(1) time, meaning the time to search does not grow as the size of the collection grows.

The catch is that the insertion and deletion times are larger than in other data structures, and you need to have some sort of key with which to search the collection. Every element must have a key. The algorithm takes the key of each element and computes a hash code that indicates the slot in the hash table in which to search. Then, depending on the implementation, it either follows a list of items that fell in that bucket to find your item, or it searches nearby buckets. The size of the hash table strongly affects the efficiency of the hash, which in turn is affected by the amount of collisions of hash codes between keys.

Use it whenever you need a map and the expected number of elements of the map exceeds about 10. It's a bit more memory-intensive than other structures, since it needs lots of unused slots in the table to be efficient.

C# has a great implementation of it with Dictionary<keytype, valuetype> and even has a HybridDictionary that decides internally whether to use a hashtable or a vector. Any good programming book describes it, but you will be well served by Wikipedia: http://en.wikipedia.org/wiki/Hashtable
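
As a rough illustration in another language, here is a C++ sketch using std::unordered_map, the standard library's hash table (the names and values are mine):

```cpp
#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    // std::unordered_map is a hash table: expected O(1) lookup/insert,
    // keyed here by strings.
    std::unordered_map<std::string, double> priceBySku;
    priceBySku["apple"] = 0.50;
    priceBySku["bread"] = 2.25;

    // A lookup hashes the key to a bucket, then resolves collisions
    // within that bucket (separate chaining in typical implementations).
    auto it = priceBySku.find("bread");
    if (it != priceBySku.end()) {
        std::cout << it->first << " costs " << it->second << "\n";
    }

    // load_factor() is the elements-per-bucket ratio the answer alludes to;
    // the table rehashes (grows) when it gets too full.
    std::cout << "load factor: " << priceBySku.load_factor() << "\n";
    return 0;
}
```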

+1  A: 

I find myself using arrays very frequently, in combination with the "foreach" control structure to loop through the items. In the past I used arrays with a numeric index and the explicit "for(i=1;i<n;i++)" loop. I've found that looping through arrays with "foreach" instead of an explicit numeric index provides a more general and readable solution.
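
In C++ terms the same shift looks roughly like this (an illustrative sketch): a range-based for replaces the explicit index loop:

```cpp
#include <iostream>
#include <vector>

int main() {
    std::vector<int> items = {10, 20, 30, 40};

    // Old style: explicit numeric index.
    for (std::size_t i = 0; i < items.size(); ++i) {
        std::cout << items[i] << " ";
    }
    std::cout << "\n";

    // "foreach" style: range-based for, no index bookkeeping.
    for (int item : items) {
        std::cout << item << " ";
    }
    std::cout << "\n";
    return 0;
}
```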

sunseeker
+1  A: 

I will have to disregard your requirement about one data structure per post - these are the ones I have used the most, and I find most programs need one of these, or a combination.

arrays - the most basic, and they provide the fastest access. Vectors are the improvement over plain old arrays and are the de facto replacement commonly used these days. Deque is another variation on this theme; it again provides constant-time random access, but is optimized for fast insertions and deletions at the beginning and end.

linked list - very useful for maintaining a list of data that is dropped and inserted frequently, but very slow to iterate/search, e.g. free/used lists inside memory pages.

trees - a basic structure that forms the basis of more complex structures. There are many forms of this structure. Provides O(log n) search times when the tree is kept sorted, and becomes useful for large data sets like dictionaries. Binary, AVL and red-black trees are the most common.

maps and hashes - not exactly data structures, but complex fast-lookup algorithms implemented using a combination of clever logic and the data structures above.

These data structures and their implementations are available in the STL in C++; other languages also have their own native implementations. Once you know these basic data structures, a few of their variations (queue, stack, priority queue) and something about search algorithms, I would say the basics are well covered.
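
As a quick illustration (my sketch, not part of the original answer), the STL counterparts of the structures listed above:

```cpp
#include <deque>
#include <iostream>
#include <iterator>
#include <list>
#include <map>
#include <string>
#include <vector>

int main() {
    std::vector<int> v = {1, 2, 3};   // dynamic array: fast random access
    v.push_back(4);                   // amortized O(1) growth at the end

    std::deque<int> d = {2, 3};       // fast push/pop at both ends
    d.push_front(1);
    d.push_back(4);

    std::list<int> l = {1, 2, 4};     // doubly linked list: O(1) insert/erase
    l.insert(std::next(l.begin(), 2), 3);  // ...once you hold an iterator

    std::map<std::string, int> m;     // sorted tree-based map: O(log n) lookup
    m["answer"] = 42;

    std::cout << v.size() << " " << d.front() << " "
              << l.back() << " " << m["answer"] << "\n";
    return 0;
}
```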

+1  A: 

The advantage of linked lists is that adding/removing nodes is very cheap. Unlike arrays [...] they do not require reallocating more memory upon expanding.

If you have an array and you double the allocation size every time you fill it, you'll have amortized O(1) appends. Also, looping over all the elements of an array is likely to be faster (in wall time) than looping over a linked list, due to caching effects (unless you allocate the links in big chunks and don't mess around with them too much).

Also, arrays are smaller: you save the per-element word overhead, plus the per-allocation overhead (which is probably at least two words: one for the size and one for the next-in-free-list pointer).
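
A toy sketch of the doubling strategy described above (hand-rolled purely for illustration; std::vector already does this internally):

```cpp
#include <cstddef>
#include <cstring>
#include <iostream>

// A toy growable int array that doubles its capacity when full, giving
// amortized O(1) append.
struct GrowArray {
    int* data = nullptr;
    std::size_t size = 0;
    std::size_t capacity = 0;

    void append(int value) {
        if (size == capacity) {
            std::size_t newCap = (capacity == 0) ? 1 : capacity * 2;
            int* bigger = new int[newCap];
            if (data) std::memcpy(bigger, data, size * sizeof(int));
            delete[] data;
            data = bigger;
            capacity = newCap;
        }
        data[size++] = value;
    }

    ~GrowArray() { delete[] data; }
};

int main() {
    GrowArray a;
    for (int i = 0; i < 1000; ++i) a.append(i);
    // 1000 appends trigger only ~10 reallocations (capacities 1, 2, 4, ..., 1024).
    std::cout << "size=" << a.size << " capacity=" << a.capacity << "\n";
    return 0;
}
```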

Jonas Kölker
A: 

Quicksort

Mergesort

Bubblesort

These are really good to learn and understand how they work. Sorting is fun and can be applied to many areas :)
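
For example, a minimal quicksort sketch in C++ (my illustration, using the Lomuto partition scheme):

```cpp
#include <iostream>
#include <utility>
#include <vector>

// Sorts a[lo..hi] in place by partitioning around a pivot and recursing.
void quicksort(std::vector<int>& a, int lo, int hi) {
    if (lo >= hi) return;
    int pivot = a[hi];
    int i = lo;
    for (int j = lo; j < hi; ++j) {
        if (a[j] < pivot) std::swap(a[i++], a[j]);
    }
    std::swap(a[i], a[hi]);           // pivot lands in its final position
    quicksort(a, lo, i - 1);
    quicksort(a, i + 1, hi);
}

int main() {
    std::vector<int> v = {5, 2, 9, 1, 7, 3};
    quicksort(v, 0, static_cast<int>(v.size()) - 1);
    for (int x : v) std::cout << x << " ";   // 1 2 3 5 7 9
    std::cout << "\n";
    return 0;
}
```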

Filip Ekberg
These are algorithms rather than data structures specifically.
Greg Sexton
Yes indeed; however, they are often used on structures like lists.
Filip Ekberg
+1  A: 

Graphs are a very powerful, often overlooked data structure.

A lot of problems can be solved by constructing a graph modeling your problem and then running a well-known algorithm on that graph. Some examples: natural language processing (the weight of an edge connecting two nodes can represent how likely one word is to follow another), video games (graphs can determine shortest paths for AI characters), and network topology.

I learned about graphs from The Algorithm Design Manual, which was recommended by Steve Yegge in a blog post.
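
A small sketch of the shortest-path idea mentioned above: an unweighted graph stored as an adjacency list, plus a breadth-first search (illustrative C++, names are mine):

```cpp
#include <iostream>
#include <queue>
#include <vector>

// BFS over an adjacency list gives shortest path lengths (in edges)
// from a source node, e.g. for simple AI pathfinding on a grid graph.
std::vector<int> bfsDistances(const std::vector<std::vector<int>>& adj, int src) {
    std::vector<int> dist(adj.size(), -1);
    std::queue<int> q;
    dist[src] = 0;
    q.push(src);
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        for (int v : adj[u]) {
            if (dist[v] == -1) {        // not visited yet
                dist[v] = dist[u] + 1;
                q.push(v);
            }
        }
    }
    return dist;
}

int main() {
    // A small undirected graph: 0-1, 1-2, 2-3, 3-0.
    std::vector<std::vector<int>> adj = {
        {1, 3},     // neighbours of 0
        {0, 2},     // neighbours of 1
        {1, 3},     // neighbours of 2
        {0, 2},     // neighbours of 3
    };
    std::vector<int> dist = bfsDistances(adj, 0);
    for (std::size_t i = 0; i < dist.size(); ++i)
        std::cout << "dist(0," << i << ") = " << dist[i] << "\n";
    return 0;
}
```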

RossFabricant