Important data structures in search

views:

255

answers:

+6 Q:

Important data structures in search

I'm interested in teaching myself different data structures, something I currently know very little about. My plan is to implement a few key structures so I understand how they work. I'm looking for suggestions on important data structures to start with.

I'm primarily interested in data structures that are relevant to search applications (e.g. Google / Lucene) and the general trade-off between delayed computation and precomputation. I'm also interested in distributed data structures -- data structures that can scale across hundreds / thousands of servers -- and probabilistic data structures -- data structures that help finding an approximate answer but do not need to always be correct.

Wikipedia has a list of data structures. I am currently considering:

Hash table
B+-Tree
R-Tree
KD-Tree
Radix-Tree
Bloom filter

Are there better choices?

Finally, is there any (major) problem with implementing these structures in a language like F#?

+5 A:

Very ambitious. I voted your question up just for its scope.

MIT has an on-line algorithms and data structures course. The companion book is a classic. I'm not sure if it addresses the distributed and probabilistic features, but they'll give you an excellent grounding in the fundamentals.

I'd add red-black tree, hash tables, patricia trie, and skip lists to your agenda.

Good luck.

duffymo 2009-12-30 15:05:21

+2 A:

Since you have very little knowledge on DS I think you should start with Lists (Single and doubly Linked Lists).

Then you can study various tree data structures.

Also since you are interested on DS related to search, I think you should study B-tree+ trees and hash table.

The Algorithm Design Manual is a good book to learn more about algorithms.

Upul 2009-12-30 15:05:55

+3 A:

For search, algorithms are more important than data structures. When searching a large search space you often have to have sophisticated methods for pruning the search space.

You might look at classic search algorithms such as alpha-beta, A*, AO*.

Then look at something like iteratively deepening search.

In search algorithms, things like stacks and linked lists (which are really a form of a stack) and trees are more important than hash tables, B-trees etc. Of course, you will undoubtedly have hash tables in there, but it won't be the heart of the algorithm.

Here's some more important search algorithsm:

B* search
backtracking
beam search
best-first search
bidirectional search
hill-climbing search
simulated annealing
IDA*
iterative deepening depth-first search
mini-max search
nearest neighbor search
spreading activation
state space search (not a technique, just a way of conceptualizing a problem).

As far as specific data structures for search goes, you really don't need any. Basically, you just need your regular tool kit of data structures - trees, hashes, lists.

Larry Watanabe 2009-12-30 15:16:26

I disagree that for search algorithms are more important that data structures. The two really go hand in hand.

Jason 2009-12-30 15:38:40

I think he's looking for information retrieval algorithms first, numeric optimization is more helpful once you already have the basics working.

Nathan Howell 2010-01-02 06:44:44

"More important" is perhaps a mis-statement. I should have said that there is more search-specific material to learn in the algorithm literature, than in data structures, because a relatively small number of data structures that are commonly used for other purposes will for the most part suffice, but there is a huge body of literature on different search algorithms.

Larry Watanabe 2010-01-10 17:56:09

+3 A:

If you are going to tackle this sort of thing with a functional language you should have a look at Purely Functional Data Structures by Chris Okasaki. Basic lesson is: the data structures you are familiar with for imperative programming may not be the best choices for functional programming. I expect there's a lot of similar material to be Googled for.

Regards

Mark

High Performance Mark 2009-12-30 15:24:43

ansaurus

tags:

views:

answers:

Important data structures in search

related questions