views:

453

answers:

2

Search for a term on amazon.com, for example "stack overflow", and the search results come back very quickly.

On the left-hand side of the window, a faceted search shows, for certain categories, the count of products that match the term.

You can then drill into those categories. For example, 1094 books match the term, broken down into Computers & Internet (1003), Science, etc.

Given that the search for books covers the contents of some of those books, it strikes me that this is a very impressive feat.

How does Amazon do this? Massive parallelization, e.g. with each node knowing about only a few products?

Incidentally, I saw that "stack overflow" appears in the text of "Soul of a New Machine", a book I remember from 1981.

A: 

Well, there is parallelization, but what everyone does on the backend of systems like this is run the slow processes (like semantic parsing of book contents) ahead of time and put a fast lookup on top of the results. They are literally caching the search results in some large databases, so that answering your search takes only DB lookups. Perhaps I misunderstood the question, but it's similar to what Google does. You don't think their spiders scour the web for your sites at the moment you enter a search term, right?
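
To make the "slow process up front, fast lookup on top" idea concrete, here is a minimal sketch of an inverted index with facet counts. It is a toy under stated assumptions, not a description of Amazon's actual system: the catalogue, `build_index`, and `search` are all hypothetical names, and matching is a plain AND over words rather than true phrase search.

```python
from collections import Counter, defaultdict

# Hypothetical toy catalogue: (product id, category, searchable text).
PRODUCTS = [
    (1, "Computers & Internet", "fixing a stack overflow in recursive code"),
    (2, "Computers & Internet", "the call stack and buffer overflow attacks"),
    (3, "Science", "river overflow and flood modelling"),
    (4, "Computers & Internet", "stack machines explained"),
]

def build_index(products):
    """Slow offline step: map each word to the set of product ids containing it."""
    index = defaultdict(set)
    categories = {}
    for pid, category, text in products:
        categories[pid] = category
        for word in text.lower().split():
            index[word].add(pid)
    return index, categories

def search(index, categories, query):
    """Fast online step: intersect the word sets, then count hits per category."""
    words = query.lower().split()
    if not words:
        return set(), Counter()
    # Every query word must appear in the product text (AND semantics).
    hits = set.intersection(*(index.get(w, set()) for w in words))
    facets = Counter(categories[pid] for pid in hits)
    return hits, facets

# Built once, ahead of time; queries only ever read from it.
INDEX, CATEGORIES = build_index(PRODUCTS)
```

Calling `search(INDEX, CATEGORIES, "stack overflow")` never touches the product text itself; it only does set lookups on the prebuilt index, which is why the facet counts come essentially for free once the hits are known.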

Matt
+9  A: 

The short answer is: a lot of indexing. The longer answer is: a lot of indexing, a lot of redundancy, a lot of caching, and smart partitioning.

The real answer is -- read this book: http://www-csli.stanford.edu/~hinrich/information-retrieval-book.html

(It's free, and it's very good).
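
As a rough illustration of the partitioning part, the sketch below splits a toy catalogue across a few shards, lets each shard answer the query against its own small index, and merges the per-category counts. Everything here is an assumption made for illustration (the shard count, the modulo placement rule, and names such as `build_shards` and `search_all`); redundancy and caching are left out, and a real system would query the shards in parallel rather than in a loop.

```python
from collections import Counter, defaultdict

NUM_SHARDS = 3  # hypothetical shard count

# Hypothetical toy catalogue: (product id, category, searchable text).
PRODUCTS = [
    (1, "Computers & Internet", "stack overflow basics"),
    (2, "Computers & Internet", "stack overflow in depth"),
    (3, "Science", "overflow in a stack of sediment layers"),
    (4, "Science", "volcano overflow"),
    (5, "Computers & Internet", "heap and stack overflow bugs"),
]

def shard_for(pid):
    """Assumed placement rule: product id modulo the number of shards."""
    return pid % NUM_SHARDS

def build_shards(products):
    """Each shard holds an inverted index for only its own slice of products."""
    shards = [{"index": defaultdict(set), "category": {}} for _ in range(NUM_SHARDS)]
    for pid, category, text in products:
        shard = shards[shard_for(pid)]
        shard["category"][pid] = category
        for word in text.lower().split():
            shard["index"][word].add(pid)
    return shards

def search_shard(shard, words):
    """Local facet counts for products on this shard that contain every word."""
    hits = set.intersection(*(shard["index"].get(w, set()) for w in words))
    return Counter(shard["category"][pid] for pid in hits)

def search_all(shards, query):
    """Scatter the query to every shard, then gather and sum the facet counts."""
    words = query.lower().split()
    total = Counter()
    if not words:
        return total
    for shard in shards:  # sequential here; these calls are independent
        total.update(search_shard(shard, words))
    return total
```

Because each shard only returns a small table of counts, merging is cheap even when the catalogue itself is very large.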

SquareCog
Thanks for the book reference.
Renaud Bompuis
Same here, thanks for the reference
webclimber