tags:

views:

121

answers:

1

For a general autocomplete suggestion (lexicographically sorted keys) one could use tries or TSTs. However I would like to do autocomplete to return results based on popularity of results. So searching for "clinton" would show "bill clinton", "hillary clinton" and "chelsea clinton" in that order. I believe this can be done using ngram approach within Lucene. Are there any other ways to do it?

On a similar note, what's the backend to return a list of results for the tags autocomplete on stackoverflow? Is it lucene based?

A: 

Depends on

  1. how big your search space is
  2. If you want error correction
  3. The constant-ness of your search space
  4. Per-user customization needed
  5. Multi-language?

For in-memory situations you can use a Trie that stores hit count at the leaf.

Another important issue to deal here is how you are going to predict incomplete multiple words. e.g. Abraham => Lincoln or Thomas (father of user). N-Grams from which corpus will be better, Google or Class VI History Book?

It depends on the context. Hit count may be hits in the current application or global. e.g. You will run into the cold-boot problem when a new word comes into vogue, but it yet to receive sustantial hits to bubble up to the top of the list

Thoughts to get the discussion started... Thanks

DotDot
The assumptions are:i) search space is around 1 million wordsii) every word has a popularity metric associated with it so we know bill clinton has say popularity of 100 and hillary has popularity of 85iii) words are in the latin language space, english, spanish, french etc.
Having a trie that stores hit counts would still require sorting in memory at run time since the trie nodes are accessed in a random order. What would be great is one could have the advantages of a trie AND being able to access the nodes in a sorted fashion. Eg: if there were a 1000 clintons, we don't want to retrieve all of them in memory and sort them to get the top 10. We would ideally like to traverse just 10 of the leaf nodes which would have the top 10 people with the name clinton.
well you can use a trie to find the matches and run a custom sort. Otherwise you can implement a custom SortedTrie that returns matches sorted by default.
DotDot