views:

62

answers:

3

I have a very large list of terms for use in an autocomplete box. I've been mulling over a few different scenarios for how to prune them down, but I haven't come up with anything great yet.

Basically, the structure is very similar to a record label -

  • An artist has albums An album has songs
  • Individual songs could be popular, albums are mostly sums of their underlying song popularity
  • Albums also have highly variable number of songs in them - so if an album has hundreds of song, it's very likely that someone would want to search for it, and much less likely if it has less songs
  • As the autocomplete becomes more specific (more letters), I'd like less likely terms to be shown

I'm thinking something like this:

Apple   10
Banana  10
Crab    20
Diner   30
Dish    20
Daily   10
Diver   20
Dice    10

If this is the list of albums, and the "score" i assign them, I simply pop choose the list based on the length of the list I'm showing (3 for example) and then by score - I hit "D" above, and "Diner", "Dish" and "Diver" show up, and then "i" and it becomes "Diner", "Dish" and "Diver".

Is there a particular algorithm that does this? Or an AJAX autocompleter built for this? I'm currently using Prototype/Scriptaculous but I can't seem to get it right.

+1  A: 

You could give the closure autocomplete a try.

Annie
+1  A: 

This is not an easy algorithm to implement, since you are trying to index a data structure in two ways - lexicographically and by popularity.

One way to do it might be to build a compressed trie of the songs, where at each node you store a pre-built list of the N most popular songs beginning with that prefix. This would take a lot of storage (O(NUM_SONGS * N)), but would allow fast lookup (O(PREFIX)).

Avi
This looks great and is probably the best I can expect. Any implementations of this?
aronchick
+2  A: 

I just posted a server-side autocomplete implementation on Google Code. The project includes a java library that can be integrated into existing applications and a standalone HTTP AJAX autocomplete server. Kick the tires!

Shilad Sen
This _is_ really cool - and solves a lot of my problems. Any chance that there's a Ruby version available?
aronchick
You should be able to download the jar file, run it as a standalone server, and connect to it from ruby. See the python example code on the homepage as a starting point for writing a simple ruby client.In the interest of full disclosure, I haven't extensively tested the server wrapper code.
Shilad Sen