views:

524

answers:

3

Hello,

we've got a list of search-result mappings, e.g. a simple url mapping might look like

"stackoverflow" -> "www.stackoverflow.com" "joel" -> "www.joelonsoftware.com"

so searching for the exact phrases is working fine.

Now we're looking for an incremental search / typeahead, e.g. "stackover" would also return "www.stackoverflow.com". We could of course populate our maps accordingly, e.g. put every possible string into the map, starting with all variations of a given min size

-> map keys:

stack -> stackoverflow ... stackoverf -> stackoverflow stackoverfl -> stackoverflow stackoverflo -> stackoverflow stackoverflow -> stackoverflow

However that would mean a higher memory footprint that necessary (I guess).

Any suggestions?

+5  A: 

Simplest solution: Search in a List

You can also search on the fly, for example like this:

List<String> urls = Arrays.asList("this", "is", "a", "test");

// search for "is"
List<String> reduced = new ArrayList<String>();
String searchWord = "is";
for (String s : urls) {
    if (s.contains(searchWord)) {
         reduced.add(s);
    }
}

// when the user types more, search again using the already reduced list.

The first serach will be the slowest, but then you can use the already reduced list which should be a lot faster.

More sophisticated: use a Trie

If performance is an issue and you only allow searches that match the start of the strign (e.g. "stack" for "stackoverflow", but not "overflow" as the search term), you should look into representing the data as a Trie. This gives you O(c) search performance, where c is the number of characters. So the search performance is independent of the number of search terms, which is quite awesome.

Advanced Solution: Use a Suffix Tree

A Suffix tree is more or less an advanced Trie, here you can also search for any substrings in O(c), as for the Trie. I'd say this is the most advanced option.

martinus
A suffix tree is very similar to a kind of trie and the asymptotic search performance is the same (especially, it's *not* constant).
Konrad Rudolph
Well its O(1) when you limit the maximum number of characters, so I think you could say it is constant. At least it is independent of the number of elements.
martinus
A: 

Hello,

thanks, Trie was a good suggestion. I first thought it was a typo and you meant tree :)

here's a sample java implementation:

http://www.koders.com/java/fid0F06E53F2CFCC6E591C38752F355A7178F92FFE5.aspx

A: 

It may be overkill but I would check out Lucene for this problem.