views:

114

answers:

1

Hi!

I am having a lot of trouble finding a string matching algorithm that fits my requirements.

I have a very large database of strings in an unabbreviated form that need to be matched to an arbitrary abbreviation. A string that is an actual substring with no letters between its characters should also match, and with a higher score.

Example: if the word to be matched within was "download" and I searched "down", "ownl", and then "dl", I would get the highest matching score for "down", followed by "ownl" and then "dl".

The algorithm would have to be optimized for speed and a large number of strings to be searched through, and should allow me to pull back a list of matching items strings (if I had added both "download" and "upload" to the database, searching "load" should return both). Memory is still important, but not as important as speed.

Any ideas? I've done a bunch of research on some of these algorithms but I haven't found any that even touch abbreviations, let alone with all these conditions!

A: 

I'd wonder if Peter Norvig's spell checker could be adapted in some way for this problem.

It's a stretch that I haven't begun to work out, but it's such an elegant solution that it's worth knowing about.

duffymo