My application will offer a list of suggestions for English names that "sound like" a given typed name.

The query needs to be optimized to return results as quickly as possible. Which of the following options would return results fastest? (Or suggest your own if you have one.)

A. Generate the Soundex hash once, store it in the "Names" table, then do something like the following. (This avoids generating the Soundex hash for every row in my database on each query, right?)

select name from names where NameSoundex = Soundex('Ann')

B. Use the Difference function. (This must generate the Soundex for every name in the table, right?)

select name from names where Difference(name, 'Ann') >= 3

C. Simple comparison

select name from names where Soundex(name) = Soundex('Ann')

  • Option A seems to me like it would return results fastest, because it generates the Soundex for only one string and then compares it to an indexed column, "NameSoundex".

  • Option B should give more results than option A, because the name does not have to be an exact Soundex match, but it could be slower.

  • Assuming my table could contain millions of rows, what would yield the best results?

A: 

You could precompute the DIFFERENCE() of every pair of your names and store the results in a table like this:

Table: Differences
Columns: Name1, Name2, Difference


INSERT INTO Differences
        (Name1, Name2, Difference)
    SELECT
        n1.Name, n2.Name, DIFFERENCE(n1.Name, n2.Name)
        FROM Names           n1
            CROSS JOIN Names n2
        WHERE DIFFERENCE(n1.Name, n2.Name) >= ??? -- cap on what to store (DIFFERENCE returns 0 to 4; 4 is the closest match)

If the user enters one of your existing names, you can look up the differences very quickly. If the user enters a name that is not in your Names table, you can fall back to your option A or B. You could even let them choose a "difference" level from a select list. DIFFERENCE returns 0 to 4, where 4 means the Soundex codes match, so level 4 would be your option A and any lower level would use option B, first trying the Differences table and then the brute-force table scan WHERE DIFFERENCE(@givenName, Names.Name) >= @UserSelectLevel
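
The lookup-then-fallback idea above could be sketched roughly as follows. This assumes SQL Server (T-SQL), a Names table with a varchar Name column, and a Differences table that is already populated. The variable types and the example level of 3 are assumptions for illustration. DIFFERENCE returns 0 to 4, with 4 the closest match, so a higher level means a stricter search.

```sql
-- Sketch only: assumes T-SQL and existing tables Names(Name) and
-- Differences(Name1, Name2, Difference).
DECLARE @givenName       varchar(50) = 'Ann';  -- name typed by the user
DECLARE @UserSelectLevel int         = 3;      -- minimum similarity, 0 to 4

IF EXISTS (SELECT 1 FROM Names WHERE Name = @givenName)
BEGIN
    -- Known name: read the precomputed pairs instead of calling DIFFERENCE.
    SELECT d.Name2
    FROM Differences d
    WHERE d.Name1 = @givenName
      AND d.Difference >= @UserSelectLevel;
END
ELSE
BEGIN
    -- Unknown name: brute-force scan, DIFFERENCE is evaluated for every row.
    SELECT n.Name
    FROM Names n
    WHERE DIFFERENCE(n.Name, @givenName) >= @UserSelectLevel;
END
```

The precomputed branch is only complete when @UserSelectLevel is at least as strict as the cap used to fill Differences. Keep in mind that the cross join compares every pair of names, so with millions of rows the precomputation itself may be impractical unless the cap is tight.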

KM
A: 

Option A will be the fastest.
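
A minimal sketch of option A, assuming SQL Server (T-SQL), a Names table with a Name column, and that the NameSoundex column does not exist yet. The index name is hypothetical. A persisted computed column is one way to keep the stored Soundex in sync with Name; a regular column filled in at insert time would also work.

```sql
-- Sketch only: assumes T-SQL and an existing Names(Name) table.
-- PERSISTED stores the Soundex code on disk, so it is computed when a row
-- is inserted or updated rather than on every query.
ALTER TABLE Names
    ADD NameSoundex AS SOUNDEX(Name) PERSISTED;
GO  -- batch separator (SSMS/sqlcmd) so later statements can see the new column

-- Hypothetical index name. INCLUDE (Name) lets the query below be answered
-- from the index without touching the base table.
CREATE NONCLUSTERED INDEX IX_Names_NameSoundex
    ON Names (NameSoundex)
    INCLUDE (Name);
GO

-- SOUNDEX runs once, for the search term; the comparison is an index seek.
SELECT Name
FROM Names
WHERE NameSoundex = SOUNDEX('Ann');
```

By contrast, options B and C wrap the Name column in a function inside the WHERE clause, so they generally cannot seek on an index over Name and end up evaluating the function for each row.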

Thanks Lieven

xkingpin