views:

165

answers:

1

Note: In this question I'm using the term "autocomplete" (or "iterative search") to refer to returning search-as-you-type results, e.g. like Google Search gives you. Also my question is not specific to web applications vs. fat client apps.

How are SQL SELECT queries normally constructed to provide decent performance for this type of query, especially over arbitrarily large data sets? In the case where the search will only query based on the first n characters (easiest case) am I still issuing a new SELECT result FROM sometable WHERE entry LIKE... on each keypress. Even with various forms of caching this seems like it might result in poor performance.

In cases where you want your search string to return results with prefix matches, substring matches, etc. it's an even more difficult problem. Looking at a case of searching a list of contacts, you might return results that match FirstName + LastName, LastName + FirstName, or any other substring.

+1  A: 

Searches like Google, Yahoo, etc. use full text indexes to generate a high performance list of key words.

If you're doing iterative searches on single word columns, you won't need full text indexes and keywords. You can use LIKE on the indexed columns themselves.

Since you're doing the search as they type, you're doing prefix only matching. Your indexed columns will still get normal performance with a LIKE clause and a wild card doing "prefix" searches.

SELECT last_name FROM users WHERE last_name LIKE 'Adam%'

If you need to search from the other end, you'll want a reverse index, but, luckily, people don't type backwards.

You will issue a new SELECT statement for each "iterative search" but on a timer. Only if they stop typing, do you issue another query. You'll limit the result set using LIMIT or TOP so that the query can complete as soon as it fills 10 records or so. Also, this way you're only sending 10 records over the wire.

SELECT last_name FROM users WHERE last_name LIKE 'Adam%' LIMIT 10

Of course, for best performance, last_name would be the primary index. An index allows the database to get the value without hitting the actual record. Primary indexes are often contiguous, which makes them even faster.

If by chance, you are searching on one column, but returning another, then use a compound index so that the database engine can still get the value from the index itself, without hitting the record.

SELECT first_name FROM users WHERE last_name LIKE 'Adam%' LIMIT 10

For the above query, the primary index would be (last_name, first_name).

The timer is the key to performance. You can tweak the timer to get the performance you desire.

Marcus Adams
@Marcus - This answer is terrific, thank you!
Howiecamp