ansaurus

Question

Suppose I have 400 rows of people's names in a database. What's the best way to do a search for their names?

Answer 1

+9 A:

You can use:

SELECT *
FROM mytable
WHERE name LIKE '%matt%'

Aaron 2010-03-07 01:57:55

Isn't this very slow...

TIMEX 2010-03-07 01:58:29

Not for 400 records (or even for 40,000, for that matter). Start worrying in the hundreds of thousands. Besides, there is no better simple option (I guess you could index every word in a tree index :-DDD).

Itay Moav 2010-03-07 01:59:38

@alex: Wildcarding the left side of the text you are looking for will mean that an index, if one exists on the column, can't be used.

OMG Ponies 2010-03-07 02:00:08

@OMG, I would expect that any decent SQL database can table scan 400 or even 400,000 rows in less time than it takes to print out the results.

Paul Tomblin 2010-03-07 02:02:30

If you're worried about it, create an index on that column. I don't know if that will help much, but it's worth a try. See if it makes things faster, if they truly are slow. However, I wouldn't worry about it unless that query actually proves to be slow in your tests. As has been mentioned by others, with 400 rows, it is very unlikely to be noticeable.

Aaron 2010-03-07 02:04:35

Be aware too that just because an index exists, doesn't mean the optimizer will use it.

OMG Ponies 2010-03-07 02:05:38

@OMG Ponies he has 400 records. B.t.w., does your nick come from http://lfgcomic.com/page/42? And no, he didn't mean to address me. I wrote exactly what he (@Paul Tomblin) wrote; I just forgot to put the @alex at the beginning.

Itay Moav 2010-03-07 02:06:20

@OMG, I thought you were the one expressing concern about the fact that the indexes wouldn't be used in the LIKE query. I think @Itay was postulating something a lot more complicated than a LIKE query.

Paul Tomblin 2010-03-07 02:06:45

Could always try with explain (assuming that that functionality is available in his DBMS) to see if the index is getting used.
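
As a rough illustration of that check, here is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for whatever DBMS is actually in use. The table name mytable comes from the answer above; the index name idx_name, the sample names, and the helper plan_for are made up for the example. SQLite only uses an index for LIKE under certain conditions (here the column is declared COLLATE NOCASE), and other databases have their own rules and their own EXPLAIN output, so treat this as a sketch rather than a general result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# COLLATE NOCASE lets SQLite's default case-insensitive LIKE use the index.
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT COLLATE NOCASE)"
)
conn.execute("CREATE INDEX idx_name ON mytable(name)")  # hypothetical index name
conn.executemany(
    "INSERT INTO mytable (name) VALUES (?)",
    [("Matt Smith",), ("Jeff Zermatt",), ("Anna Matthews",)],
)


def plan_for(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# No leading wildcard: the optimizer can turn this into an index range lookup.
prefix_plan = plan_for("SELECT name FROM mytable WHERE name LIKE 'matt%'")
# Leading wildcard: every entry has to be visited.
infix_plan = plan_for("SELECT name FROM mytable WHERE name LIKE '%matt%'")

print("prefix:", prefix_plan)
print("infix: ", infix_plan)
```

In SQLite the first plan should be reported as a SEARCH using the index and the second as a SCAN; the exact wording differs between SQLite versions.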

Aaron 2010-03-07 02:09:17

@Paul Tomblin no no, I meant exactly what you meant. I also suggested that the next step to increase the performance of such a search is not easy, and is not worth your time to implement until you have millions of users and records.

Itay Moav 2010-03-07 02:10:31

@Itay Moav: Sorry, never seen the webcomic before. I chose the nick for the sake of internet humour.

OMG Ponies 2010-03-07 02:18:06

@Paul Tomblin: I strive to provide answers that scale well regardless of record count. And be aware that we only have *assumptions* on the scope of searching that needs to take place.

OMG Ponies 2010-03-07 02:25:40

Answer 2

+1 A:

You have the following options:

Full Text Search (FTS)
Regular Expressions
LIKE Using wildcards

...in that order of preference.

OMG Ponies 2010-03-07 02:02:00

Answer 3

+11 A:

SELECT *
FROM mytable
WHERE name LIKE 'matt%' OR name LIKE '%[ ,-/]matt%'
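
To see the difference between a plain substring pattern and a word-start pattern, here is a small sketch using Python's built-in sqlite3 module. SQLite's LIKE does not support the bracketed character classes used above (that syntax belongs to SQL Server and similar engines), so this sketch assumes a space is the only delimiter and spells it out explicitly. The sample names and the search helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT)")
conn.executemany(
    "INSERT INTO mytable (name) VALUES (?)",
    [("Matt Smith",), ("Jeff Zermatt",), ("Anna Matthews",), ("John Doe",)],
)


def search(where_clause, params):
    """Run a SELECT with the given WHERE clause and return the names, sorted."""
    sql = "SELECT name FROM mytable WHERE " + where_clause + " ORDER BY name"
    return [row[0] for row in conn.execute(sql, params)]


# Substring match: also hits "matt" in the middle of a word ("Zermatt").
substring_hits = search("name LIKE ?", ("%matt%",))

# Word-start match: "matt" at the start of the value or right after a space.
word_start_hits = search("name LIKE ? OR name LIKE ?", ("matt%", "% matt%"))

print(substring_hits)   # ['Anna Matthews', 'Jeff Zermatt', 'Matt Smith']
print(word_start_hits)  # ['Anna Matthews', 'Matt Smith']
```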

Notes:
1) Fancy wildcard. The reason for not using the simpler LIKE '%xyz%' form is that, depending on xyz, the database could return many irrelevant records, for example "Jeff Zermatt" in the case of the "Matt" search.
The brackets in the second pattern list the delimiters that may indicate a break between words. An alternative pattern would be [^A-Z0-9] (which may return a few O'Brians when searching for Brian, but that may not be a bad thing...).

2) Performance. Because there are so few records in this table, the front wildcard approach is quite feasible, and certainly the easiest approach. No reason to search any further!
If the records happen to be very wide (many fields some of them more than 30 chars in length), you can create an index on name. The front-end wildcard will still require a scan, but this will be on the index which is narrower, hence fits more readily in the cache etc.
Indeed, if rather than a SELECT * this query targets only a few of the fields of the myTable table (and if this table's records are "wide"), you can create an index made of all these fields.
Should the number of records grow past, say, 50,000 (or, to a lesser degree, should the application "hit" the database with similar queries at a rate above, say, 40 per minute), you may consider introducing more efficient ways of dealing with keywords: a Full Text Catalog or a "hand made" table with the individual keywords.

3) Advantages of another approach. A solution whereby the application maintains a table of the individual keywords, parsed ahead of time from the full name, not only scales better (as the table and/or usage grows) but also allows improvements in the quality of the search.
For example, it may improve the effective recall by introducing common nicknames of first names (Bill or Will or Billy for William, Dick for Richard, Jack or Johnny for John, etc.). Another possibility opened up by a more sophisticated approach is the introduction of a Soundex or modified Soundex encoding of the name tokens, allowing users to locate names even when they misspell or don't know the precise spelling (e.g. Wilmson vs. Wilmsen vs. Willmsonn).
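
The keyword-table idea from note 3 can be sketched roughly as follows, using Python's built-in sqlite3 module. The table and column names (people, name_keywords), the tiny NICKNAMES mapping, and the helper functions are all hypothetical. The sketch matches whole tokens only, for simplicity; a real application would need a proper tokenizer, a much larger nickname list, and probably prefix matching on the keyword column.

```python
import re
import sqlite3

# Hypothetical, deliberately tiny mapping: nickname -> formal first name.
NICKNAMES = {
    "bill": "william", "will": "william", "billy": "william",
    "dick": "richard", "jack": "john", "johnny": "john",
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE name_keywords (keyword TEXT, person_id INTEGER)")
conn.execute("CREATE INDEX idx_keyword ON name_keywords(keyword)")


def canonical(token):
    """Lower-case a token and map known nicknames to the formal name."""
    token = token.lower()
    return NICKNAMES.get(token, token)


def add_person(name):
    """Store the full name plus one keyword row per alphanumeric token."""
    cur = conn.execute("INSERT INTO people (name) VALUES (?)", (name,))
    person_id = cur.lastrowid
    tokens = re.findall(r"[A-Za-z0-9]+", name)
    conn.executemany(
        "INSERT INTO name_keywords (keyword, person_id) VALUES (?, ?)",
        [(canonical(t), person_id) for t in tokens],
    )


def find(term):
    """Look up people by one keyword; an equality test can use the index."""
    rows = conn.execute(
        "SELECT DISTINCT p.name FROM people p "
        "JOIN name_keywords k ON k.person_id = p.id "
        "WHERE k.keyword = ? ORDER BY p.name",
        (canonical(term),),
    )
    return [row[0] for row in rows]


for full_name in ["William Gates", "Bill Murray", "Jeff Zermatt", "Matt Smith"]:
    add_person(full_name)

print(find("Billy"))  # ['Bill Murray', 'William Gates']
print(find("matt"))   # ['Matt Smith']
```

Because the keywords are normalized when they are stored, a search for "Billy" finds both "William" and "Bill", and "matt" no longer matches "Zermatt".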

mjv 2010-03-07 02:29:25

Aarrgghh soundex considered sophisticated aarrgghh bad luck about Wilmson/Wilson/Nilsen/Milson or Wilhelmsen/Vilhelmsen

John Machin 2010-03-07 07:04:54

Answer 4

A:

If you are trying to search for the names in a programming language, you can use its regular expression package, for example java.util.regex.* in Java.

harigm 2010-03-07 07:13:35
