tags:

views:

89

answers:

3

I have a very simple sql select statement on a very large table, that is non-normalized. (Not my design at all, I'm just trying to optimize while simultaneously trying to convince the owners of a redesign)

Basically, the statement is like this:

SELECT FirstName, LastName, FullName, State
FROM Activity
WHERE (FirstName=@name OR LastName=@name OR FullName=@name)
AND State=@state;

Now, FirstName, LastName, FullName and State are all indexed as BTrees, but without prefix - the whole column is indexed. State column is a 2 letter state code.

What I'm finding is this:

  1. When @name = 'John Smith', and @state = '%' the search is really fast and yields results immediately.
  2. When @name = 'John Smith', and @state = 'FL' the search takes 5 minutes (and usually this means the web service times out...)
  3. When I remove the FirstName and LastName comparisons, and only use the FullName and State, both cases above work very quickly.
  4. When I replace FirstName, LastName, FullName, and State searches, but use LIKE for each search, it works fast for @name='John Smith%' and @state='%', but slow for @name='John Smith%' and @state='FL'
  5. When I search against 'John Sm%' and @state='FL' the search finds results immediately
  6. When I search against 'John Smi%' and @state='FL' the search takes 5 minutes.

Now, just to reiterate - the table is not normalized. The John Smith appears many many times, as do many other users, because there is no reference to some form of users/people table. I'm not sure how many times a single user may appear, but the table itself has 90 Million records. Again, not my design...

What I'm wondering is - though there are many many problems with this design, what is causing this specific problem.

My guess is that the index trees are just too large that it just takes a very long time traversing the them. (FirstName, LastName, FullName)

Anyway, I appreciate anyone's help with this. Like I said, I'm working on convincing them of a redesign, but in the meantime, if I someone could help me figure out what the exact problem is, that'd be fantastic.

Update

As requested, here are the details after running an Explain:

id: 1, select type: 'SIMPLE', table: 'activity', type: 'ref', possible keys: 'IDX_LastName,IDX_FirstName,IDX_FullName,IDX_State', key: 'IDX_State', key_len: '3', ref: 'const', rows: 7227364, extra: 'Using where'
+1  A: 

Just guessing without seeing your execution plan, but you could try to create compound indexes

(FirstName, State)
(LastName, State)
(FullName, State)

or even a single compound index:

(State, FirstName, LastName, FullName)
Peter Lang
A: 

I think you need to take some time with the EXPLAIN statement. Peter Lang is likely right that you need some complex indexes, but EXPLAIN will show what impact any change like that will have much better than by trial and error.

Oh, and use the keyword SQL_NO_CACHE in any select statement that you're trying to profile to prevent query caching from altering your results (just add it after the SELECT)

ircmaxell
+2  A: 

We need to know how the table is defined, so giving us the results of

show create table Activity

would be a help.

From the sound of it, there's something about the State column - is it indexed?

Another good tool is the EXPLAIN command.

use

EXPLAIN 
    SELECT 
        FirstName, LastName, FullName, State 
    FROM Activity 
    Where (FirstName='John Smith' 
           OR LastName='John Smith' 
           OR FullName='John Smith') 
    AND State='FL';

and compare the results with the EXPLAIN output from other variations of the query.

stephenr