views:

249

answers:

3

I have a table with at least a couple million rows and a schema of all integers that looks roughly like this:

start
stop
first_user_id
second_user_id

The rows get pulled using the following queries:

SELECT * 
  FROM tbl_name 
 WHERE stop >= M 
   AND first_user_id=N  
   AND second_user_id=N 
ORDER BY start ASC

SELECT * 
  FROM tbl_name 
 WHERE stop >= M 
   AND first_user_id=N 
ORDER BY start ASC

I cannot figure out the best indexes to speed up these queries. The problem seems to be the ORDER BY because when I take that out the queries are fast.

I've tried all different types of indexes using the standard index format:

ALTER TABLE tbl_name ADD INDEX index_name (index_col_1,index_col_2,...)

And none of them seem to speed up the queries. Does anyone have any idea what index would work? Also, should I be trying a different type of index? I can't guarantee the uniqueness of each row so I've avoided UNIQUE indexes.

Any guidance/help would be appreciated. Thanks!

Update: here are a list of the indexes, I didn't include this originally since I've taken a shotgun approach and added a ton of indexes looking for one that works:

start_index: [start, first_user_id, second_user_id]
stop_index: [stop, first_user_id, second_user_id]
F1_index: [first_user_id]
F2_index: [second_user_id]
F3_index: [another_id]
test_1_index: [first_user_id,stop,start]
test_2_index: [first_user_id,start,stop]
test_3_index: [start,stop,first_user_id,second_user_id]
test_4_index: [stop,first_user_id,second_user_id,start]
test_5_index: [stop,start]

And here is the EXPLAIN output.

*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: listing
type: index_merge
possible_keys: stop_index,F1_index,F3_index,test_1_index,test_2_index,test_4_index,test_5_index
key: F1_index,F3_index
key_len: 5,5
ref: NULL
rows: 238
Extra: Using intersect(F1_index,F3_index); Using where; Using filesort

Update for posterity

We ended up completely re-evaluating how we were querying the table and chose these indexes:

index_select_1: [first_user_id,start,stop]
index_select_2: [first_user_id,second_user_id,start,stop]

and then we select on the table with queries like these:

SELECT * 
  FROM tbl_name 
 WHERE first_user_id=N
   AND start >= M 
ORDER BY start ASC

SELECT * 
  FROM tbl_name 
 WHERE first_user_id=N   
   AND second_user_id=N
   AND start >= M 
ORDER BY start ASC

Thanks to everyone that answered, you really helped me think through the problem.

A: 

Try to avoid using ranges (e.g. >, >=, <, <=) as the left most portion of a WHERE clause. MySQL is unable to use an index for any fields in the WHERE clause to the right of a range.

At first glance I would say to at least create an index on first_user_id,stop,second_user_id. Then specify the query accordingly:

select * from tbl_name where first_user_id=N and stop >= M and second_user_id=N

UPDATE: D'oh, so I completely contradicted myself in the above query - since incorporating second_user_id into the index is useless when specifying it in the WHERE after the stop "range", so let's try this again.

ALTER TABLE tbl_name ADD INDEX index_1 (first_user_id,stop) ALTER TABLE tbl_name ADD INDEX index_2 (first_user_id,second_user_id,stop)

malonso
In order to give an accurate answer I would need to know a lot more about: the table, the data, the acceptable delay for each pull, the available hardware, the expected growth, etc. As a rule of thumb, you will get the best performance using queries that have WHERE statements whose fields are all index (correctly). However, given the fact that it seems you always need to filter by a range of "stop" values, I'm not sure that you can create an index that will help with BOTH the stop range and the ORDER BY start.
malonso
I can totally change the order of the queries, that shouldn't be a problem. I'm still wondering if an index on (first_user_id,stop,second_user_id) would work for the ORDER BY which seems to be the part that's really slowing the query down.
Jaymon
@Jaymon, using my phone book example, would an index on Last_Name+Street_Name help any? for each distinct Last_Name, like "Jones", the streets would be sorted properly, however you are getting a range of Last_Name, so it would still need to reorder everything based on Street_Name anyways.
KM
First of all, take a look at the update I made to my answer b/c I goofed something up in the original. If the result-set is massive there is not really anything you can do b/c of the stop "range". Out of curiosity, please run these queries and post the results: "select count(start) from tbl_name where first_user_id=N and stop >= M;" and "select count(start) from tbl_name where first_user_id=N and second_user_id=N and stop >= M;"
malonso
"select count(start) from tbl_name where first_user_id=N and second_user_id=N and stop >= M;" has 30 rows. select count(start) from tbl_name where first_user_id=N and stop >= M;" has 4096 rows. There are about 4 million rows total in tbl_name.
Jaymon
Hmm, well the 30 seems trivial and you could always punt and do that w/the result-set outside of mysql. As for the larger results, maybe try a subquery like: "SELECT start,stop,first_user_id,second_user_id FROM (SELECT start,stop,first_user_id,second_user_id FROM tbl_name WHERE first_user_id=N and stop >= M ) as tbl_name_results order by start ASC". Not sure how much faster that will make it but thought I would throw it out there.
malonso
A: 

The strange thing is that your query only returns 238 rows (?)

Since you stated that the query is faster without the ORDER BY, may I suggest that you do the sort after the query ?
That may be quickest way to fix the problem.

Also, don't forget to remove unused indexes afterwards :)


edit

That's a wild guess (because I'm not sure mysql won't factorize the query to its current form), but you could try to do the following:

SELECT * FROM (
    SELECT * 
      FROM tbl_name 
     WHERE stop >= M 
       AND first_user_id=N 
    ) AS derived
ORDER BY start ASC
Romuald Brunet
I thought about sorting them after the fact but While there are only 238 rows for the one set of values I used, we can't guarantee there will be so few, and in the future there will most definitely be way more. That's why I'm trying to figure it out now.And yeah, all those indexes are just on my dev db. When I find a set of indexes that works those will be the only ones that get added to the live server.
Jaymon
But doesn't the mere existence of those indexes cause a problem for the optimizer? Or is that just something stuck in my head regarding Oracle that has nothing to do with MySQL?
MJB
+1  A: 

Could you make your sample table and EXPLAIN results match? Because, obviously it is not a same situation and we don't know if you maybe made a mistake in abstracting your real query only by looking at the provided EXPLAIN results. If you don't want to show too much of a structure then reverse it and create the quoted table structure and provide EXPLAIN result on that (maybe you will catch the problem that way).

Now one thing is certain - sorting is using filesort, which is bad.

To simplify (we'll come back to it) - compound indexes useful for sorting need to have the sort field in front.

Example idx(ID, Start)

ID      Start
1
        5
        8
        8
        10
        25
2
        3
        9
        10
        40
        41
        42
        42
...

In the above example the index is not of much help for sorting if you don't have where condition in which ID is limited to only one value.

But, this exception is important since you have single row selectivity on one or both id fields.

So from your indexes the only indexes that have start at the beginning are

start_index: [start, first_user_id, second_user_id]
test_3_index: [start,stop,first_user_id,second_user_id]

Mysql ignores the index

start_index: [start, first_user_id, second_user_id]

because it has better choices in terms of selectivity - it would need to do an index scan with this index and it has indexes that will allow it to do index intersect jumping directly to (unsorted) results. It expects better selectivity from the intersect and selectivity drives the planer.

Once the result is obtained mysql should realize that it could use another index to sort the results, but it seems that it can not see how cheap that would be.

So to help the planer you could create an index that will capitalize on your single value selectivity with index such as:

two_ids_with_sort: [first_user_id, second_user_id, start]

I assume that above would work very well on your second query where you have conditions on both id's giving you access to presorted start record pointers. The following query should do the same for the first query:

one_id_with_sort: [first_user_id, start]

Only if you end up with a lot of records in the result sets I would look into indexing it further.

There are two paths there a) adding the field stop to the end of the index b) creating two more similar indexes with stop instead of start (index intersect could be used there and wider range of queries could benefit from it)

But do test all of the above theories.

Couple of general suggestions

  • write your conditions in most selective manner first
  • when testing indexes start with single column indexes first and then expand to compound indexes (for example for sorting on start I would add index only on start)
  • too many indexes are not so good in mysql as the query planer is not able to quickly run through all the possible combinations and can not properly estimate costs of all the operations (so it cuts corners and the best index combination and plan might be left out)
  • therefore test indexes with USE INDEX (index1) FOR ORDER BY in your select to gauge a benefit for a certain index over planer, see more here (esp FORCE option; also - aim to leave only the useful indexes and see if planer will be able to utilize them then, if not, as a last resort only, force the indexes in your queries for which performance is crucial. keep in mind that this is a bad practice in terms of administration and design).
Unreason