ansaurus

Question

What indexes can be used to improve this query?

Answer 1

A:

no, you have all indices there that you need:

on your id
on your date

knittl 2009-09-11 05:50:38

Answer 2

A:

Why doesn't the query with the date clause not use the accessid index?

Because using the date index is more efficient. That's because it's likely to pare the search space down faster.

At least one DBMS (DB2/z, I don't know much about MySQL) would benefit from an index on date+accessid since the access IDs would be sorted within dates in that index. That DBMS will use the date+accessid key to efficiently use the where clause to whittle down the search space and to return distinct values of accessid within that space.

Whether MySQL is that smart, I have no idea. My suggestion would be to try it and see (which is the best answer to most DB optimization questions).

paxdiablo 2009-09-11 05:53:02

Answer 3

A:

The query uses the 'date' index because thats what you use in the where clause.

This is the only sensible option, if it used the access id index it would need to read all the accessid rows then check the date before it and only then decide if it was distinct.

If this is a really big table a compound index on date and accessid might help.

James Anderson 2009-09-11 05:56:17

Answer 4

A:

I have no way of testing it, but I would definitely try to add an index spanning both accessid and date.

Index optimizations if often like alchemy. Different DBMS behave differently, and sometimes you need to simply try (and fail) various combinations. I’m not saying it’s not possible to reason. It is in many cases, but up to a certain point. Often it’s simply faster and easier to follow your instinct.

Jan Zich 2009-09-11 05:59:50

Answer 5

+2 A:

Your problem is that your condition is a range clause (on the date column).

A multi-column index of date->accessid likely wont help the situation as MySQL can't use index columns after a range condition. In theory they should be able to use it to cover the computation in this case, but it appears to be a shortcoming in MySQL, I've never gotten it to use a multi column index in this situation successfully.

You can try creating an index on (date,accessid) hoping that it will use it to cover the query (so you won't need to hit any tables), but I don't hold much hope. There's not a great deal you can do.

Edit:

My answer is courtesy of High Performance MySQL - Second Edition, worth it's weight in gold if you have to do serious MySQL development.

Michael 2009-09-11 06:00:20

Yep, adding the date+accessid index had no effect. Bummer.

Ian 2009-09-11 07:48:56

Answer 6

A:

Why doesn't the query with the date clause not use the accessid index?

Because using the date index allows it to ignore a large part of the data in the table. The chances are that the table holds mostly historical data, and a lot of it refers to dates a lot longer ago than the beginning of the current month, so the date criterion is selective and reduces the workload for the optimizer by allowing it to ignore most of the data.

If it used the accessid index, it would also have to read each row (as well as each index entry) to see whether the date meets the search criterion. This means reading the whole of the index and the whole of the table - in fact, it would do better in the context to ignore the index, but I started of with "if it used the accessid index".

Are there any other indexes I can use to speed up queries for distinct accessid's in certain date spans?

Depending on the sophistication of the optimizer, an index on (date, accessid) might improve things. It can do range searches on the leading column of the index, and the trailing column means that it does not have to refer to the data in the table to establish the accessid - the information is in the index. So, this might convert a query that access an index and a table into one that only accesses the index - which will reduce the amount of I/O needed and therefore improve the performance of the query.

If you have other criteria that need data from other columns, or you need to return more than just the unique accessid values, then you end up reading part of the table data; this is probably still a win compared with scanning the whole of the table.

Jonathan Leffler 2009-09-11 06:01:40

Answer 7

+4 A:

An index on (date,accessid) could help. However, before tweaking indices I'd recommend checking the type of your accessid column. EXPLAIN says the key is 257 bytes long, which sounds like a lot for an ID column. Are you using a VARCHAR(256) for accessid? If so, can't you use a more compact type? If it's a number, it should by INT (SMALLINT, BIGINT, whatever fits your needs) and if it's an alphanumeric ID, can it really be 256 chars long? If its length is fixed, can't you use CHAR (CHAR(32) for example) instead?

Josh Davis 2009-09-11 06:08:07

Ah, good call. The accessid we use is a 32 character string, for example 94a1d523fac45e589e7d8884332fa9b3. I'll reduce the size of this field.

Ian 2009-09-11 06:46:58

Making this change improved performance by about 75%.

Ian 2009-09-11 07:31:42

Glad it worked for you ;)

Josh Davis 2009-09-12 02:19:42

ansaurus

tags:

views:

answers:

What indexes can be used to improve this query?

related questions