views:

53

answers:

3

I've been trying to optimize a query on MySQL that takes around 15-20 seconds to run. My data table has about 10M rows, and the query returns 68,000 records that match 144 "run" values and 35 "name" patterns. Because the query uses two IN-style clauses, my indexes don't seem to be terribly helpful.

Here's the query:

select * from data d
where d.data_type = 'Result'
  and d.run in ('8a7aee1f2a6232b1012a624da9201b92', '8a7aee1f2a6232b1012a625432a314ef',

... [144 runs]

  )
  and (d.name like 'itema[%]' or d.name like 'itemb[%]')

Here's the table definition

CREATE TABLE `data` (
  `data_type` varchar(31) NOT NULL,
  `id` char(32) NOT NULL,
  `entry_time` datetime default NULL,
  `name` varchar(255) NOT NULL,
  `step` int(11) default NULL,
  `value` double NOT NULL,
  `run` char(32) NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `FK2EEFAA8ECCC6F3` (`run`),
  KEY `data2` (`run`,`step`),
  KEY `data3` (`data_type`,`name`(10),`run`),
  CONSTRAINT `FK2EEFAA8ECCC6F3` FOREIGN KEY (`run`) REFERENCES `run_archive` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Explain tells me the query is using key data3.

id  select_type  table  type   possible_keys                 key    key_len  ref   rows    Extra
1   SIMPLE       d      range  FK2EEFAA8ECCC6F3,data2,data3  data3  223      NULL  113271  Using where

I used to run 144 separate queries (one per run). Doing it as one query is about twice as fast, but still way too slow.

Suggestions for optimizing? Ideas I have are:

  • Finding a magic index that speeds this up

  • Denormalizing data (it'd be easy to get rid of the run, but harder for the name)

  • Splitting up the data among different tables (hard to do with my Java/Hibernate approach)

Or am I just asking the impossible here?

Edit: it turned out the biggest fix was to increase the size of my innodb_buffer_pool. The query dropped to about 1.5 seconds after doing this. I've marked as "answer" a fix that improved it slightly more.
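For anyone hitting the same wall, a minimal sketch of that fix: the buffer pool is sized in my.cnf (the 512M figure below is purely illustrative, not the asker's actual setting; size it to fit your hot working set).

# my.cnf sketch -- 512M is an illustrative value, tune to your data size
[mysqld]
innodb_buffer_pool_size = 512M

You can check the current value from the client with SHOW VARIABLES LIKE 'innodb_buffer_pool_size'; a restart is required for my.cnf changes to take effect on older MySQL versions.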

+1  A: 

Consider splitting the result records away from the data table. I didn't catch what percentage of the table your results are, but it's probably worth benchmarking in a dev copy of your prod database.

Can you FK those run values? If they're reusable, perhaps create a Run table? My guesstimate is that matching 144 strings, even indexed, is slower than if they were int or smallint. Again, benchmarking this (or any) suggestion will prove or disprove the theory.
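A sketch of that idea, assuming you add a surrogate integer key (the seq_id and run_id names below are hypothetical, not from the original schema):

-- Hypothetical migration: give each run an integer surrogate key
alter table run_archive add column seq_id int not null auto_increment unique;
alter table data add column run_id int;
update data d join run_archive r on r.id = d.run set d.run_id = r.seq_id;
-- then index (data_type, run_id, name(10)) and filter on the integer ids

An int key is 4 bytes versus 32 for the char(32) id, so index entries shrink and comparisons get cheaper; whether that outweighs the migration cost is something only a benchmark will tell.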

What does the query plan difference look like when not including your like clause on the name attribute?

p.campbell
Had the data3 key wrong (I'd done a hand-edit). The first field is the data_type.
Will Glass
About 80% of the data is "Result" data_type. Also, I do have a run table: the "run" field is foreign-keyed to the run table, but via the char(32) id. Doing a join seems to be about the same performance. I use a char(32) instead of an integer for portability of data (in case I need to move it between servers).
Will Glass
Replacing the name "like" with an in clause that spells out the 36 possibilities has about the same performance.
Will Glass
A: 

The create statement doesn't work for me; can you post the EXPLAIN output? Also, adding indices on data_type and name should help.

Alternatively, create a view with the 'd.run in...' clause and run your queries against that, if the values for run are fixed.
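A sketch of that suggestion (the view name is made up, and it only helps if the run list really is fixed):

-- Hypothetical view baking in the fixed run list
create view result_runs as
  select * from data d
  where d.data_type = 'Result'
    and d.run in ('8a7aee1f2a6232b1012a624da9201b92', ... /* the 144 run ids */);

Note that a plain MySQL view is just a stored query, not a materialization, so this mainly tidies up the application code rather than changing the execution plan.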

Thilo
posted explain output
Will Glass
A: 

Depending on how selective the condition on run is, it might be better to provide the index

data_type, run, name(10)

The trouble with providing the column used for prefix matching early in the index is that it scatters matching rows across the index, requiring a bigger part of the index to be read from disk.
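Concretely, that index could be added as follows (a sketch; the name data4 is arbitrary, and whether to keep or drop the old data3 is a benchmarking question):

alter table data add key data4 (data_type, run, name(10));

With run ahead of the name prefix, the 144 equality matches each land on a contiguous index range, and the like 'itema[%]' / 'itemb[%]' prefix filter is applied within those ranges instead of scattering reads across the whole index.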

Also, using a smaller datatype for the id of a run will reduce index size and speed up comparisons. This is a constant-factor improvement, but might be worthwhile regardless.

meriton
This helps: running on my laptop, it's 78 seconds with this index vs. 135 for the other index.
Will Glass