views:

1525

answers:

2

I'm hitting some quite major performances issues due to the use of "ORDER BY"-statements in my SQL-code.

Everything is fine as long as I'm not using ORDER BY-statements in the SQL. However, once I introduce ORDER BY:s in the SQL code everything slows down dramatically due to the lack of correct indexing. One would assume that fixing this would be trivial, but judging from forum discussions, etc this seems to be a rather common issue that I've yet to see a definitive and concise answer to this question.

Question: Given the following table ...

CREATE TABLE values_table (
  id int(11) NOT NULL auto_increment,
  ...
  value1 int(10) unsigned NOT NULL default '0',
  value2 int(11) NOT NULL default '0',
  PRIMARY KEY  (id),
  KEY value1 (value1),
  KEY value2 (value2),
) ENGINE=MyISAM AUTO_INCREMENT=2364641 DEFAULT CHARSET=utf8;

... how do I create indexes that will be used when querying the table for a value1-range while sorting on the value of value2?

Currently, the fetching is OK when NOT using the ORDER BY clause.

See the following EXPLAIN QUERY output:

OK, when NOT using ORDER BY:

EXPLAIN select ... from values_table this_ where this_.value1 between 12345678 and 12349999 limit 10;

+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key      | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+
|  1 | SIMPLE      | this_ | range | value1        | value1   | 4       | NULL | 3303 | Using where |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+
However, when using ORDER BY I get "Using filesort":

EXPLAIN select ... from values_table this_ where this_.value1 between 12345678 and 12349999 order by this_.value2 asc limit 10;

+----+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------------+
| id | select_type | table | type  | possible_keys | key      | key_len | ref  | rows | Extra                       |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------------+
|  1 | SIMPLE      | this_ | range | value1        | value1   | 4       | NULL | 3303 | Using where; Using filesort |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------------+

Some additional information about the table content:

SELECT MIN(value1), MAX(value1) FROM values_table;
+---------------+---------------+
| MIN(value1)   | MAX(value2)   |
+---------------+---------------+
|             0 |    4294967295 |
+---------------+---------------+

...

SELECT MIN(value2), MAX(value2) FROM values_table;
+---------------+---------------+
| MIN(value2)   | MAX(value2)   |
+---------------+---------------+
|             1 |        953359 |
+---------------+---------------+

Please let me know if any further information is needed to answer the question.

Thanks a lot in advance!

Update #1: Adding a new composite index (ALTER TABLE values_table ADD INDEX (value1, value2);) does not solve the problem. You'll still get "Using filesort" after adding such an index.

Update #2: A constraint that I did not mention in my question is that I'd rather change the structure of the table (say adding indexes, etc.) than changing the SQL queries used. The SQL queries are auto-generated using Hibernate, so consider those more or less fixed.

A: 

It appears to me that you have two totally independent keys, one for value1 and one for value2.

So when you use the value1 key to retrieve, the records aren't necessarily returned in order of value2, so they have to be sorted. This is still better than a full table scan since you're only sorting the records that satisfy your "where value1" clause.

I think (if this is possible in MySQL), a composite key on (value1,value2) would solve this.

Try:

CREATE TABLE values_table (
    id int(11) NOT NULL auto_increment,
    ...
    value1 int(10) unsigned NOT NULL default '0',
    value2 int(11) NOT NULL default '0',
    PRIMARY KEY  (id),
    KEY value1 (value1),
    KEY value1and2 (value1,value2),
) ENGINE=MyISAM AUTO_INCREMENT=2364641 DEFAULT CHARSET=utf8;

(or the equivalent ALTER TABLE), assuming that's the correct syntax in MySQL for a composite key.

In all databases I know (and I have to admit MySQL isn't one of them), that would cause the DB engine to select the value1and2 key for retrieving the rows and they would already be sorted in value2-within-value1 order, so wouldn't need a file sort.

You can still keep the value2 key if you need it.

paxdiablo
Hi, thanks for your quick reply. I've tried your suggested solution, and unfortunately it did not work. I've added a clarification to my question.
knorv
No probs, looks like @Quassnoi has more knowledge of MySQL so I'll leave you to it. His explanation of why sorting is required for ranged value1's is something I didn't pick up from the question - DB2 would have similar probs. Remarking as community-wiki so no-one else makes the same mistake.
paxdiablo
There's some sort of SO bug preventing me from marking the question community-wiki.
paxdiablo
Weird, you can't just edit to make community-wiki, you have to change the answer as well - this is new functionality.
paxdiablo
+8  A: 

You cannot use an index in this case, as you use a RANGE filtering condition.

If you'd use something like:

SELECT  *
FROM    values_table this_
WHERE   this_.value1 = @value
ORDER BY
        value2
LIMIT 10

, then creating a composite index on (VALUE1, VALUE2) would be used both for filtering and for ordering.

But you use a ranged condition, that's why you'll need to perform ordering anyway.

Your composite index will look like this:

value1 value2
-----  ------
1      10
1      20
1      30
1      40
1      50
1      60
2      10
2      20
2      30
3      10
3      20
3      30
3      40

, and if you select 1 and 2 in value1, you still don't get a whole sorted set of value2.

If your index on value2 is not very selective (i. e. there are not many DISTINCT value2 in the table), you could try:

CREATE INDEX ix_table_value2_value1 ON mytable (value2, value1)

/* Note the order, it's important */    

SELECT  *
FROM    (
        SELECT  DISTINCT value2
        FROM    mytable
        ORDER BY
                value2
        ) q,
        mytable m
WHERE   m.value2 >= q.value2
        AND m.value2 <= q.value2
        AND m.value1 BETWEEN 13123123 AND 123123123

This is called a SKIP SCAN access method. MySQL does not support it directly, but it can be emulated like this.

The RANGE access will be used in this case, but probably you won't get any performance benefit unless DISTINCT value2 comprise less than about 1% of rows.

Note usage of:

m.value2 >= q.value2
AND m.value2 <= q.value2

instead of

m.value2 = q.value2

This makes MySQL perform RANGE checking on each loop.

Quassnoi
+1 for picking up the range problem I missed :-)
paxdiablo
Thanks for your comprehensive answer. Assuming that I'm unable to change the SQL queries used (those are auto-generated by Hibernate), do you believe that is this impossible to solve (by adding say better indexing)?
knorv
Another question: If the range querying is the problem, how come everything seems OK when not using ORDER BY? Sorry if I missed out on this detail.
knorv
Because the index is still used to filtering, as shown in you second EXPLAIN. In your query, it's ORDER BY that doesn't use the index, not WHERE.
Quassnoi
I don't think you can improve the performance without rewriting the query. You even cannot create a view, as it would use a nested subquery which MySQL does not allow in a view.
Quassnoi
I don't have a deep knowledge of Hibernate, but I think it's possible to use custom SQL: http://www.hibernate.org/hib_docs/reference/en/html/querysql.html. Again, this is just something I googled for, I didn't try it in practice.
Quassnoi