views:

320

answers:

3

My table (projects):

id, lft, rgt
1, 1, 6
2, 2, 3
3, 4, 5
4, 7, 10
5, 8, 9
6, 11, 12
7, 13, 14

As you may have noticed, this is hierarchical data using the nested set model http://tr.im/EO3G. Tree pretty-printed:

1
 2
 3
4
 5
6
7

I want to select all sub projects under project 1 and 4. I can do this with:

SELECT p.id
FROM projects AS p, projects AS ps
WHERE (ps.id = 1 OR ps.id = 4)
AND p.lft BETWEEN ps.lft AND ps.rgt

However, this is very slow with a large table, when running EXPLAIN (Query) i get:

+----+-------------+-------+-------+------------------------+---------+---------+------+------+-------------------------------------------------+
| id | select_type | table | type  | possible_keys          | key     | key_len | ref  | rows | Extra                                           |
+----+-------------+-------+-------+------------------------+---------+---------+------+------+-------------------------------------------------+
|  1 | SIMPLE      | ps    | range | PRIMARY,lft,rgt,lftRgt | PRIMARY | 4       | NULL |    2 | Using where                                     | 
|  1 | SIMPLE      | p     | ALL   | lft,lftRgt             | NULL    | NULL    | NULL | 7040 | Range checked for each record (index map: 0x12) | 
+----+-------------+-------+-------+------------------------+---------+---------+------+------+-------------------------------------------------+

(The project table has indexes on lft, rgt, and lft-rgt. As you can see, mysql does not use any index, and loops through the 7040 records)

I have found that if I only select for one of the super project, mysql manages to use the indexes:

SELECT p.id
FROM projects AS p, projects AS ps
WHERE ps.id = 1
AND p.lft BETWEEN ps.lft AND ps.rgt

EXPLAINs to:

+----+-------------+-------+-------+------------------------+---------+---------+-------+------+-------------+
| id | select_type | table | type  | possible_keys          | key     | key_len | ref   | rows | Extra       |
+----+-------------+-------+-------+------------------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | ps    | const | PRIMARY,lft,rgt,lftRgt | PRIMARY | 4       | const |    1 |             | 
|  1 | SIMPLE      | p     | range | lft,lftRgt             | lft     | 4       | NULL  |    7 | Using where | 
+----+-------------+-------+-------+------------------------+---------+---------+-------+------+-------------+

FINALLY, my question: I there any way i can SELECT rows matching multiple ranges, and still benefit from indexes?

+1  A: 

have you tried a union? take your second example, add "union" underneath and the repeat but matching id 4. i don't know if it would work, but it seems like an obvious thing to try.

edit:

SELECT p.id
FROM projects AS p, projects AS ps
WHERE ps.id = 1
AND p.lft BETWEEN ps.lft AND ps.rgt
UNION
SELECT p.id
FROM projects AS p, projects AS ps
WHERE ps.id = 4
AND p.lft BETWEEN ps.lft AND ps.rgt
andrew cooke
+1  A: 

From 7.2.5.1. The Range Access Method for Single-Part Indexes in MySQL reference manual:

Currently, MySQL does not support merging multiple ranges for the range access method for spatial indexes. To work around this limitation, you can use a UNION with identical SELECT statements, except that you put each spatial predicate in a different SELECT.

So you need to have a union of two different selects.

Eemeli Kantola
:( Not the answer I was looking for, but at least i can stop wasting time trying to figure it out.
Joernsn
A: 

Your query does merge the multiple ranges.

It uses a range access method to combine the multiple ranges on p (which is leading in the join).

For each row returned from p, it checks the best method to retrieve all rows from ps for the given values of p.lft and p.rgt. Depending on the query selectivity, it may be either a fullscan over ps or a index lookup over one of two possible indexes.

The number of rows shown in the EXPLAIN means nothing: the EXPLAIN just shows the worst possible outcome. It doesn't necessarily mean that all these rows will be examined. Whether they will or not the optimizer can only tell in runtime.

The documentation snippet about the impossibility to merge the multiple ranges is only valid for SPATIAL indexes (R-Tree those that you create over GEOMETRY types). These indexes are good for the queries that search upwards (the ancestors of a given project) but not downwards.

A plain B-Tree index can combine the multiple ranges. From the documentation:

For all types of indexes, multiple range conditions combined with OR or AND form a range condition.

The real problem is that the optimizer in MySQL cannot make a single correct decision: either use a single fullscan (with ps leading), or make several range scans.

Say, you have 10,000 rows and your projects boundaries are 0-500 and 2000-2500. The optimizer will see that each boundary will benefit from the index, the range check will result in two range accesses, while a single fullscan would be better.

It may be even worse if your project boundaries are, say, 0-3000 and 5000-6000. In this case the optimizer will make two fullscans, while one would suffice.

To help the optimizer make the correct decision, you should make the covering index on (lft, id) in this order:

CREATE INDEX ix_lft_id ON projects (lft, id)

The tipping point for using the fullscan over a covering index rather than a range condition is 90%, that means you will never have more than a one fullscan in your actual plan.

Quassnoi