tags:

views:

35

answers:

4

Say I have a schema that represents a fixed-depth hierarchy like this:


CREATE TABLE level0 (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    text TEXT NOT NULL
)
CREATE TABLE level1 (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    text TEXT NOT NULL,
    level0_id INTEGER NOT NULL
)
CREATE TABLE level2 (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    text TEXT NOT NULL,
    level1_id INTEGER NOT NULL,
    is_important INTEGER 
)

CREATE INDEX level2_level1_id ON level2 (level1_id)
CREATE INDEX level1_level0_id ON level1 (level0_id)

(Just to give a sense of scale, assume 1000 rowsin level0, 2000 in level1, and 20000 in level2 and this is a sqlite database on an phone's sd card. Level 0 queries return up to 1000 rows, level1 queries return 1-30 rows, and level2 queries return 1-20 rows)

I'm displaying this hierarchy one level at a time. So my queries for displaying each of the three levels look like this:


SELECT id,text FROM level0
SELECT id,text FROM level1 WHERE level0_id = 12345
SELECT id,text FROM level2 WHERE level1_id = 23456

Simple, fast, and fully indexed. Now, I also want to display the same hierarchy, except I want to filter it based on is_important. I only want to display level0 and level1 rows that eventually lead to level2 rows with is_important = 1.

So I write some new queries, very different from the old ones.


level 0:

SELECT DISTINCT l0.id,l0.text
FROM level2 AS l2
INNER JOIN level1 AS l1 ON l1.id = l2.level1_id
INNER JOIN level0 as l0 on l0.id = l1.level0_id
WHERE l2.is_important = 1

level 1:

SELECT DISTINCT l1.id,l1.text
FROM level2 AS l2
INNER JOIN level1 AS l1 ON l1.id = l2.level1_id
WHERE l2.is_important = 1

level 2:

SELECT id,text FROM level2 WHERE level1_id = 23456 AND is_important = 1

The level 0 and level 1 queries are obviously much, much, slower than the unfiltered queries above. I understand why they are slow, but I'm having trouble improving their performance.

I feel strange starting the query by walking the largest table to extract the smaller ones, but this seems like the most succinct and natural way to express what I want in terms that SQL can understand.

So my question is this: How would you improve the performance of the filtered level 0 and level 1 queries above?

A: 

I suggest taking a look at the plans for the two queries (filtered and unfiltered) to see why the unfiltered query is so slow. Purely guesswork, but if the only indexes are on the ID columns of each table then the database may be deciding to take a sequential walk through the level2 table to find those rows where IS_IMPORTANT = 1.

To try and affect this, try adding an index on level2(level1_id, is_important). That puts all the columns used in the WHERE clauses of the various queries into an index. It looks like that should help on the other queries as well.

Share and enjoy.

Bob Jarvis
A: 

A quick trick for inner joins: SMALL_TABLE INNER JOIN BIG_TABLE is faster than the reverse.

In your case, try adding your level2 table last.

MPelletier
Thanks. This sped it up quite a bit.
blucz
A: 

Have you tried changing

CREATE INDEX level2_level1_id ON level2 (level1_id)

to

CREATE INDEX level2_level1_id ON level2 (level1_id,is_important) ?

KMW
This helped, thanks.
blucz
A: 

I ended up arriving at a faster query that used a different technique and avoided the most expensive join. This is about 3x faster than the query I ended up with after applying all of the advice in this thread. Reordering the joins led me down the path to eventually eliminating one (and also gave the best performance gain by itself), so I've accepted that answer.

The query I'm going forward with for now is:


level 1:

SELECT l1.id,l1.text
FROM level1 AS l1
WHERE EXISTS 
(SELECT * FROM level2 AS l2 WHERE l2.level1_id = l1.id AND l2.is_important) 

The level0 query is a hybrid of the two approaches--I join on level0 and level1, but filter level2 using the nested query.

blucz