Say I have a schema that represents a fixed-depth hierarchy like this:
CREATE TABLE level0 (
id INTEGER PRIMARY KEY AUTOINCREMENT,
text TEXT NOT NULL
)
CREATE TABLE level1 (
id INTEGER PRIMARY KEY AUTOINCREMENT,
text TEXT NOT NULL,
level0_id INTEGER NOT NULL
)
CREATE TABLE level2 (
id INTEGER PRIMARY KEY AUTOINCREMENT,
text TEXT NOT NULL,
level1_id INTEGER NOT NULL,
is_important INTEGER
)
CREATE INDEX level2_level1_id ON level2 (level1_id)
CREATE INDEX level1_level0_id ON level1 (level0_id)
(Just to give a sense of scale, assume 1000 rowsin level0, 2000 in level1, and 20000 in level2 and this is a sqlite database on an phone's sd card. Level 0 queries return up to 1000 rows, level1 queries return 1-30 rows, and level2 queries return 1-20 rows)
I'm displaying this hierarchy one level at a time. So my queries for displaying each of the three levels look like this:
SELECT id,text FROM level0
SELECT id,text FROM level1 WHERE level0_id = 12345
SELECT id,text FROM level2 WHERE level1_id = 23456
Simple, fast, and fully indexed. Now, I also want to display the same hierarchy, except I want to filter it based on is_important. I only want to display level0 and level1 rows that eventually lead to level2 rows with is_important = 1.
So I write some new queries, very different from the old ones.
level 0:
SELECT DISTINCT l0.id,l0.text
FROM level2 AS l2
INNER JOIN level1 AS l1 ON l1.id = l2.level1_id
INNER JOIN level0 as l0 on l0.id = l1.level0_id
WHERE l2.is_important = 1
level 1:
SELECT DISTINCT l1.id,l1.text
FROM level2 AS l2
INNER JOIN level1 AS l1 ON l1.id = l2.level1_id
WHERE l2.is_important = 1
level 2:
SELECT id,text FROM level2 WHERE level1_id = 23456 AND is_important = 1
The level 0 and level 1 queries are obviously much, much, slower than the unfiltered queries above. I understand why they are slow, but I'm having trouble improving their performance.
I feel strange starting the query by walking the largest table to extract the smaller ones, but this seems like the most succinct and natural way to express what I want in terms that SQL can understand.
So my question is this: How would you improve the performance of the filtered level 0 and level 1 queries above?