views:

130

answers:

4

I have two possible queries, both giving the result set I want.

Query one takes about 30ms, but 150ms to fetch the data from the database.

SELECT 
    id
FROM 
    featurevalues as featval3
WHERE 
    featval3.feature IN (?,?,?,?) 
AND
    EXISTS
    (
        SELECT
            1
        FROM
            product_to_value,
            product_to_value as prod2,
            features,
            featurevalues
        WHERE
            product_to_value.feature = features.int
        AND
            product_to_value.value = featurevalues.id
        AND
            features.id = ? 
        AND
            featurevalues.id IN (?,?)
        AND
            product_to_value.product = prod2.product
        AND
            prod2.value = featval3.id
    )

Query two takes about 3ms -this is the one I therefore prefer-, but also takes 170ms to fetch the data.

SELECT 
    (
        SELECT
            prod2.value
        FROM
            product_to_value,
            product_to_value as prod2,
            features,
            featurevalues
        WHERE
            product_to_value.feature = features.int
        AND
            product_to_value.value = featurevalues.id
        AND
            features.id = ? 
        AND
            featurevalues.id IN (?,?)
        AND
            product_to_value.product = prod2.product
        AND
            prod2.value = featval3.id
    ) as id
FROM 
    featurevalues as featval3
WHERE 
    featval3.feature IN (?,?,?,?) 

The 170ms seems to be related to the number of rows from table featval3. After an index is used on featval3.feature IN (?,?,?,?), 151 items "remain" in featval3.

Is there something obvious I am missing regarding the slow fetching? As far as I know everything is properly indexed.. I am confused because the second query only takes a blazing 3ms to run.

Update

This is what i get when I'm running an EXPLAIN on the second query:

0 Trace 0 0 0 00 
1 Variable 4 1 4 00 
2 Goto 0 88 0 00 
3 OpenRead 5 6883 0 00 
4 If 6 16 0 00 
5 Integer 1 6 0 00 
6 OpenEphemeral 7 1 0 00 
7 Null 0 8 0 00 
8 MakeRecord 1 1 8 00 
9 IdxInsert 7 8 0 00 
10 MakeRecord 2 1 8 00 
11 IdxInsert 7 8 0 00 
12 MakeRecord 3 1 8 00 
13 IdxInsert 7 8 0 00 
14 MakeRecord 4 1 8 00 
15 IdxInsert 7 8 0 00 
16 Rewind 7 86 0 00 
17 Column 7 0 5 00 
18 IsNull 5 85 0 00 
19 Affinity 5 1 0 00 
20 SeekGe 5 85 5 00 
21 IdxGE 5 85 5 01 
22 Null 0 10 0 00 
23 Integer 1 11 0 00 
24 MustBeInt 11 0 0 00 
25 IfZero 11 82 0 00 
26 Variable 1 12 3 00 
27 OpenRead 2 25 0 00 
28 OpenRead 8 7005 0 00 
29 OpenRead 9 26 0 00 
30 OpenRead 10 6732 0 00 
31 OpenRead 11 6766 0 00 
32 Column 5 1 15 00 
33 IsNull 15 77 0 00 
34 Affinity 15 1 0 00 
35 SeekGe 8 77 15 00 
36 IdxGE 8 77 15 01 
37 IdxRowid 8 8 0 00 
38 Seek 2 8 0 00 
39 Column 2 0 16 00 
40 IsNull 16 76 0 00 
41 Affinity 16 1 0 00 
42 SeekGe 9 76 16 00 
43 IdxGE 9 76 16 01 
44 Column 9 1 17 00 
45 IsNull 17 75 0 00 
46 SCopy 12 18 0 00 
47 IsNull 18 75 0 00 
48 Affinity 17 2 0 00 
49 SeekGe 10 75 17 00 
50 IdxGE 10 75 17 01 
51 If 20 59 0 00 
52 Integer 1 20 0 00 
53 OpenEphemeral 13 1 0 00 
54 Null 0 21 0 00 
55 MakeRecord 13 1 21 00 
56 IdxInsert 13 21 0 00 
57 MakeRecord 14 1 21 00 
58 IdxInsert 13 21 0 00 
59 Rewind 13 74 0 00 
60 Column 13 0 19 00 
61 IsNull 19 73 0 00 
62 Affinity 19 1 0 00 
63 SeekGe 11 73 19 00 
64 IdxGE 11 73 19 01 
65 Column 9 2 21 00 
66 Column 11 0 7 00 
67 Ne 7 72 21 6a 
68 Column 8 0 22 00 
69 Move 22 10 1 00 
70 AddImm 11 -1 0 00 
71 IfZero 11 77 0 00 
72 Next 11 64 0 00 
73 Next 13 60 0 00 
74 Next 10 50 0 00 
75 Next 9 43 0 00 
76 Next 8 36 0 00 
77 Close 2 0 0 00 
78 Close 8 0 0 00 
79 Close 9 0 0 00 
80 Close 10 0 0 00 
81 Close 11 0 0 00 
82 SCopy 10 9 0 00 
83 ResultRow 9 1 0 00 
84 Next 5 21 0 00 
85 Next 7 17 0 00 
86 Close 5 0 0 00 
87 Halt 0 0 0 00 
88 Transaction 0 0 0 00 
89 VerifyCookie 0 319 0 00 
90 TableLock 0 14 0 00 
91 TableLock 0 25 0 00 
92 TableLock 0 11 0 00 
93 Goto 0 3 0 00

Not sure what it means however.

+4  A: 

I'm not sure about sqlite, but most databases do not calculate the entire result set right away when you execute a query. They compile and optimize the query, then begin running it by fetching the first few rows.

Sort of like how it is quick to open a large file, but then it takes a long time to read all of the contents.

Harold L
+2  A: 

Why not take a look at the query plan and determine the answer yourself? Here's a reference: http://stackoverflow.com/questions/1454188/how-can-i-analyse-a-sqlite-query-execution

dcp
Have tried this and updated my post.
Derk
A: 

Use Begin Transaction and End Transaction anytime you write an sql query. This is explained here plus lots of other useful info:

SQLite Optimization FAQ

You can check the indexing details on where clause here:

SQLite Query Optimizer Overview

Aseem Gautam
If you don't use BEGIN/END TRANSACTION, each statement gets its own transaction. Using explicit transactions improves efficiency with multiple statements, but the OP is only executing one.
dan04
I am not writing.
Derk
+3  A: 

You probably should be using JOINs here and your use of correlated subqueries might well be slowing things down. Whilst it is true that often the optimizer can execute a cross join with a where clause as if it were a join, and can also execute correlated subqueries as joins, I wouldn't count on SQLite being able to do this in all cases. Rewriting your query with JOINs I think gives this:

SELECT DISTINCT prod2.value AS id
FROM product_to_value
JOIN features ON product_to_value.feature = features.int
JOIN product_to_value as prod2 ON product_to_value.product = prod2.product
JOIN featurevalues as featval3 ON prod2.value = featval3.id
JOIN featurevalues ON product_to_value.value = featurevalues.id
WHERE features.id = ?
AND featurevalues.id IN (?,?)
AND featval3.feature IN (?,?,?,?)

Try this and see it is faster (and still gives the correct result).

Mark Byers
SQLite translates JOINs to the WHERE clause. I am pretty sure the query does what has to do.
Derk
The problem I get with the query you posted is that it returns duplicate values. I could solve this with a DISTINCT or a temporary table, but this would be very slow, especially when it has to deal with a huge result set.I therefore figured I might go around that using a sub-query on every row of a smaller result set from featval3.IN(?,?,?,?).
Derk
@Derk: OK, I missed that. It was difficult to understand what your query was doing. I have added the DISTINCT. Correlated subqueries can be slow because you might need multiple scans over the tables/indexes instead of just one scan.
Mark Byers
Because the 4 joins create a huge result set. SQLite compares each row with the previous in case of a DISTINCT. I could speed it up with a temporary table, but that would still be slow :(
Derk
The sub-query takes about 0.3ms to perform. The sub-query should run about 150 times (amount of featval3 records left after IN (?,?,?,?)), so that would be 50ms for the query to run. That's why I don't get whats going on..
Derk
@Derk: About 0.3ms? For every single input? Are you sure that when you were timing it, that the results weren't already cached for the inputs you used?
Mark Byers
I timed it wrong, I see now. Yes, 0.3ms for the query to run, but also 1.2ms for fetching that single result. So 1.2 times 150 featval3 rows makes 180ms :P. I now see why it's so "slow". Is there a way of going around this so I don't have to read the value from the database. Instead use an index or something? And why is the fetching so slow?
Derk
@Derk: The optimizer should automatically select the appropriate index. So make sure that you have all the appropriate indexes (primary keys are also indexes) - post your table definitions if you are unsure. I am not familiar with the SQLite explain syntax unfortunately, so I can't see if you are missing an index from reading that output.
Mark Byers
What I just said isn't right either. Let me try again: To fetch one row from the sub-query takes less than 0.01ms, the query itself takes 0.3ms. Therefore 0.31 times 150 is <30ms. So it still doesn't make any sense to me. I figure it's still an index problem.
Derk
It looks like `prod2.value = featval3.id` doesn't seem to be making use of an index. When I recreate it in this way `prod2.value IN ('string')` it is extremely fast.
Derk
@Derk: Strange.... I can't explain why that works. But it's good you found a solution.
Mark Byers