tags:

views:

39

answers:

2

I've been using sqlite3 in a python application, and when testing it, my queries sometimes cause the program to freeze up. A rough (from memory) example:

SELECT id,name FROM main_table WHERE name IN 
(SELECT name FROM another_table WHERE another_table.attribute IN 
('foo', 'bar', 'baz'))

Often, the first time I attempt something like this, the program simply freezes. Now, if I try the subquery first and then the whole nested mess, it works almost instantly.

I'm guessing that it's caching the results of the first, simpler query for later which makes things faster the next time around, but even so I'd like to know how to avoid this stalling in the first place.

+1  A: 

You didn't mention anything about indexes... name in both tables should be indexed at a minimum.

Here's an equivalent using JOINs:

SELECT DISTINCT
       x.id,
       x.name 
  FROM main_table x
  JOIN ANOTHER_TABLE y ON y.name = x.name
                      AND y.attribute IN ('foo', 'bar', 'baz')

But mind that if there are more than one record in ANOTHER_TABLE that associates with a MAIN_TABLE record - the JOIN will produce duplicates. Hence the need for the DISTINCT (or GROUP BY if that's your preference).

The EXISTS is likely to be a better choice than IN:

SELECT x.id,
       x.name 
  FROM main_table x
 WHERE EXISTS(SELECT NULL
                FROM ANOTHER_TABLE y 
               WHERE y.name = x.name
                 AND y.attribute IN ('foo', 'bar', 'baz'))
OMG Ponies
I'm very new to SQL and have never done a JOIN, so it'll be back to the books for a bit on this. Also, it seems that a lot of SQL performance questions are answered with "create an index." While I can see the syntax to do so, I don't exactly understand what an index does. Can you reccomend an explanation for me? I don't want to cargo-cult this and use something without understanding it.
spook
@spook: [The answers here serve as a good primer on indexes](http://stackoverflow.com/questions/105400/what-are-indexes-and-how-can-i-use-them-to-optimize-queries-in-my-database)
OMG Ponies
A: 

select ... where ... in ... queries often perform poorly. They are usually treated by optimizers as a potentially very large series of (name=val1 or name=val2 or ... or name=valn)

Try using inner joins for the subquery:

SELECT 
    id
    ,name 
FROM 
    main_table 
    INNER JOIN another_table on 
        main_table.name=another_table.name
        and another_table.attribute in (
            'foo','bar','baz'
        )

Michael MacDonald