tags:

views:

660

answers:

2

Hi All,

I've been using SQL for years now but rarely anything more that simple inserts and selects etc... so I'm no SQL expert. I'm wondering if I could get some help in optimizing a more complex SQL statement that I'm executing on SQLite, from PHP through PDO.

The statement seems to work correctly, just seems to take longer that I would have expected (or perhaps I'm just expecting too much).

This is the SQL:

INSERT OR IGNORE INTO MailQueue(SubscriberID, TemplateID)
    SELECT Subscribers.ID, '1' AS TemplateID
    FROM Subscribers 
    INNER JOIN SubscriberGroups ON Subscribers.ID=SubscriberGroups.SubscriberID
    WHERE SubscriberGroups.GroupID IN ('1', '2', '3')
    AND Subscribers.ID NOT IN 
     ( 
     SELECT Subscribers.ID FROM Subscribers 
     INNER JOIN SubscriberGroups ON Subscribers.ID=SubscriberGroups.SubscriberID
     WHERE SubscriberGroups.GroupID IN ('4', '5', '6')
     );

What I've got is a list of subscribers, in one or more groups. I want to add subscribers to a mail queue, selecting those that belong to one or more groups (1,2,3) but exclude those that are also in another set of groups (4,5,6).

Firstly, is the above SQL typical of how to do this?

Secondly, what indicies should I have to make this work as efficiently as possible?

Currently it takes about 30 seconds to get through about 5000 subscriber records (and a handful of groups) on a avg spec LAMP.

At the end of the day performance isn't that critical, but I'd like to understand this stuff better so any insight greatly appreciated.

Brad

+5  A: 

Chances are the extra joins are killing you. What if you do:

SELECT Subscribers.ID, '1' AS TemplateID
FROM Subscribers 
WHERE EXISTS( SELECT *
                FROM SubscriberGroups
               WHERE Subscribers.ID=SubscriberGroups.SubscriberID
            AND SubscriberGroups.GroupID IN ('1', '2', '3') )

  AND NOT EXISTS( SELECT *
                    FROM SubscriberGroups
                   WHERE Subscribers.ID=SubscriberGroups.SubscriberID 
                     AND SubscriberGroups.GroupID IN ('4', '5', '6')
    );

You'd also want to make sure that you have an index on SubscriberGroups( SubscriberID, GroupID )

My guess is that Subscribers already has an index on ID, right?

EDIT: Another option, which may or may not be faster. Look at the query plans of each to see...

This one may be a single index scan which could be faster than two index seeks, but it depends on SQLite's optimizer...

SELECT Subscribers.ID, '1' AS TemplateID
FROM Subscribers 
INNER JOIN( SELECT SUM( CASE WHEN GroupID IN('1', '2', '3') THEN 1 ELSE 0 END ) AS inGroup,
                   SUM( CASE WHEN GroupID IN('4', '5', '6') THEN 1 ELSE 0 END ) AS outGroup,
                   SubscriberID
          FROM SubscriberGroups
          WHERE SubscriberGroups.GroupID IN ('1', '2', '3', '4', '5', '6' )
          ) SubscriberGroups
       ON Subscribers.ID=SubscriberGroups.SubscriberID
      AND inGroup  > 0
      AND outGroup = 0
Matt Rogish
Thanks Matt, that's great. Your first solution got it from 30 seconds down to about 5 or 6 which is good enough. I didn't try the second option cause I don't really understand it, but I'll keep it in mind if it becomes a problem. Thanks Again
cantabilesoftware
+2  A: 

Another way to write the SQL which might be faster (I don't have SQLite on which to test):

SELECT
     S.ID,
     '1' AS TemplateID     -- Is this really a string? Does it need to be?
FROM
     Subscribers S
LEFT OUTER JOIN SubscriberGroups SG ON
     SG.SubscriberID = S.ID
WHERE
     SG.SubscriberID IS NULL AND
     EXISTS
     (
          SELECT
               *
          FROM
               SubscriberGroups SG2
          WHERE
               SG2.SubscriberID = S.ID AND
               SG2.GroupID IN ('1', '2', '3')  -- Again, really strings?
     )

Matt's method should also work well. It all just depends on how SQLite decides to create the query plans.

Also, please note my comments. If those are really defined as INT data types in your database, there will be some extra processing to convert between the two differing data types. If they are strings in the database, is there a reason for that? Do you have non-numeric values in those columns?

Tom H.
Thanks Tom, you're right on the quoted IDs... not sure why I had them in there. I didn't try your suggestion because matt's seems to work well and your's seemed to miss the exclude groups part (4,5,6). Thanks anyway!
cantabilesoftware
Actually looking at this more closely I realized one of the ID columns wasn't declared as INTEGER which is why I needed the quotes. SQLite is not real fussy about types which is why I missed it. Changing to integer and removing quotes it now runs in about 1/2 second. Thanks!
cantabilesoftware
Mine should take care of the excludes. Did you test it? That's what the whole LEFT JOIN is about. Check a NOT NULL column from the joined table in the WHERE clause and if it's NULL you know that no match exists.
Tom H.