tags:

views:

363

answers:

2

I've got a sqlite table actions that looks something like this:

uuid varchar (36)
actiondate int
username varchar (16)
mood int
bonus int
status varchar (80)
... bunch of other similar fields (all short varchar or int fields)

This design seems to be sufficiently performant for most types of queries, but struggles a bit with a particular scenario, where I need to get some data about the latest action performed by each user as of a given date.

I was hoping to be able to do something like this:

SELECT status, actiondate
FROM actions WHERE actiondate < 20061231
GROUP BY username
ORDER BY actiondate DESC

However, the aggregation is not done with respect to the order clause, the order clause just determines what order the results are returned in, which makes sense.

So, I have this:

SELECT actiondate, status FROM actions
WHERE actiondate < 20061231 and
uuid = (SELECT uuid from actions as alt
        WHERE alt.username = actions.username
        ORDER BY actiondate DESC LIMIT 1)

Is there a better way of doing this sort of query? A better table layout? Currently this sort of query is taking ~400ms on my development box, and it'd be nice if I could shave 100ms or so off it (my target time is actually 100ms, but I'm skeptical as to whether that's manageable).

I've obviously got indexes on username and date (I've actually got several: one which is which seems to fit the slow query quite well; one on username; one on date ASC; one on date DESC and one on uuid).

FWIW, the action table's likely to have somewhere between 100 and 30,000 rows in it.

+1  A: 

Your index should cover all the columns used in the query for maximum performance.

I'm not sure about the performance of nested query in this case. I'd prefer to join to a subquery if the execution plan doesn't show that it's converting it to a good nested join.

For something like this, I might avoid the UUID if possible, and if not, I would ensure that it's increasing, so you could write:

SELECT actiondate
    ,status
FROM actions
INNER JOIN (
    SELECT username
        ,MAX(uuid) as last_uuid from actions
    WHERE actiondate < 20061231
    GROUP BY username
) AS last_occur
    ON last_occur.username = actions.username
    AND last_occur.last_uuid = actions.uuid
WHERE actiondate < 20061231

I would think this should perform well with an index on username ASC, uuid DESC, INCLUDE (actiondate) and and index on actiondate DESC, username ASC, INCLUDE (status), but obviously look at the query plan.

Without the increasing uuids, you will need some kind of rule to ensure you are selecting the latest action for a person, since unless username, actiondate is unique, there is nothing in your original ORDER BY actiondate DESC limit 1 to ensure you are picking the correct row each time. If username, actiondate is unique, then you can use the following:

SELECT actiondate
    ,status
FROM actions
INNER JOIN (
    SELECT username
        ,MAX(actiondate) as last_actiondate from actions
    WHERE actiondate < 20061231
    GROUP BY username
) AS last_occur
    ON last_occur.username = actions.username
    AND last_occur.last_actiondate = actions.actiondate
WHERE actiondate < 20061231

If it is not unique, it will still work, but you will get multiple actions for a person on their last actiondate. The recommended indexes would also be different in this case (and better), because the large uuid is not necessary.

Cade Roux
With those indexes the query time is down to about 273ms, but the constraint on increasing uuids is not straightforward. Will think about this solution - thanks!
Dominic Rodger
Update answer to include discussion about discriminating what the "last record" really is for a user.
Cade Roux
Thanks - query time down to about 90ms with your edits! I might have make some changes that might slow it down a bit to remove the "multiple actions for a person on their last actiondate", but this looks really promising.
Dominic Rodger
ps. I'll mark this as accepted once I've done a bit more testing. Thanks again for your help!
Dominic Rodger
+2  A: 

Correctness before speed -- your query:

SELECT actiondate, status FROM actions
WHERE actiondate < 20061231 and
uuid = (SELECT uuid from actions as alt
        WHERE alt.username = actions.username
        ORDER BY actiondate DESC LIMIT 1)

doesn't perform the task you describe -- the inner select may return a uuid for an action that's later than 2061231, then the outer select will give no result for that username. I think you can fix it my moving the WHERE check on actiondate as an AND in the nested select. (I doubt this will speed things up, but at least it should make the behavior correct -- let us know about how, if at all, it affects the speed!).

Alex Martelli
Yeah, I noticed that and fixed it in my suggestion.
Cade Roux
Thanks, that fixes the results and makes it a fraction quicker (average time down to ~325ms)
Dominic Rodger
Hmmm... Actually, on further investigation, that has changed the speed average time taken over 10 queries down from ~390ms to ~387ms. Thanks for pointing out the mistake though.
Dominic Rodger