
... where "missing records" are identical to the last recorded value, hence no record.

This may be subjective, but I'm hoping there's a standardised way of doing this.

So, let's say I have a bunch of analytics in a MySQL table. Some rows appear to be missing, but as mentioned above, that's because no row is recorded when the value is the same as the previous one.

table "table":

id    value      datetime
1     5          1285891200    // Today
1     4          1285804800    // Yesterday
2     18         1285804800    // Yesterday
2     16         1285771094    // The day before yesterday

As you can see, I don't have a value for today for id 2.

If I wanted to pull the "most recent value" from this table (that is, 1's "today" and 2's "yesterday"), how would I do that? I've achieved it by running the following query:

SELECT id, value FROM (SELECT * FROM `table` ORDER BY datetime DESC) as bleh GROUP BY id

This uses a subquery to order the data first, and then I rely on "GROUP BY" to pick the first row (which, since the data is ordered, is the most recent) for each id. However, I don't know if shoving a subquery in there is the best way to get the most recent value.

How would you do it?

The desired table:

id    value      datetime
1     5          1285891200    // Today
2     18         1285804800    // Yesterday

Thanks...

A:

Gotta love MySQL for allowing an order by in a subquery. That's not allowed by the SQL standard :)

You could rewrite the query in a standards-compliant way like this:

select  *
from    YourTable a
where   not exists
        (
        select  *
        from    YourTable b
        where   a.id = b.id
        and     a.datetime < b.datetime
        )
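To check the query against the sample data from the question, here is a minimal sketch that runs it with Python's built-in `sqlite3` module. SQLite is only a stand-in for MySQL here (an assumption for illustration); the `NOT EXISTS` query is plain SQL, so it should behave the same way in MySQL:

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (illustrative assumption)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, value INTEGER, datetime INTEGER)")
conn.executemany(
    "INSERT INTO YourTable (id, value, datetime) VALUES (?, ?, ?)",
    [(1, 5, 1285891200), (1, 4, 1285804800), (2, 18, 1285804800), (2, 16, 1285771094)],
)

# Keep a row only if no other row with the same id has a later datetime
latest = conn.execute(
    """
    SELECT id, value, datetime
    FROM YourTable a
    WHERE NOT EXISTS (
        SELECT * FROM YourTable b
        WHERE a.id = b.id AND a.datetime < b.datetime
    )
    ORDER BY id
    """
).fetchall()
print(latest)  # [(1, 5, 1285891200), (2, 18, 1285804800)]
```

The result matches the desired table: one row per id, each carrying its latest datetime.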

If an id can have several rows sharing the same latest datetime (duplicates the subquery can't tell apart), you can group by id and then pick an arbitrary value from each group:

select  a.id
,       max(a.value)
,       max(a.datetime)
from    YourTable a
where   not exists
        (
        select  *
        from    YourTable b
        where   a.id = b.id
        and     a.datetime < b.datetime
        )
group by
        a.id

This picks the maximum a.value among the rows sharing the latest datetime. The datetime is the same for all duplicate rows, but standard SQL doesn't know that, so you have to specify how to pick among the rows with equal datetimes. Here I'm using max, but min or even avg would work just as well.
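The effect of the `group by` can be seen on a small example with a tie. This sketch again uses Python's `sqlite3` as a stand-in for MySQL, and the tied rows for id 1 are hypothetical data added only to create duplicates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, value INTEGER, datetime INTEGER)")
# Hypothetical data: id 1 has two rows sharing its latest datetime
conn.executemany(
    "INSERT INTO YourTable (id, value, datetime) VALUES (?, ?, ?)",
    [(1, 5, 1285891200), (1, 7, 1285891200), (1, 4, 1285804800), (2, 18, 1285804800)],
)

# Without GROUP BY, both tied rows for id 1 survive the NOT EXISTS filter
ungrouped = conn.execute(
    """
    SELECT a.id, a.value, a.datetime FROM YourTable a
    WHERE NOT EXISTS (SELECT * FROM YourTable b
                      WHERE a.id = b.id AND a.datetime < b.datetime)
    ORDER BY a.id, a.value
    """
).fetchall()

# GROUP BY id collapses the ties into one row; MAX(value) decides which value is kept
grouped = conn.execute(
    """
    SELECT a.id, MAX(a.value), MAX(a.datetime) FROM YourTable a
    WHERE NOT EXISTS (SELECT * FROM YourTable b
                      WHERE a.id = b.id AND a.datetime < b.datetime)
    GROUP BY a.id
    ORDER BY a.id
    """
).fetchall()
print(ungrouped)  # [(1, 5, 1285891200), (1, 7, 1285891200), (2, 18, 1285804800)]
print(grouped)    # [(1, 7, 1285891200), (2, 18, 1285804800)]
```

Swapping `MAX(a.value)` for `MIN(a.value)` would keep 5 instead of 7 for id 1; either way each id appears exactly once.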

Andomar
Nice! Okay - but we're assuming that there is more than one value per id, so we can compare and get the "latest" - but this doesn't account for the corner case where there is only one value, because it won't satisfy `a.datetime < b.datetime`. ;) I worked around that with a separate query to make sure there are always 2, but just food for thought. :)
Julian H. Lam
@Julian H. Lam: The condition is behind `not exists`, so if it's not satisfied, it will be included in the result set. I'd expect the query to work if there's only one value.
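A quick way to confirm the single-row case is to run the query on an id that has only one row. This sketch uses Python's `sqlite3` as a stand-in for MySQL, and the lone row for id 3 is hypothetical data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, value INTEGER, datetime INTEGER)")
# Hypothetical data: id 3 has exactly one row
conn.executemany(
    "INSERT INTO YourTable (id, value, datetime) VALUES (?, ?, ?)",
    [(1, 5, 1285891200), (1, 4, 1285804800), (3, 42, 1285771094)],
)

# For id 3 the subquery finds no later row, so NOT EXISTS is true and the row is kept
rows = conn.execute(
    """
    SELECT id, value, datetime FROM YourTable a
    WHERE NOT EXISTS (SELECT * FROM YourTable b
                      WHERE a.id = b.id AND a.datetime < b.datetime)
    ORDER BY id
    """
).fetchall()
print(rows)  # [(1, 5, 1285891200), (3, 42, 1285771094)]
```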
Andomar