ansaurus

Question

Most efficient way of selecting the changes between timestamped snapshots

Answer 1

+1 A:

Update:

See this entry in my blog for performance details:

MySQL: difference between sets

SELECT  ts,
        (
        SELECT  GROUP_CONCAT(id)
        FROM    mytable mi
        WHERE   mi.ts =
                (
                SELECT  MAX(ts)
                FROM    mytable mp
                WHERE   mp.ts = mo.pts
                )
                AND NOT EXISTS
                (
                SELECT  NULL
                FROM    mytable mn
                WHERE   mn.ts = mo.ts
                        AND mn.id = mi.id
                )
        )
FROM    (
        SELECT  @r AS pts,
                @r := ts AS ts
        FROM    (
                SELECT  @r := NULL
                ) vars,
                (
                SELECT  DISTINCT ts
                FROM    mytable
                ) moo
        ) mo

To select only the last change:

SELECT  ts,
        (
        SELECT  GROUP_CONCAT(id)
        FROM    mytable mi
        WHERE   mi.ts =
                (
                SELECT  MAX(ts)
                FROM    mytable mp
                WHERE   mp.ts < mo.ts
                )
                AND NOT EXISTS
                (
                SELECT  NULL
                FROM    mytable mn
                WHERE   mn.ts = mo.ts
                        AND mn.id = mi.id
                )
        )
FROM    (
        SELECT  MAX(ts) AS ts
        FROM    mytable
        ) mo

For this to be efficient, you need to have a composite index on mytable (timestamp, id) (in this order).

Quassnoi 2009-08-19 13:15:31

Answer 2

+1 A:

Another way to view this is that you want to find records that exist in timestamp #1 that do not exist in timestamp #2. The easiest way?

SELECT Timestamp
FROM records AS t1
WHERE NOT EXISTS (SELECT 1 FROM records AS t2 WHERE t2.id = t1.id AND t2.Timestamp = t1.Timestamp + 1)

Of course, I'm exploiting here the fact that your example timestamps are integers, when in reality I imagine they are genuine timestamps. But it turns out the integers work so well for this particular purpose, they'd be really handy to have around. So, perhaps we should make a numbered list of all available timestamps. The easiest way to get that?

CREATE TEMPORARY TABLE timestamp_map AS (
    timestamp_id AS INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    timestamp_value AS DATETIME
);

INSERT INTO timestamp_map (timestamp_value) (SELECT DISTINCT timestamp FROM records ORDER BY timestamp);

(You could also maintain such a table permanently by use of triggers.)

It's a bit out there, but I've gotten similar techniques to work very efficiently in the past for data like what you describe, when lots of back-and-forth subqueries and NOT EXISTS proved too slow.

VoteyDisciple 2009-08-19 13:16:40

ansaurus

tags:

views:

answers:

Most efficient way of selecting the changes between timestamped snapshots

related questions