views:

338

answers:

3

Hi all,

I’m looking to get the max discrepancy between two tables per day, per id. I have the following data in a mysql database

insert into test.foo values ('2010-01-10', 1, 10);
insert into test.foo values ('2010-01-10', 1, 5);
insert into test.foo values ('2010-01-10', 2, 10);
insert into test.foo values ('2010-01-10', 2, 10);
insert into test.foo values ('2010-01-10', 3, 15);
insert into test.foo values ('2010-01-10', 3, 15);
insert into test.foo values ('2010-01-11', 1, 5);
insert into test.foo values ('2010-01-11', 1, 5);
insert into test.foo values ('2010-01-11', 2, 5);
insert into test.foo values ('2010-01-11', 2, 5);
insert into test.foo values ('2010-01-11', 3, 5);
insert into test.foo values ('2010-01-11', 3, 5);

insert into test.bar values ('2010-01-10', 1, 5);
insert into test.bar values ('2010-01-10', 1, 5);
insert into test.bar values ('2010-01-10', 2, 5);
insert into test.bar values ('2010-01-10', 2, 5);
insert into test.bar values ('2010-01-10', 3, 5);
insert into test.bar values ('2010-01-10', 3, 5);
insert into test.bar values ('2010-01-11', 1, 10);
insert into test.bar values ('2010-01-11', 1, 10);
insert into test.bar values ('2010-01-11', 2, 5);
insert into test.bar values ('2010-01-11', 2, 5);
insert into test.bar values ('2010-01-11', 3, 5);
insert into test.bar values ('2010-01-11', 3, 5);

Here is my query:

SELECT t1.`date`, t1.id, t1.sums, t2.sums, max(t1.sums - t2.sums) FROM
  (select `date`, id, sum(val) sums
   from test.foo
   group by `date`, id) as t1,
  (select `date`, id, sum(val) sums
   from test.bar
   group by `date`, id) as t2
WHERE t1.`date` = t2.`date` AND t1.id = t2.id
group by t1.`date`

I’m getting this result:

+---------------------+----+------+------+------------------------+
| date                | id | sums | sums | max(t1.sums - t2.sums) |
+---------------------+----+------+------+------------------------+
| 2010-01-10 00:00:00 |  1 |   15 |   10 |                     20 |
| 2010-01-11 00:00:00 |  1 |   10 |   20 |                      0 |
+---------------------+----+------+------+------------------------+
2 rows in set (0.00 sec)

I’d like to be getting this result: I’m getting this result:

+---------------------+----+------+------+------------------------+
| date                | id | sums | sums | max(t1.sums - t2.sums) |
+---------------------+----+------+------+------------------------+
| 2010-01-10 00:00:00 |  1 |   15 |   10 |                     20 |
| 2010-01-11 00:00:00 |  2 |   10 |   10 |                      0 |  <----- 
+---------------------+----+------+------+------------------------+

Can anyone help me? I’d like to be getting the max difference, and then the line that went along with it. This query gives me the correct difference, but not the id and sums that go with it. A colleague suggested also grouping by id, but as I thought, that just flattened out the result and each id was listed instead of the one id for the day that had the max difference.

Thanks much in advance

+1  A: 
SELECT t1.`date`, t1.id, t1.sums, t2.sums, max(t1.sums - t2.sums) FROM
  (select `date`, id, sum(val) sums
   from test.foo
   group by `date`, id) as t1,
  (select `date`, id, sum(val) sums
   from test.bar
   group by `date`, id) as t2
WHERE t1.`date` = t2.`date` AND t1.id = t2.id
group by t1.`date`

In the outer query you are grouping by the date but not the id, thus you aren't getting the ID you were expecting. If you want to find ID associated with the the highest discrepancy then first you need to find the highest discrepency and then use another query to figure out what ID (or IDs) are associated with that. You'll have to decide what you want to do with duplicates. Something like this (untested)...

SELECT t3.`date`, t3.id, t3.diff
  (SELECT t1.`date`, t1.id, t1.sums, t2.sums, max(t1.sums - t2.sums) as diff FROM
    (select `date`, id, sum(val) sums
     from test.foo
     group by `date`, id) as t1,
    (select `date`, id, sum(val) sums
     from test.bar
     group by `date`, id) as t2
  WHERE t1.`date` = t2.`date` AND t1.id = t2.id) as t3
WHERE t3.diff = (correlated subquery to get maximum value of diff for each date)

Or use separate queries.

SorcyCat
This one returns id `1` for both rows.
Peter Lang
+1  A: 

This one should work for you.

It sorts the sums in descending order, assigning them a rank, and then only get those with rank=1.

SELECT id, `date`, sums FROM (
  SELECT id, `date`, sums,
  CASE
    WHEN @d != `date` THEN @rownum := 1 
    ELSE @rownum := @rownum + 1
  END AS rank,
  @d := `date`
FROM
(
  SELECT t1.`date`, t1.id, t1.sums t1_sums, t2.sums t2_sums, (t1.sums - t2.sums) sums
  FROM
    (select `date`, id, sum(val) sums
     from foo
     group by `date`, id) as t1,
    (select `date`, id, sum(val) sums
     from bar
     group by `date`, id) as t2,
     (SELECT @rownum := 0, @d := NULL) r
  WHERE t1.`date` = t2.`date` AND t1.id = t2.id
  GROUP BY t1.`date`, t1.id, t2.`date`, t2.id
  ORDER BY t1.`date`, (t1.sums - t2.sums) DESC, t1.id
  ) x
) y
WHERE rank = 1
Peter Lang
Thanks much for the help!
Tony Scavone
A: 

Output of the following query is:

+----------+- ---+------+------+-------+
|date      |id   |sumf  |sumb  |maxdiff|
+----------+- ---+------+------+-------+
|2010-01-10|    1|    30|    20|     10|
|2010-01-10|    2|    40|    20|     20|
|2010-01-10|    3|    60|    20|     40|
|2010-01-11|    1|    20|    40|     20|
|2010-01-11|    2|    20|    20|      0|
|2010-01-11|    3|    20|    20|      0|
+----------+- ---+------+------+-------+


select m.date, m.id, s.sumf, s.sumb, m.maxdiff
from (
    --subquery2: get the maximum different absolute sum between foo and bar for each date/id combination
    select s.date, s.id, max(abs(s.sumf - s.sumb)) as maxdiff
    from (
        --subquery1: get the sum of values for each date/id combination
        select f.date, f.id, sum(f.val) as sumf, sum(b.val) as sumb
        from foo f
        inner join bar b on f.date = b.date and f.id = b.id
        group by f.date, f.id
    ) s
    group by s.date, s.id
) m 
--join back against subquery1 to find out which sums gave us the max difference
inner join (
    select f.date, f.id, sum(f.val) as sumf, sum(b.val) as sumb
    from foo f
    inner join bar b on f.date = b.date and f.id = b.id
    group by f.date, f.id
) s on m.date = s.date and m.id = s.id and m.maxdiff = abs(s.sumf - s.sumb)

Note: this will return more than one row in the case of duplicate maxdiffs where the underlying sums being subtracted have different values. I believe this is the correct behavior if you must return sumf and sumb, otherwise you are not necessarily getting the values that created the maxdiff.

RedFilter
Did not try it, but I guess that would return more than one row per date if they share the same max-diff? As the behavior in that case is not defined, I'm curious if that is the intention of the OP :)
Peter Lang
I updated with the output from the query.
RedFilter
Thanks! This is obviously an example, and translating that to the 'real' problem might take some time. Only reason I didn't mark it as the answer. It is, after all, a straight query while the answer uses those dirty variables... :)
Tony Scavone