I'm trying to get this query to work.

The query was written by "OMG Ponies" as an answer to: http://stackoverflow.com/questions/3796228/fix-mysql-query-to-return-random-row-within-subgroup

The query below correctly calculates the difference between the dates, but then fails to select the row (within each ID1-ID2 pair) with the minimum value of that difference.

  DROP TABLE IF EXISTS temp4;
    CREATE TABLE temp4 AS
    SELECT x.id1,
           x.id2,
           x.YEAR,
           x.MMDD,
           x.id3,
           x.id3_YEAR,
           x.id3_MMDD
     FROM (SELECT t.*,
                   ABS(DATEDIFF(CONCAT(CAST(t.id3_YEAR AS CHAR(4)),'-', LEFT(t.id3_MMDD,2),'-',RIGHT(t.id3_MMDD,2)),
                            CONCAT(CAST(t.YEAR AS CHAR(4)),'-', LEFT(t.MMDD,2),'-',RIGHT(t.MMDD,2))))  AS diff,
                   CASE 
                     WHEN @id1 = t.id1 AND @id2 = t.id2 THEN @rownum := @rownum + 1
                     ELSE @rownum := 1
                   END AS rk,
                   @id1 := t.id1,
                   @id2 := t.id2
              FROM temp3 t
              JOIN (SELECT @rownum := 0, @id1  := 0, @id2 := 0) r
          ORDER BY t.id1, t.id2, diff, RAND()) x
     WHERE x.rk = 1;

I'm using the query to randomly draw one row within each group defined by an ID1-ID2 pair. I want the ID3 whose date is closest to YEAR-MMDD (i.e. the absolute difference between YEAR-MMDD and YEAR_ID3-MMDD_ID3 should be minimized). If more than one ID3 ties for that minimum difference (for example, because they have the exact same date), the query should select one of them at random.

If this were the table...

ID1 ID2 YEAR  MMDD  ID3 YEAR_ID3  MMDD_ID3
---------------------------------------
1   2   1991  0821  55  1991      0822    
1   2   1991  0821  57  1991      0822    
1   2   1991  0821  88  1992      0101
1   3   1990  0131  89  2000      0202    
1   3   1990  0131  89  2001      0102

Then the query should return

1,2,1991,0821,55 (or 1,2,1991,0821,57, depending on the random draw)
1,3,1990,0131,89
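To make the selection rule concrete, here is a plain-Python sketch (not MySQL) that applies it to the example table above. It builds a date from a year and an MMDD string the way the CONCAT(...) expression does, takes the absolute difference in days, keeps the minimum per (ID1, ID2) group, and picks uniformly at random among ties. The helper names `to_date` and `closest_per_group` are hypothetical, chosen only for illustration.

```python
import random
from datetime import date

# Rows of the example table above:
# (ID1, ID2, YEAR, MMDD, ID3, YEAR_ID3, MMDD_ID3)
ROWS = [
    (1, 2, 1991, "0821", 55, 1991, "0822"),
    (1, 2, 1991, "0821", 57, 1991, "0822"),
    (1, 2, 1991, "0821", 88, 1992, "0101"),
    (1, 3, 1990, "0131", 89, 2000, "0202"),
    (1, 3, 1990, "0131", 89, 2001, "0102"),
]


def to_date(year, mmdd):
    """Hypothetical helper: turn a year plus an 'MMDD' string into a date."""
    return date(year, int(mmdd[:2]), int(mmdd[2:]))


def closest_per_group(rows, rng=random):
    """Hypothetical helper: return one row per (ID1, ID2) pair, namely the row
    with the smallest absolute day difference, breaking ties at random."""
    groups = {}
    for row in rows:
        id1, id2, year, mmdd, _id3, id3_year, id3_mmdd = row
        diff = abs((to_date(id3_year, id3_mmdd) - to_date(year, mmdd)).days)
        groups.setdefault((id1, id2), []).append((diff, row))
    result = []
    for key in sorted(groups):
        best = min(diff for diff, _ in groups[key])
        # Every row that shares the minimum difference is a candidate.
        ties = [row for diff, row in groups[key] if diff == best]
        result.append(rng.choice(ties))
    return result


for row in closest_per_group(ROWS):
    print(row[:5])
```

For the (1, 2) pair, ID3 55 and 57 are both one day away, so either may be printed; for the (1, 3) pair, the 2000-02-02 row is closer than the 2001-01-02 row, so ID3 89 is returned.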

Here is an SQL dump of a test table:

DROP TABLE IF EXISTS `temp3`;
CREATE TABLE IF NOT EXISTS `temp3` (
  `id1` char(7) NOT NULL,
  `id2` char(7) NOT NULL,
  `YEAR` year(4) NOT NULL,
  `MMDD` char(4) NOT NULL,
  `id3` char(7) NOT NULL,
  `id3_YEAR` year(4) NOT NULL,
  `id3_MMDD` char(4) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;


INSERT INTO `temp3` VALUES('1', '2', 1992, '0107', '55', 1991, '0528');
INSERT INTO `temp3` VALUES('1', '2', 1992, '0107', '57', 1991, '0701');
INSERT INTO `temp3` VALUES('1', '3', 1992, '0107', '88', 2000, '0101');
INSERT INTO `temp3` VALUES('1', '3', 1992, '0107', '44', 2000, '0101');
A:

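This is a working solution. Thanks @OMG Ponies for your help. The difference from the original query is that diff is computed and sorted inside an inner derived table, and the @rownum numbering is applied in the outer query to those already-sorted rows, so rk = 1 falls on the minimum-diff row of each ID1-ID2 pair.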

SELECT
    x.id1,
    x.id2,
    x.YEAR,
    x.MMDD,
    x.id3,
    x.id3_YEAR,
    x.id3_MMDD
FROM
(   SELECT
        t.*,
        @rownum := CASE 
            WHEN @id1 = t.id1 AND @id2 = t.id2 THEN @rownum + 1
            ELSE 1
            END AS rk,
        @id1 := t.id1,
        @id2 := t.id2
    FROM
    (   SELECT
            t.*,
             ABS(DATEDIFF(CONCAT(CAST(t.id3_YEAR AS CHAR(4)),'-', LEFT(t.id3_MMDD,2),'-',RIGHT(t.id3_MMDD,2)),
             CONCAT(CAST(t.YEAR AS CHAR(4)),'-', LEFT(t.MMDD,2),'-',RIGHT(t.MMDD,2)))) AS diff
        FROM temp3 t
        ORDER BY t.id1, t.id2, diff, RAND()
    ) t,
    (   SELECT @rownum := 0, @id1 := null, @id2 := null ) r
) x
WHERE x.rk = 1;
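The MySQL manual warns that the evaluation order of expressions involving user variables is undefined, and MySQL 8.0 deprecates setting user variables inside expressions. On MySQL 8.0 or later, the same idea can therefore be expressed with the ROW_NUMBER() window function, roughly `ROW_NUMBER() OVER (PARTITION BY id1, id2 ORDER BY <diff expression>, RAND())`, keeping the rows where that number is 1. The sketch below is an assumption-laden stand-in, not the answer's MySQL: it runs the window-function form against an in-memory SQLite copy of the test dump through Python's `sqlite3`, so the date arithmetic uses SQLite's `julianday()`, `printf()` and `random()` in place of MySQL's `DATEDIFF()`, `CONCAT()` and `RAND()`. It requires SQLite 3.25 or newer for window functions.

```python
import sqlite3

# In-memory SQLite copy of the temp3 test dump (types mapped to SQLite affinities).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE temp3 (
  id1 TEXT NOT NULL, id2 TEXT NOT NULL,
  YEAR INTEGER NOT NULL, MMDD TEXT NOT NULL,
  id3 TEXT NOT NULL,
  id3_YEAR INTEGER NOT NULL, id3_MMDD TEXT NOT NULL
);
INSERT INTO temp3 VALUES ('1', '2', 1992, '0107', '55', 1991, '0528');
INSERT INTO temp3 VALUES ('1', '2', 1992, '0107', '57', 1991, '0701');
INSERT INTO temp3 VALUES ('1', '3', 1992, '0107', '88', 2000, '0101');
INSERT INTO temp3 VALUES ('1', '3', 1992, '0107', '44', 2000, '0101');
""")

# ROW_NUMBER() restarts at 1 for every (id1, id2) partition and numbers rows
# by absolute day difference, then by random() to break ties, so rk = 1 is
# one randomly chosen row among those closest in date.
QUERY = """
SELECT id1, id2, YEAR, MMDD, id3, id3_YEAR, id3_MMDD
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY id1, id2
               ORDER BY ABS(
                   julianday(printf('%04d-%s-%s', id3_YEAR,
                                    substr(id3_MMDD, 1, 2), substr(id3_MMDD, 3, 2)))
                 - julianday(printf('%04d-%s-%s', YEAR,
                                    substr(MMDD, 1, 2), substr(MMDD, 3, 2)))),
                 random()
           ) AS rk
    FROM temp3 t
) x
WHERE rk = 1
ORDER BY id1, id2
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With the test dump, the (1, 2) pair should always yield ID3 57 (1991-07-01 is closer to 1992-01-07 than 1991-05-28 is), while the (1, 3) pair should yield either 88 or 44, since both share the date 2000-01-01.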
Cat
+1: Well done, I was burnt out between this and the other question.
OMG Ponies