Hi all,

I would like to group query results by consecutive appearances of a column value. Let's say I have a table that lists the winners of a competition for each year, as follows:

year    team_name
2000    AAA
2001    CCC
2002    CCC
2003    BBB
2004    AAA
2005    AAA
2006    AAA

I would like a query which outputs:

start_end    total   team_name
2000         1       AAA
2001-2002    2       CCC
2003         1       BBB
2004-2006    3       AAA

I'm not too worried about the format of "start_end", as long as I have the start and end or the range (e.g. one could use GROUP_CONCAT to produce 2004,2005,2006 instead of 2004-2006, and that would still be OK).

+1  A: 

Provided that your table looks like this:

"id";"year";"team"
"1";"2000";"AAA"
"2";"2001";"CCC"
"3";"2002";"CCC"
"4";"2003";"BBB"
"5";"2004";"AAA"
"6";"2005";"AAA"
"7";"2006";"AAA"

This query should do the trick:

SELECT a.year AS start
     , MIN(c.year) AS end
     , MIN(c.year)-a.year+1 AS total
     , CONCAT_WS('-', a.year, IF(a.year = MIN(c.year), NULL, MIN(c.year))) AS start_end
     , a.team
  FROM 
     ( SELECT x.year, x.team, COUNT(*) id
         FROM results x
         JOIN results y
           ON y.year <= x.year
        GROUP BY x.id
     ) AS a
  LEFT JOIN 
     ( SELECT x.year, x.team, COUNT(*) id 
         FROM results x
         JOIN results y
           ON y.year <= x.year
        GROUP BY x.id
     ) AS b ON a.id = b.id + 1 AND b.team = a.team
  LEFT JOIN  
     ( SELECT x.year, x.team, COUNT(*) id 
         FROM results x
         JOIN results y
           ON y.year <= x.year
        GROUP BY x.id
     ) AS c ON a.id <= c.id AND c.team = a.team
  LEFT JOIN 
     ( SELECT x.year, x.team, COUNT(*) id 
         FROM results x
         JOIN results y
           ON y.year <= x.year
        GROUP BY x.id
     ) AS d ON c.id = d.id - 1 AND d.team = c.team
WHERE b.id IS NULL AND c.id IS NOT NULL AND d.id IS NULL
GROUP BY start;

BTW, you might find the Common Queries Tree handy for solving problems like this (check the answers for "Find previous and next values in a sequence") :p.
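
For comparison, here is a rough sketch of a shorter formulation of the same grouping. It is not the query from the answer above. For every row it takes the number of rows up to that year minus the number of rows of the same team up to that year; that difference stays constant within one streak, so it can serve as a grouping key. The sketch runs the SQL through Python's sqlite3 module purely to make it self-contained, and it has not been tried on MySQL here, although it sticks to plain subqueries and GROUP BY. It assumes one row per year; the names grp, numbered and streaks are made up for illustration. Note that it still refers to results more than once in a single query.

```python
import sqlite3

# Hypothetical copy of the example table from the answer above.
ROWS = [
    (1, 2000, "AAA"), (2, 2001, "CCC"), (3, 2002, "CCC"), (4, 2003, "BBB"),
    (5, 2004, "AAA"), (6, 2005, "AAA"), (7, 2006, "AAA"),
]

# grp = (rows up to this year) - (rows of the same team up to this year).
# Within one team the value is constant across a streak and grows whenever
# another team's row comes in between. Assumes one row per year.
QUERY = """
SELECT MIN(year) AS start_year,
       MAX(year) AS end_year,
       COUNT(*)  AS total,
       team
  FROM ( SELECT r.year, r.team,
                (SELECT COUNT(*) FROM results o
                  WHERE o.year <= r.year)
              - (SELECT COUNT(*) FROM results s
                  WHERE s.team = r.team AND s.year <= r.year) AS grp
           FROM results r
       ) AS numbered
 GROUP BY team, grp
 ORDER BY start_year
"""


def streaks(rows):
    """Load rows into an in-memory SQLite table and return one tuple per streak."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results (id INTEGER PRIMARY KEY, year INTEGER, team TEXT)"
    )
    conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for start_year, end_year, total, team in streaks(ROWS):
    print(start_year, end_year, total, team)
```

COUNT(*) is used for the total here rather than the difference between the end and start years, so the count stays correct even if some years are missing from the table.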

wimvds
You could try creating a temporary table with only the records of the winners, and use that in the query. That should speed things up, I guess, i.e. run `CREATE TEMPORARY TABLE results SELECT id, year, team FROM your_table WHERE ranking=1;` before running the query posted in my answer. BTW, if you still have a lot of winner records, you can also add an index to that temporary table to speed things up even more.
wimvds
Creating the temporary table works fine (I added IF NOT EXISTS, otherwise it complains when running multiple tests). However, your query now fails at the first sub-select (I renamed the aliases to pinpoint it) with "Can't reopen table: 'x'".
Oops! Yeah, I forgot that MySQL doesn't allow using a temporary table multiple times in the same query... You could probably solve this by creating another set of temporary tables (a second copy of the results and one temporary table per joined subselect) OR by just creating a regular table with the aggregated result, running the query and dropping the table afterwards (this last option is not really viable in a multi-user environment, of course).
wimvds
Thanks for the support. I ended up putting the logic in PHP as I needed a solution in the meantime.
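
For anyone who ends up going the same route: doing the grouping in application code could look roughly like the sketch below. The PHP mentioned above was not posted, so this is a Python illustration of the idea, not that code. It assumes the rows arrive already sorted by year (e.g. from a plain SELECT ... ORDER BY year); the names winners and group_streaks are made up for the example.

```python
from itertools import groupby

# Hypothetical rows as (year, team), already sorted by year.
winners = [
    (2000, "AAA"), (2001, "CCC"), (2002, "CCC"), (2003, "BBB"),
    (2004, "AAA"), (2005, "AAA"), (2006, "AAA"),
]


def group_streaks(rows):
    """Collapse consecutive rows with the same team into (start_end, total, team)."""
    out = []
    # groupby only merges *adjacent* equal keys, which is exactly the
    # "consecutive appearances" behaviour asked for in the question.
    for team, run in groupby(rows, key=lambda row: row[1]):
        years = [year for year, _ in run]
        start, end = years[0], years[-1]
        label = str(start) if start == end else f"{start}-{end}"
        out.append((label, len(years), team))
    return out


for start_end, total, team in group_streaks(winners):
    print(start_end, total, team)
```

This needs only a single pass over the sorted rows, so it avoids the self-joins and the temporary table altogether.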