I have a table like this:

someid    somestring
1         Hello
1         World
1         Blah
2         World
2         TestA
2         TestB
...

Currently I'm grouping by the id and concatenating the strings, so I end up with this:

1         Hello,World,Blah
2         World,TestA,TestB
...

Is it possible to do a second grouping so that if there are multiple entries that end up with the same string, I can group those too?

+4  A: 

Yes, just put your current query in an inner select and apply a new GROUP BY to the outer select. Note that you will probably want to use the ORDER BY clause of GROUP_CONCAT so that the strings are always concatenated in the same order; otherwise two ids with the same set of strings could produce different lists and fall into different groups.

SELECT somelist, COUNT(*) FROM
(
    SELECT
        someid,
        GROUP_CONCAT(somestring ORDER BY somestring) AS somelist
    FROM table1
    GROUP BY someid
) AS T1
GROUP BY somelist

Result:

'Blah,Hello,World', 1
'TestA,TestB,World', 2

Here's the test data I used:

CREATE TABLE table1 (someid INT NOT NULL, somestring NVARCHAR(100) NOT NULL);
INSERT INTO table1 (someid, somestring) VALUES
(1, 'Hello'),
(1, 'World'),
(1, 'Blah'),
(2, 'World'),
(2, 'TestA'),
(2, 'TestB'),
(3, 'World'),
(3, 'TestB'),
(3, 'TestA');
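
A possible extension, sketched here rather than taken from the original answer: if you also want to see which ids share each list, the outer query can aggregate the ids too. The aliases `num_ids` and `ids` are names chosen for illustration.

```sql
-- Outer query groups on the concatenated list and collects the ids per group.
SELECT
    somelist,
    COUNT(*) AS num_ids,
    GROUP_CONCAT(someid ORDER BY someid) AS ids
FROM
(
    -- Inner query: one sorted, comma-separated list per id.
    SELECT
        someid,
        GROUP_CONCAT(somestring ORDER BY somestring) AS somelist
    FROM table1
    GROUP BY someid
) AS T1
GROUP BY somelist;
```

With the test data above, this should return 'Blah,Hello,World', 1, '1' and 'TestA,TestB,World', 2, '2,3'.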
Mark Byers
How efficient are sub-queries in general? There are always questions on here along the lines of "how can I turn this into one query / un-nest this query", which implies that nested queries are bad in some way.
DisgruntledGoat
Correlated subqueries (where the inner query references the outer query) can be very slow if the optimizer can't find a way to execute them as a JOIN and instead runs the inner query once per row of the outer query. But this isn't a correlated subquery, so there is no problem.
Mark Byers
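
To illustrate the distinction in the comment above, here is a minimal sketch against the same `table1`. The aliases `t`, `i`, and `n` are made up for the example, and actual performance depends on the MySQL version and its optimizer.

```sql
-- Correlated: the inner query references the outer row (t.someid),
-- so it may be evaluated once per outer row.
SELECT DISTINCT
    t.someid,
    (SELECT COUNT(*) FROM table1 AS i WHERE i.someid = t.someid) AS n
FROM table1 AS t;

-- Not correlated: the same result from a plain GROUP BY,
-- which is evaluated once, independent of any outer row.
SELECT someid, COUNT(*) AS n
FROM table1
GROUP BY someid;
```

The derived table in the answer is of the second kind: it does not refer to anything in the outer query.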
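Just remember that the result of GROUP_CONCAT() is length-limited, to 1,024 bytes by default (the group_concat_max_len setting). Longer lists are truncated, so this answer can break if an id has many or long strings: different lists that share the same first 1,024 bytes would be grouped together. You can get the current limit with `show variables like '%group_concat%'`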
Marc B
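
As a sketch of how the limit mentioned above could be checked and raised for the current connection: the value 1000000 is an arbitrary example, and the effective maximum is also bounded by other server settings such as max_allowed_packet.

```sql
-- Inspect the current limit (in bytes).
SHOW VARIABLES LIKE 'group_concat_max_len';

-- Raise it for this session only; 1000000 is an arbitrary example value.
SET SESSION group_concat_max_len = 1000000;

-- After running the grouping query, check whether any row was truncated.
SHOW WARNINGS;
```

The session setting has to be applied before the grouping query runs, on the same connection.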