tags:

views:

407

answers:

2

How can I limit a result set to n distinct values of a given column(s), where the actual number of rows may be higher?

Input table:

client_id, employer_id, other_value
1, 2, abc
1, 3, defg
2, 3, dkfjh
3, 1, ldkfjkj
4, 4, dlkfjk
4, 5, 342
4, 6, dkj
5, 1, dlkfj
6, 1, 34kjf
7, 7, 34kjf
8, 6, lkjkj
8, 7, 23kj

desired output, where limit distinct=5 distinct values of client_id:

1, 2, abc
1, 3, defg
2, 3, dkfjh
3, 1, ldkfjkj
4, 4, dlkfjk
4, 5, 342
4, 6, dkj
5, 1, dlkfj

Platform this is intended for is MySQL.

+4  A: 

You can use a subselect

select * from table where client_id in 
(select distinct client_id from table order by client_id limit 5)
Vinko Vrsalovic
I think this is right but I'll leave the question open for a little while to see if anyone else has a better answer
ʞɔıu
A: 

This is for SQL Server. I can't remember, MySQL may use a LIMIT keyword instead of TOP. That may make the query more efficient if you can get rid of the inner most subquery by using the LIMIT and DISTINCT in the same subquery. (It looks like Vinko used this method and that LIMIT is correct. I'll leave this here for the second possible answer though.)

SELECT
     client_id,
     employer_id,
     other_value
FROM
     MyTable
WHERE
     client_id IN
     (
          SELECT TOP 5
               client_id
          FROM
          (
               SELECT DISTINCT
                    client_id
               FROM
                    MyTable
          ) SQ
          ORDER BY
               client_id
     )

Of course, add in your own WHERE clause and ORDER BY clause in the subquery.

Another possibility (compare performance and see which works out better) is:

SELECT
     client_id,
     employer_id,
     other_value
FROM
     MyTable T1
WHERE
     T1.code IN
     (
          SELECT
               T2.code
          FROM
               MyTable T2
          WHERE
               (SELECT COUNT(*) FROM MyTable T3 WHERE T3,code < T2.code) < 5
     )
Tom H.