I have a table with the hypothetical columns foo and bar. bar might have 50-60 distinct values in it. My goal here is to pick, say, up to 5 rows for each of, say, 6 unique bars. So if the 6 unique bars that get selected out of the 50-60 each happen to have at least 5 rows of data, we'll have 30 rows in total.
Is this getting called from some program?
If so, perhaps you can just look up the bars, pick some at random, and pass them into a select statement.
This way your select could simply be: select * from table where bar in (?,?), and you can move the randomness problem into code, which is frankly better at dealing with that.
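To make that concrete, here is a rough sketch of the idea in Python. It uses the built-in sqlite3 module purely as a stand-in for MySQL so it can run anywhere; the table name sometable and the function name pick_rows_for_random_bars are made up for illustration.

```python
import random
import sqlite3


def pick_rows_for_random_bars(conn, n_bars=6, rng=random):
    """Pick up to n_bars distinct bar values in code, then fetch their rows."""
    # Step 1: look up the distinct bar values (only 50-60 of them, so cheap).
    bars = [row[0] for row in conn.execute("SELECT DISTINCT bar FROM sometable")]
    # Step 2: make the random choice in program code rather than in SQL.
    chosen = rng.sample(bars, min(n_bars, len(bars)))
    if not chosen:
        return []
    # Step 3: build one "?" placeholder per chosen value and bind the values.
    placeholders = ",".join("?" for _ in chosen)
    sql = f"SELECT foo, bar FROM sometable WHERE bar IN ({placeholders})"
    return conn.execute(sql, chosen).fetchall()
```

Note that this returns every row for the chosen bars, so the cap of 5 rows per bar still has to be applied afterwards, either in code or with one query per bar.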
I think the easiest way is to use a UNION.
(SELECT * FROM `table` WHERE bar = 'a' LIMIT 5) UNION (SELECT * FROM `table` WHERE bar = 'b' LIMIT 5) UNION SEL ....... you get the gist, I hope. The parentheses are needed so that each LIMIT applies to its own SELECT rather than to the whole UNION.
EDIT: Not sure if this is what you need. You don't say whether the query also needs to determine the bars somehow, or whether they are passed in.
It's been a while since I've worked with MySQL (I've been working with MSSQL lately), but two things come to mind:
- Some sort of self join
- A Cursor
Self join might look something like
SELECT t2.*
FROM (SELECT DISTINCT bar FROM `table` LIMIT 5) AS t1
JOIN `table` AS t2 ON t1.bar = t2.bar
Again, it's been a while, so this might not be valid MySQL. Also, you'd get all the foos back for the 5 bars, so you'd have to figure out how to trim that down.
What you'd really want to do is:
SELECT *
FROM `sometable`
WHERE `bar` IN (
SELECT DISTINCT `bar`
FROM `sometable`
ORDER BY RAND()
LIMIT 6
)
Unfortunately, you're likely to get this:
ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
Possibly your version will be more cooperative. Otherwise, you'll probably need to do it as two queries.
A simple solution that takes 7 queries:
SELECT distinct bar FROM sometable ORDER BY rand() LIMIT 6
Then, for each of the 6 bar values above, do this, substituting the value for {$bar}, of course:
SELECT foo,bar FROM sometable WHERE bar='{$bar}' ORDER BY rand() LIMIT 5
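For what it's worth, the same 7-query approach could be driven from code roughly like this. This is only a sketch, using Python's built-in sqlite3 as a stand-in for MySQL (SQLite spells the function RANDOM() where MySQL uses RAND()); the function name sample_rows_per_bar is made up. It binds the bar value as a query parameter instead of pasting it into the SQL string, which avoids quoting problems if a bar value contains a quote character.

```python
import sqlite3


def sample_rows_per_bar(conn, n_bars=6, rows_per_bar=5):
    """Run 1 query to pick the bars, then 1 query per bar (1 + n_bars total)."""
    # sqlite3 stand-in: RANDOM() here corresponds to RAND() in MySQL.
    bars = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT bar FROM sometable ORDER BY RANDOM() LIMIT ?",
            (n_bars,),
        )
    ]
    result = []
    for bar in bars:
        # Bind bar as a parameter rather than interpolating it into the SQL.
        result.extend(
            conn.execute(
                "SELECT foo, bar FROM sometable WHERE bar = ? "
                "ORDER BY RANDOM() LIMIT ?",
                (bar, rows_per_bar),
            )
        )
    return result
```

If a bar has fewer than 5 rows, you simply get fewer rows back for it, which matches the "up to 5" requirement.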
Be careful about using "ORDER BY rand()" because it might cause MySQL to fetch a LOT of rows from your table, and compute the rand() function for all of them, and then sort them. This can take a long time if you have a big table.
If it does take a long time, then for the first query, you can remove the ORDER BY and the LIMIT clauses, and select 6 random values in your program code after the query is done.
For the second query, you can split it into two steps:
SELECT count(*) FROM sometable WHERE bar='{$bar}'
Then, in your program code, you know how many items there are so you can randomly choose which of them to look at, and use OFFSET and LIMIT:
SELECT foo,bar FROM sometable WHERE bar='{$bar}' LIMIT 1 OFFSET {$offset}
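A minimal sketch of the COUNT plus OFFSET idea, again in Python with sqlite3 standing in for MySQL. The function name random_rows_by_offset is hypothetical, and the ORDER BY rowid is my own assumption: without some stable ordering, nothing guarantees that a given offset points at the same row from one query to the next. In MySQL you would order by the primary key instead.

```python
import random
import sqlite3


def random_rows_by_offset(conn, bar, k=5, rng=random):
    """Fetch up to k random rows for one bar using COUNT(*) plus LIMIT/OFFSET."""
    (total,) = conn.execute(
        "SELECT COUNT(*) FROM sometable WHERE bar = ?", (bar,)
    ).fetchone()
    # Choose distinct offsets in code so that no row is picked twice.
    offsets = rng.sample(range(total), min(k, total))
    rows = []
    for offset in offsets:
        # Assumption: rowid (SQLite-specific) gives a stable order so each
        # offset maps to one fixed row; use the primary key in MySQL.
        rows.extend(
            conn.execute(
                "SELECT foo, bar FROM sometable WHERE bar = ? "
                "ORDER BY rowid LIMIT 1 OFFSET ?",
                (bar, offset),
            )
        )
    return rows
```

This costs one COUNT query plus one small query per row, so for 6 bars at 5 rows each it is more round trips than ORDER BY rand(), but each query avoids computing and sorting a random value for every row.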