tags:

views:

68

answers:

3

Here's my current query:

SELECT questions.question, questions_headings.title FROM questions JOIN questions_headings ON questions.heading=questions_headings.id WHERE questions.heading IN (1,2,3) ORDER BY RANDOM() LIMIT 10

Basically, the database contains Questions for various Headings. For example:

questions_headings:

+----+-------+
| id | title |
+----+-------+
| 0  | blah1 |
+----+-------+
| 1  | lol1  |
+----+-------+
| 2  | etc1  |
+----+-------+

questions:

+----+---------+----------+
| id | heading | question |
+----+---------+----------+
| 0  |   1     | howdoi   |
+----+---------+----------+
| 1  |   0     | blahques |
+----+---------+----------+
| 2  |   1     | herro    |
+----+---------+----------+

What my query does is randomly selects X amount of questions from the given headings and shows them to the user.

Currently, if you want 10 random questions (LIMIT 10), it gives you 10 random questions across all IDs. Normal, right? But I don't want this.

What I need the query to do is, pull out 10 random questions distributed across the given IDs. That way, I won't end up with 9 questions from one heading and 1 question from the other.

Hope that made sense...

Is it possible to do this with just SQL?

A: 

If you know the headings when creating the query, you can do something like the following:

SELECT * FROM (
  SELECT questions.question, questions_headings.title FROM questions JOIN questions_headings ON questions.heading=questions_headings.id WHERE questions.heading = 1 ORDER BY RANDOM() LIMIT 3
  UNION
  SELECT questions.question, questions_headings.title FROM questions JOIN questions_headings ON questions.heading=questions_headings.id WHERE questions.heading = 2 ORDER BY RANDOM() LIMIT 3
  UNION
  SELECT questions.question, questions_headings.title FROM questions JOIN questions_headings ON questions.heading=questions_headings.id WHERE questions.heading = 3 ORDER BY RANDOM() LIMIT 3
  UNION
  SELECT questions.question, questions_headings.title FROM questions JOIN questions_headings ON questions.heading=questions_headings.id WHERE questions.heading IN (1,2,3) ORDER BY RANDOM() LIMIT 10
) LIMIT 10

The idea is, get 10/3 = 3 from each heading and then get the remainings (because of rounding errors) from all of them. Because the last query could result in data that has already been fetched, we limit by 10, so we definitly get the additional 1 we need and then LIMIT the whole thing again. This should result in evenly distributed values.

inflagranti
The problem is, the application will be receiving an unknown amount of `questions.heading` ids. Is there no way to do it with 'pure SQL'?
Martin
In db2 I could write you a query using row_number() that would work, but I cannot think of a way in MySQL.
inflagranti
Out of curiosity, could you provide the query?
Martin
A: 

The following will work in DB2 (and could be easily ported to other databases supporting row_number):

SELECT * FROM (
  SELECT question, title 
  FROM (
    SELECT questions.question, questions_headings.title,
      row_number() over(PARTITION BY questions.headings ORDER BY rand()) rn
    FROM questions JOIN questions_headings ON questions.heading=questions_headings.id
    WHERE questions.heading IN (1,2,3)
  )
  WHERE rn <= 3
  UNION
  SELECT questions.question, questions_headings.title
  FROM questions JOIN questions_headings ON questions.heading=questions_headings.id
  WHERE questions.heading = 1
  ORDER BY RAND() 
  FETCH FIRST 10 ROWS ONLY
)
FETCH FIRST 10 ROWS ONLY
inflagranti
A: 

This works for PostgreSQL:

SELECT q2.row - (SELECT count(*) 
                 FROM questions q3 
                 WHERE q3.heading IN (1, 2, 3)
                 AND q3.heading < q2.heading) AS ord,
       q2.heading, q2.question 
FROM (
  SELECT row_number() OVER (ORDER BY heading, random()) as row,
         q1.id, q1.heading, q1.question 
  FROM questions q1 
  WHERE heading IN (1,2,3) 
  ORDER BY heading, row) AS q2 
ORDER BY ord 
LIMIT 10;

I can't test the MySQL version, but this should work maybe:

SELECT q2.row - (SELECT count(*) 
                 FROM questions q3 
                 WHERE q3.heading IN (1, 2, 3) 
                 AND q3.heading < q2.heading) AS ord,
       q2.heading, q2.question 
FROM (
  SELECT @rownum:=@rownum+1 AS row, 
         q1.id, q1.heading, q1.question 
  FROM questions q1, (SELECT @rownum:=0) r
  WHERE heading IN (1,2,3) 
  ORDER BY heading, rand()) AS q2
ORDER BY ord 
LIMIT 10;

The point is to order the records so that there is exactly one question from every selected heading before a second question from any selected heading and so on. If there are not sufficient questions for some headings it will still be equally distributed over the other headings.

rudi-moore