views:

94

answers:

2

I need help with select statement in SQL Server / T-SQL

My table looks like this:

Id (int)  
QuestionId (int)  
GenreId (int)  
QuestionScore (int)

I want to select random N rows from this table so that maximum number of same GenreId in the result set is less than X for all GenreId-s except one. For that one GenreId, I need row count with that GenreId to be equal to Y.

UPDATE
I made up this query from suggestions below, it works exactly as i wanted (except for one genre, but thats no problem, let it be this way, ill have 2 queries)

select top @N * from
(select Id,GenreId,Rank() over (Partition BY GenreId order by newId()) as Rank,QuestionScore from Questions) t
where t.Rank <= @X
order by newId()

now i need to select rows so that average QuestionScore is between 1.7 AND 2.3
how can i do that? i need all columns returned in result set.
thanks in advance :)

+1  A: 

For SQL Server 2005+, use:

SELECT TOP (@n) c.*
  FROM (
SELECT a.id,
       a.questionid,
       a.genreid
  FROM (SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY t.genreid) AS rank
          FROM TABLE t
         WHERE t.genreid NOT IN (SELECT TOP 1 --ensure only one genre, see order by
                                        t.genreid
                                   FROM TABLE t
                               GROUP BY t.genreid
                                 HAVING COUNT(*) = @y 
                               ORDER BY t.genreid) 
      ) a
 WHERE a.rank < @x
UNION ALL
SELECT b.id,
       b.questionid,
       b.genreid
  FROM TABLE b
 WHERE b.genreid IN (SELECT TOP 1 --ensure only one genre, see order by
                            t.genreid
                       FROM TABLE t
                   GROUP BY t.genreid
                     HAVING COUNT(*) = @y
                   ORDER BY t.genreid ) ) c
OMG Ponies
I think @y is a different limit for a specific genere, it's not supposed to be used to find a genre. If it were, it could find more then one, contradicting the question
Andomar
@Andomar: The OP states that the genreid that is excluded from the set has to have a row count of *y*. It's very likely there could be more than one genreid with *y* number of rows - TOP would handle that once we knew more about which of the duplicates would be selected.
OMG Ponies
Thank you a lot :)
gio_333m
A: 

In SQL Server, you can do that with nested subqueries and top clauses:

select  top (@n) * 
from    (
        -- Retrieve @y rows from the special genre
        -- The prio field is used to ensure all these rows make it inside @n
        select  top (@y) 1 as prio, genreid, questionid
        from    @t
        where   genreid = @the_one

        -- And up to @x rows per non-special genre
        union all
        select  2 as prio, genreid, questionid
        from    (
                select  *
                ,       row_number() over (partition by genreid 
                                           order by newid()) as rownr
                from    @t
                where   genreid <> @the_one
                ) sub
        where rownr < @x
        ) sub2
order by
        prio, newid()

Sample data:

declare @t table (id int identity, QuestionId int, GenreId int)

insert @t (GenreId, QuestionId) values 
    (1,1),
    (2,1),(2,1),
    (3,1),(3,1),(3,1),
    (4,1),(4,1),(4,1),(4,1),
    (5,1),(5,1),(5,1),(5,1),(5,1)

declare @n int
declare @x int
declare @y int
declare @the_one int

set @n = 7 -- Total rows
set @x = 3 -- With less then 3 per genre
set @y = 3 -- Except three rows from genre @the_one
set @the_one = 3

Results in (one example, output differs on each run:

prio  genreid  questionid
1     3        1
1     3        3
1     3        2
2     4        1
2     1        1
2     5        1
2     5        4
Andomar
Thank you a lot :)
gio_333m