I'm using SQL Server 2005.

I have a Users table with userID and gender columns. I want to select the top 1000 males (0) and the top 1000 females (1), ordered by userID desc.

If I create a union, only one result set is ordered by userID desc. Is there another way to do this?

SELECT top 1000 *
FROM Users
where gender=0
union
SELECT top 1000 *
FROM Users
where gender=1
order by userID desc
A: 

You need to ensure that you create a sub-select for the union, then do the ordering outside over the combined results.

Something like this should work:

SELECT u.*
  FROM (SELECT u1a.* FROM (SELECT TOP 1000 u1.*
                             FROM USERS u1
                            WHERE u1.gender = 0
                            ORDER BY u1.userid DESC) u1a
        UNION ALL
        SELECT u2a.* FROM (SELECT TOP 1000 u2.*
                             FROM USERS u2
                            WHERE u2.gender = 1
                            ORDER BY u2.userid DESC) u2a
       ) u
ORDER BY u.userid DESC

Also, using a UNION ALL will give better performance as the db won't bother checking for duplicates (which there won't be in this query) in the results.
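As a minimal illustration of the difference (a standalone sketch, not part of the query above; the column alias `n` is made up for the example), `UNION` removes duplicate rows while `UNION ALL` keeps every row:

```sql
-- UNION de-duplicates the combined result: this returns a single row.
SELECT 1 AS n
UNION
SELECT 1;

-- UNION ALL skips the duplicate check: this returns two rows.
SELECT 1 AS n
UNION ALL
SELECT 1;
```

In the males/females query the two halves can never overlap, because each row has exactly one gender value, so the duplicate check would be wasted work.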

Tom
Maybe I'm wrong, but what factor does it use to select the top males and females? No factor, I guess?
eugeneK
Close, but that won't work: it will select any 1000 users who are male and any 1000 who are female. He wants the latest 1000.
Chris Diver
Thanks, an error through quick typing -- now fixed.
Tom
Don't think you can have an `order by` in both parts of a union, only one at the end is allowed
Andomar
@Tom, your new query doesn't work at all.
eugeneK
@Andomar, strangely enough it does run, it just does not give the proper results. I was thinking the same, so I executed it to correct Tom.
eugeneK
What exactly is this query not doing correctly now?
Tom
@Tom, my highest userID is 70000 and your query's highest userID is 54000
eugeneK
+1 The last two versions of this answer worked on my machine, even though they were posted later than Chris'
Andomar
@eugeneK really? I can't see how, but I believe you. Is that with the query as it stands now, or an earlier version?
Tom
Didn't work for me either: `Incorrect syntax near the keyword 'UNION'`. You need to give the subqueries a table alias: `ORDER BY u1.UserID DESC) tablename UNION ALL`
Chris Diver
+1  A: 

Done some testing, and the results are pretty strange. If you specify an order by in both parts of a union, SQL Server gives a syntax error:

select top 2 * from @users where gender = 0 order by id
union all
select top 2 * from @users where gender = 1 order by id 

That makes sense, because the order by should only be at the end of the union. But if you use the same construct in a subquery, it compiles! And works as expected:

select * from (
    select top 2 * from @users where gender = 0 order by id
    union all
    select top 2 * from @users where gender = 1 order by id
) sub

The strangest thing happens when you specify only one order by for the subquery union:

select * from (
    select top 2 * from @users where gender = 0
    union all
    select top 2 * from @users where gender = 1 order by id
) sub

Now it orders the first half of the union at random, but the second half by id. That's pretty unexpected. The same thing happens with the order by in the first half:

select * from (
    select top 2 * from @users where gender = 0 order by id desc
    union all
    select top 2 * from @users where gender = 1
) sub

I'd expect this to give a syntax error, but instead it orders the first half of the union. So it looks like union interacts with order by in a different way when the union is part of a subquery.

Like Chris Diver originally posted, a good way to get out of the confusion is not to rely on the order by in a union, and specify everything explicitly:

select  *
from    (
        select  *
        from    (
                select  top 2 *
                from    @users
                where   gender = 0
                order by 
                        id desc
                ) males
        union all
        select  *
        from    (
                select  top 2 *
                from    @users
                where   gender = 1
                order by 
                        id desc
                ) females
        ) males_and_females
order by 
        id

Example data:

declare @users table (id int identity, name varchar(50), gender bit)

insert into @users (name, gender)
          select 'Joe', 0
union all select 'Alex', 0
union all select 'Fred', 0
union all select 'Catherine', 1
union all select 'Diana', 1
union all select 'Esther', 1
Andomar
Same mistake as Tom
Chris Diver
Why do you need a double sub-query, does my query not work?
Chris Diver
@Chris Diver: Your query does work. I'm writing out even more explicitly what the final `order by` applies to
Andomar
@Andomar Okay, thanks.
Chris Diver
+3  A: 

Martin Smith's solution is better than the following.

SELECT UserID, Gender
FROM 
  (SELECT TOP 1000 UserId, Gender 
   FROM Users 
   WHERE gender = 0
   ORDER BY UserId DESC) m
UNION ALL
SELECT UserID, Gender
FROM 
 (SELECT TOP 1000 UserId, Gender
  FROM Users
  WHERE gender = 1
  ORDER BY UserId DESC) f
ORDER BY Gender, UserID DESC

This does what you want: it gets you the top 1000 of each gender. Change the final order by if you'd rather have the latest user first.
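For the "latest user first" ordering the question asks for, a sketch of that change might look like the following. It assumes the same Users table as in the question and is identical to the query above except for the final ORDER BY:

```sql
-- Each derived table picks the 1000 highest UserIds for one gender.
SELECT UserID, Gender
FROM
  (SELECT TOP 1000 UserId, Gender
   FROM Users
   WHERE gender = 0
   ORDER BY UserId DESC) m
UNION ALL
SELECT UserID, Gender
FROM
  (SELECT TOP 1000 UserId, Gender
   FROM Users
   WHERE gender = 1
   ORDER BY UserId DESC) f
-- The trailing ORDER BY applies to the combined result, so males and
-- females are interleaved by UserID instead of being grouped by gender.
ORDER BY UserID DESC
```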

Chris Diver
+1 You're right, I focussed on the ordering and missed that he wanted the top 1000 per gender
Andomar
Close, but if you order by Gender at the end then you split the males and females apart again.
Tom
@Chris Diver, yep, you got it right. Maybe there is a slightly faster solution, though...
eugeneK
@eugeneK I can't see another way to do it in one query. You could create a temporary table and insert the top 1000 of each into that, but I don't think it would be quicker. It'll be interesting if someone knows a way though :D
Chris Diver
PS Whoever down voted can you please point out the error in the query, because I can't see one. Thanks.
Chris Diver
@Chris Diver, whoever voted down doesn't matter, because your query is the only one out of all the answers that did the job correctly. I don't know how or why. Tbh I don't care.
eugeneK
@Chris Diver That's better now without the Gender in the order by at the end. :-)
Tom
@eugeneK - I thought that the answer was wrong in some way that I hadn't seen, glad it's not.
Chris Diver
@Chris Diver, Tom is right that Gender at the end is useless to me, but I didn't specify that in my question, so your answer is still right, even though I removed Gender from the order by at first.
eugeneK
@Chris. You can use `ROW_NUMBER()` for this. Does that count as a single query solution?
Martin Smith
@Martin it's still a subquery, but it is a far better solution.
Chris Diver
+3  A: 

Another way of doing it:

WITH TopUsers AS
(
SELECT UserId, 
       Gender,
       ROW_NUMBER() OVER (PARTITION BY Gender ORDER BY UserId DESC) AS RN
  FROM Users
  WHERE Gender IN (0,1) /*I guess this line might well not be needed*/
) 

SELECT UserId, Gender 
FROM TopUsers  
WHERE RN <= 1000
ORDER BY UserId DESC
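
To see what the row numbering does, here is a small self-contained sketch of the same pattern. The table variable @Users, its six rows, and the limit of 2 instead of 1000 are assumptions made up for the demonstration. The rows are inserted with `union all` selects because SQL Server 2005 has no multi-row VALUES clause.

```sql
-- Hypothetical sample data: three males (0) and three females (1).
DECLARE @Users TABLE (UserId int PRIMARY KEY, Gender bit);

INSERT INTO @Users (UserId, Gender)
          SELECT 1, 0
UNION ALL SELECT 2, 0
UNION ALL SELECT 3, 0
UNION ALL SELECT 4, 1
UNION ALL SELECT 5, 1
UNION ALL SELECT 6, 1;

-- Number the rows within each gender, highest UserId first,
-- then keep only the first two of each partition.
WITH TopUsers AS
(
    SELECT UserId,
           Gender,
           ROW_NUMBER() OVER (PARTITION BY Gender ORDER BY UserId DESC) AS RN
    FROM @Users
)
SELECT UserId, Gender
FROM TopUsers
WHERE RN <= 2
ORDER BY UserId DESC;
-- Returns UserIds 6, 5, 3, 2: the two newest females and the two newest males.
```

Note that the statement before WITH has to end with a semicolon, otherwise SQL Server reports a syntax error at the CTE.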
Martin Smith
A much more elegant solution than mine. Performs better too. I will remember that. Thanks.
Chris Diver
What would I do without you, Martin Smith? I end up with 90% of my answers from you!
eugeneK