ansaurus

Question

ORDER BY items must appear in the select list if SELECT DISTINCT is specified.

Answer 1

+4 A:

While they are not the same thing, in one sense DISTINCT implies a GROUP BY, because every DISTINCT could be re-written using GROUP BY instead. With that in mind, it doesn't make sense to order by something that's not in the aggregate group.

For example, if you have a table like this:

and then try to query it like this:

SELECT DISTINCT col1 FROM [table] WHERE col2 > 2 ORDER BY col1, col2

That would make no sense, because there could end up being multiple col2 values per row. Which one should it use for the order? Of course, in this query you know the results wouldn't be that way, but the database server can't know that in advance.

Now, your case is a little different. You included all the columns from the order by clause in the select clause, and therefore it would seem at first glance that they were all grouped. However, some of those columns were included in a calculated field. When you do that in combination with distinct, the distinct directive can only be applied to the final results of the calculation: it doesn't know anything about the source of the calculation any more.

This means the server doesn't really know it can count on those columns any more. It knows they where used, but it doesn't know if the calculation operation might cause an effect similar to my first simple example above.

So now you need to do something else to tell the server that the columns are okay to use for ordering. There are several ways to do that, but this approach should work okay:

SELECT rsc.RadioServiceCodeId,
            rsc.RadioServiceCode + ' - ' + rsc.RadioService as RadioService
FROM sbi_l_radioservicecodes rsc
INNER JOIN sbi_l_radioservicecodegroups rscg 
    ON rsc.radioservicecodeid = rscg.radioservicecodeid
WHERE rscg.radioservicegroupid IN 
    (SELECT val FROM dbo.fnParseArray(@RadioServiceGroup,','))
    OR @RadioServiceGroup IS NULL  
GROUP BY rsc.RadioServiceCode,rsc.RadioServiceCodeId,rsc.RadioService
ORDER BY rsc.RadioServiceCode,rsc.RadioServiceCodeId,rsc.RadioService

Joel Coehoorn 2008-11-05 16:10:39

If possible, can you elaborate a little more on this?

Xaisoft 2008-11-05 16:17:59

Sure- updated the the post.

Joel Coehoorn 2008-11-05 16:33:38

Distinct just means to eliminate the duplicates, and once you have done that it does make sense to wantto order the distinct rows in the result

Charles Bretana 2008-11-05 17:07:07

I think you missed the point, Charles, but I updated the post to clarify a little.

Joel Coehoorn 2008-11-05 17:10:26

Answer 2

+3 A:

Try this:

ORDER BY 1, 2

OR

ORDER BY rsc.RadioServiceCodeId, rsc.RadioServiceCode + ' - ' + rsc.RadioService

Chris Pebble 2008-11-05 16:11:31

Probably a good idea to expand on the why of this - in particular, the fact that the columns aren't distinctly identified when used in a calculated field.

Mike Burton 2008-11-05 16:13:32

Curious as to what the snytax ORDER BY 1, 2 does.

Xaisoft 2008-11-05 16:18:47

The second one worked fine.

Xaisoft 2008-11-05 16:25:01

If it worked, be sure to mark it as an accepted answer

Tom H. 2008-11-05 16:41:06

Answer 3

A:

The reason why I am doing DISTINCT is because when I inner join with sbi_l_radioservicecodegroups rscg, it returns duplicate rows and I need to only show 1 of each.

Xaisoft 2008-11-05 16:12:23

Answer 4

+3 A:

Try oe of these:

Use column alias:

ORDER BY RadioServiceCodeId,RadioService
Use column position:

ORDER BY 1,2

You can only order by columns that actually appear in the result of the DISTINCT query - the underlying data isn't available for ordering on.

Tony Andrews 2008-11-05 16:13:07

Answer 5

A:

Chris Pebble's answer worked as did Joel Coehoorn. Which one is actually better? Do Group By clauses slow down queries? How can I pick an answer if more than one works.

With Chris's answer, I avoid the Group By Clause in Joel's answer, but in Joel's answer, I avoid the DISTINCT in the query, but add the group clause. What is the better trade off?

Thanks again.

Xaisoft 2008-11-05 16:26:37

Probably the GROUP BY is faster, because it allows the grouping and ordering to take place using the original fields, rather than the calculated field. That could be huge if there's an index involved. (more coming)

Joel Coehoorn 2008-11-05 16:37:28

However, there might be some special optimizations it can do only for DISTINCT, and you never really know until you test on a real-world system.

Joel Coehoorn 2008-11-05 16:38:34

There is also clarity to consider: in terms of preserving your meaning, the DISTINCT clause seems the better choice.

Mike Burton 2008-11-05 17:13:55

I don't know about MSSQL, but in Oracle DISTINCT and GROUP BY use the same execution plan. It's just syntactic sugar.

Mr. Shiny and New 2009-02-04 20:57:20

Answer 6

+1 A:

Distinct and Group By generally do the same kind of thing, for different purposes... They both create a 'working" table in memory based on the columns being Grouped on, (or selected in the Select Distinct clause) - and then populate that working table as the query reads data, adding a new "row" only when the values indicate the need to do so...

The only difference is that in the Group By there are additional "columns" in the working table for any calculated aggregate fields, like Sum(), Count(), Avg(), etc. that need to updated for each original row read. Distinct doesn't have to do this... In the special case where you Group By only to get distinct values, (And there are no aggregate columns in output), then it is probaly exactly the same query plan.... It would be interesting to review the query execution plan for the two options and see what it did...

Certainly Distinct is the way to go for readability if that is what you are doing (When your purpose is to eliminate duplicate rows, and you are not calculating any aggregate columns)

Charles Bretana 2008-11-05 17:15:16

ansaurus

tags:

views:

answers:

ORDER BY items must appear in the select list if SELECT DISTINCT is specified.

related questions