views:

57

answers:

4

I've started my own thread on this question so as to have less overhead from posting it on someone else's thread. I have trouble understanding in SQL what the difference is between GROUP BY and ORDER BY. I know there have been threads made about this, but they don't provide me with a useful answer. Here is something along the general lines of the same answer I just saw repeated many times in other threads:

ORDER BY sorts rows based on their values in a certain column, whereas GROUP BY groups rows with the same values in a certain column to help with aggregate functions.

To me that just seems to virtually describe the same things in two different wordings. I basically need this paraphrased completely. The problem is that ORDER BY does "group things" the same way that GROUP BY does, possibly with the sole exception that GROUP BY might possibly go without guarantees as to there being some sort of special order. Furthermore I just don't see how GROUP BY would help any more with an aggregate function that ORDER BY would.

Please explain the difference in a different way than most do. Thanks!

+4  A: 

This is probably best illustrated with an example. Consider a person table:

ID  Gender  Name
 1    M     Fred
 2    M     Bob
 3    F     Jane
 4    M     Alex
 5    F     Mary

Run:

SELECT * FROM person ORDER BY gender

and you get:

ID  Gender  Name
 3    F     Jane
 5    F     Mary
 1    M     Fred
 2    M     Bob
 4    M     Alex

But run:

SELECT gender, COUNT(1) FROM person GROUP BY gender

and you get:

Gender  COUNT(1)
  M        3
  F        2

The other part of this you need to understand is the difference between WHERE and HAVING. Consider:

SELECT gender, COUNT(1) FROM person WHERE name = 'Bob' GROUP BY gender

Result:

Gender  COUNT(1)
  M        1

but:

SELECT gender, COUNT(1) FROM person GROUP BY gender HAVING COUNT(1) > 2

result:

Gender  COUNT(1)
  M        3

Basically ORDER BY changes the order of rows (which may then be "piped" into an aggregation) whereas HAVING filters the results of the aggregation. To put it another way, WHERE affects the input to a GROUP BY, HAVING affects the output.

cletus
id think it'll help him more if you did the same call except change order by to group by. by showing two different calls you run the risk of him still not understanding :) but this has been helpful :)
corroded
The key difference between the `WHERE` and `HAVING` clauses is that you can not use aggregate functions (outside of a subquery) in the WHERE - only in the HAVING.
OMG Ponies
What corroded said was basically the case. That was another problem I had learning the difference from other threads. People would provide two very different pieces of code, depending on whether they were showing GROUP BY or ORDER BY. The problem is there's other stuff there making other changes. I just needed to see what the change would be depending on which of the two was used. Thanks anyway.
Panzercrisis
+1  A: 

Instead of:

a   2
a   4
a   2
a   1
a   2

GROUP BY will show ONE "a" followed by some kind of aggregation of the other column (maybe average).

ORDER BY will keep all the a aa a a a, and just order the rows based on one of the columns (so maybe a 1 will be first).

So in the end, GROUP BY will result in one row in this case, while ORDER BY will have the same number of rows as the original.

BobTurbo
+1  A: 

Order by sorts your results -- one row of results per one matching (where clause) row in the database

Group by gives you fewer rows of results than the matching rows (where clause) in your database.

Eg you have a table of students's grade point averages (0-4.0) per teacher

Order by will give you all the students grades for all teachers

Group by teachers will give you one result per teacher. Usually you'd select something like the average of the gpa fields, so you'd get one gpa number per teacher

Larry K
+1  A: 

The order by clause is used to sort the result on a criteria, while the group by clause is used to put records in groups where a criteria is common for the records.

Conside a table Test looking like this:

Name  Number
-----------------
B     1
B     2
C     3
C     4
C     5
A     6
A     7
A     8

If you run this query:

select Name, Number
from Test
order by Name

You end up with a result with all records sorted:

A     6
A     7
A     8
B     1
B     2
C     3
C     4
C     5

When you group records, the result isn't the records but instead the groups. You can use aggregates like for example count, min and max to get characteristics that involve all the records from that group.

If you run this query:

select Name, count(*) as Cnt, min(Number) as Min, max(Number) as Max
from Test
group by Name

You end up with a result containing three groups, as there are three different names:

Name  Cnt  Min  Max
----------------------
B     2    1    2
C     3    3    5
A     3    6    8

Notice that grouping doesn't imply that the result is sorted on the criteria that you group on. If you want that also, you would use both group by and order by, and the result would be sorted after it had been grouped.

Guffa
I left a comment on the other guy's post that kind of applies here as well. Thanks anyway.
Panzercrisis
@Panzercrisis: I showed you what `group by` is used for. If you are just using it as BobTurbo showed, you shouldn't. You should use `distinct` instead.
Guffa