
I have a table emp with the following structure and data:

name   dept    salary
-----  -----   -----
jack   a       2
jill   a       1
tom    b       2
fred   b       1

When I execute the following SQL:

SELECT * FROM emp GROUP BY dept

I get the following result:

name   dept    salary
-----  -----   -----
jill   a       1
fred   b       1

On what basis did the server decide to return jill and fred and exclude jack and tom?

I am running this query in MySQL.

Note 1: I know the query doesn't make sense on its own. I am trying to debug a problem with a 'GROUP BY' scenario. I need to understand the default behavior for this purpose.

Note 2: I usually write the SELECT clause with the same columns as the GROUP BY clause (apart from the aggregate fields). When I came across the behavior described above, I started wondering whether I can rely on it for scenarios such as selecting the rows from the emp table where the salary is the lowest/highest in the dept. For example, SQL statements like this work in MySQL:

SELECT A.*, MIN(A.salary) AS min_salary FROM emp AS A GROUP BY A.dept

I didn't find any material describing why such SQL works or, more importantly, whether I can rely on this behavior consistently. If it is reliable, then I can avoid queries like:

SELECT A.* FROM emp AS A WHERE A.salary = ( 
            SELECT MAX(B.salary) FROM emp B WHERE B.dept = A.dept)
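For reference, here is a minimal sketch that runs the correlated-subquery approach from the question against the sample data. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for MySQL (an assumption for illustration). The query is standard SQL and does not depend on how a server picks values for non-grouped columns. The helper names make_emp_db and highest_paid_per_dept are hypothetical.

```python
import sqlite3


def make_emp_db() -> sqlite3.Connection:
    """Hypothetical helper: build an in-memory copy of the emp table from the question."""
    conn = sqlite3.connect(":memory:")  # SQLite used only as a stand-in for MySQL
    conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO emp (name, dept, salary) VALUES (?, ?, ?)",
        [("jack", "a", 2), ("jill", "a", 1), ("tom", "b", 2), ("fred", "b", 1)],
    )
    return conn


def highest_paid_per_dept(conn: sqlite3.Connection) -> list:
    # Keep a row only if its salary equals the maximum salary of its own dept.
    # If two people in a dept tie for the maximum, both rows are returned.
    sql = """
        SELECT A.name, A.dept, A.salary
        FROM emp AS A
        WHERE A.salary = (SELECT MAX(B.salary) FROM emp AS B WHERE B.dept = A.dept)
        ORDER BY A.dept
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(highest_paid_per_dept(make_emp_db()))  # [('jack', 'a', 2), ('tom', 'b', 2)]
```

Unlike the GROUP BY shortcut, this returns every tied row rather than one arbitrary row per dept.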
A: 

Try using ORDER BY to pick the row that you want.

SELECT * FROM emp GROUP BY dept ORDER BY name ASC;

This returns the following:

name   dept    salary
-----  -----   -----
jack   a       2
fred   b       1
Marius
In my case ORDER BY makes no difference. I expected this, since ORDER BY is applied after GROUP BY.
KandadaBoggu
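If the goal is the row Marius's output shows (the alphabetically first name in each dept), the ordering has to take effect before one row per dept is chosen. A minimal sketch of that uses the ROW_NUMBER() window function, which MySQL supports from 8.0. It runs here on Python's sqlite3 as a stand-in (this assumes SQLite 3.25 or later, which added window functions). The helper names are hypothetical.

```python
import sqlite3


def make_emp_db() -> sqlite3.Connection:
    """Hypothetical helper: in-memory copy of the emp table (SQLite as a MySQL stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO emp (name, dept, salary) VALUES (?, ?, ?)",
        [("jack", "a", 2), ("jill", "a", 1), ("tom", "b", 2), ("fred", "b", 1)],
    )
    return conn


def first_by_name_per_dept(conn: sqlite3.Connection) -> list:
    # ROW_NUMBER() numbers the rows inside each dept in name order, so the
    # ORDER BY acts *before* a row is picked; keeping rn = 1 yields exactly
    # one row per dept, deterministically.
    sql = """
        SELECT name, dept, salary
        FROM (
            SELECT name, dept, salary,
                   ROW_NUMBER() OVER (PARTITION BY dept ORDER BY name ASC) AS rn
            FROM emp
        ) AS ranked
        WHERE rn = 1
        ORDER BY dept
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(first_by_name_per_dept(make_emp_db()))  # [('jack', 'a', 2), ('fred', 'b', 1)]
```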
A:

As far as I know, for your purposes the specific rows returned can be considered arbitrary.

Ordering only takes place after GROUP BY is done.

Joel L
A: 

If you are grouping by department, does the other data matter? I know SQL Server will not even allow this query. If you depend on it, it sounds like there might be other issues.

CSharpAtl
I know this SQL is not valid in Oracle and a few other databases.
KandadaBoggu
A:
mjv
I was going to post the exact same...
OMG Ponies
A: 

I find that the best thing to do is to consider this type of query unsupported. In most other database systems, you can't include columns that aren't either in the GROUP BY clause or in an aggregate function in the HAVING, SELECT or ORDER BY clauses.

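Instead, consider that your query actually reads as follows (ANY() here is notional; MySQL 5.7 and later provide ANY_VALUE() to write it explicitly):

SELECT ANY_VALUE(name), dept, ANY_VALUE(salary)
FROM emp
GROUP BY dept;

...since this is what's going on.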
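Because the values picked for non-grouped columns are arbitrary, the lowest-paid row per dept from Note 2 has to be selected explicitly. A minimal sketch joins emp to a derived table of per-dept MIN(salary). Python's sqlite3 stands in for MySQL here, which is an assumption for illustration, and the helper names are hypothetical.

```python
import sqlite3


def make_emp_db() -> sqlite3.Connection:
    """Hypothetical helper: in-memory copy of the emp table (SQLite as a MySQL stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO emp (name, dept, salary) VALUES (?, ?, ?)",
        [("jack", "a", 2), ("jill", "a", 1), ("tom", "b", 2), ("fred", "b", 1)],
    )
    return conn


def lowest_paid_per_dept(conn: sqlite3.Connection) -> list:
    # The derived table m holds one (dept, min_salary) pair per dept; joining
    # back on both columns keeps only rows that actually have that salary.
    # Ties for the minimum return every tied row.
    sql = """
        SELECT e.name, e.dept, e.salary
        FROM emp AS e
        JOIN (SELECT dept, MIN(salary) AS min_salary FROM emp GROUP BY dept) AS m
          ON e.dept = m.dept AND e.salary = m.min_salary
        ORDER BY e.dept
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(lowest_paid_per_dept(make_emp_db()))  # [('jill', 'a', 1), ('fred', 'b', 1)]
```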

Hope this helps....

Rob Farley
A: 

I think ANSI SQL requires that the SELECT list include only columns from the GROUP BY clause, plus aggregate functions. Here MySQL appears to return some row, possibly the last one the server read or whatever row it had at hand, but don't rely on that.
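For comparison, here is a query of the form described above, in which every selected column is either grouped or aggregated. Its result therefore cannot depend on which row the server happens to read first. This is a minimal sketch on Python's sqlite3 as a stand-in for MySQL, with hypothetical helper names.

```python
import sqlite3


def make_emp_db() -> sqlite3.Connection:
    """Hypothetical helper: in-memory copy of the emp table (SQLite as a MySQL stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO emp (name, dept, salary) VALUES (?, ?, ?)",
        [("jack", "a", 2), ("jill", "a", 1), ("tom", "b", 2), ("fred", "b", 1)],
    )
    return conn


def dept_summary(conn: sqlite3.Connection) -> list:
    # dept is in GROUP BY; every other output column is an aggregate,
    # so each result value is fully determined by the group's rows.
    sql = """
        SELECT dept, MIN(salary), MAX(salary), COUNT(*)
        FROM emp
        GROUP BY dept
        ORDER BY dept
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(dept_summary(make_emp_db()))  # [('a', 1, 2, 2), ('b', 1, 2, 2)]
```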

Petruza
About Marius's answer (I can't comment on it due to my low reputation): as others said, ORDER BY acts on the result of the grouping, so there is no point in sorting rows that will be collapsed by it. Instead, you could select MAX(name), which returns the name that would come last if the rows were sorted alphabetically in ascending order.
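One caveat with the MAX(name) idea is that each aggregate is computed independently per column, so MAX(name) and MAX(salary) need not come from the same row. The minimal sketch below runs on the sample data, using Python's sqlite3 as a stand-in for MySQL and hypothetical helper names. For dept a it returns jill with salary 2, a combination that does not exist in the table.

```python
import sqlite3


def make_emp_db() -> sqlite3.Connection:
    """Hypothetical helper: in-memory copy of the emp table (SQLite as a MySQL stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO emp (name, dept, salary) VALUES (?, ?, ?)",
        [("jack", "a", 2), ("jill", "a", 1), ("tom", "b", 2), ("fred", "b", 1)],
    )
    return conn


def max_per_column(conn: sqlite3.Connection) -> list:
    # MAX(name) and MAX(salary) are evaluated separately within each dept:
    # in dept a, 'jill' is the largest name but 2 is jack's salary.
    sql = """
        SELECT MAX(name), dept, MAX(salary)
        FROM emp
        GROUP BY dept
        ORDER BY dept
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(max_per_column(make_emp_db()))  # [('jill', 'a', 2), ('tom', 'b', 2)]
```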
Petruza