tags:

views:

676

answers:

2

Surprise -- this is a perfectly valid query in MySql:

select X, Y from someTable group by X

If you tried this query in Oracle or SQL Server, you’d get the natural error message:

Column 'Y' is invalid in the select list because it is not contained in 
either an aggregate function or the GROUP BY clause.

So how does MySql determine which Y to show for each X? It just picks one. From what I can tell, it just picks the first Y it finds. The rationale being, if Y is neither an aggregate function nor in the group by clause, then specifying “select Y” in your query makes no sense to begin with. Therefore, I as the database engine will return whatever I want, and you’ll like it.

There’s even a MySql configuration parameter to turn off this “looseness”. http://dev.mysql.com/doc/refman/5.1/en/server-sql-mode.html#sqlmode_only_full_group_by

This article even mentions how MySql has been criticized for being ANSI-SQL non-compliant in this regard. http://www.oreillynet.com/databases/blog/2007/05/debunking_group_by_myths.html

My question is: Why was MySql designed this way? What was their rationale for breaking with ANSI-SQL?

+5  A: 

I believe that it was to handle the case where grouping by one field would imply other fields are also being grouped:

SELECT user.id, user.name, COUNT(post.*) AS posts 
FROM user 
  LEFT OUTER JOIN post ON post.owner_id=user.id 
GROUP BY user.id

In this case the user.name will always be unique per user.id, so there is convenience in not requiring the user.name in the GROUP BY clause (although, as you say, there is definite scope for problems)

Cebjyre
So it was just to save some typing (as in keyboard typing)? Heh.
Aaron F.
Less columns in GROUP BY clause means faster execution time so it's an optimizing hack. I'm constrantly using `MAX(user.name) AS name` in similar queries on ANSI SQL implementations.
wqw
+1  A: 

Unfortunately almost all the SQL varieties have situations where they break ANSI and have unpredictable results.

It sounds to me like they intended it to be treated like the "FIRST(Y)" function that many other systems have.

More than likely, this construct is something that the MySQL team regret, but don't want to stop supporting because of the number of applications that would break.

Rob

Rob Farley