
SQLite behaves differently from many other RDBMSs when dealing with aggregation. Consider the following table and values:

create table foo (a int, b int);
insert into foo (a, b) values (1, 10);
insert into foo (a, b) values (2, 11);
insert into foo (a, b) values (3, 12);

If I query it thus:

select a, group_concat(b) from foo;

Normally, I would expect to receive an error, because column 'a' is neither inside an aggregate function nor listed in a GROUP BY clause. Below is the error produced by SQL Server (PostgreSQL emits something similar).

Column 'foo.a' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

SQLite, on the other hand, just goes along with it and produces this result:

3|10,11,12

What good is this? How did it pick the value for column 'a'? If we add another row, a pattern seems to emerge: tentatively, it uses the most recently added row, although the choice could simply be indeterminate.

sqlite> insert into foo (a, b) values (2, 13);
sqlite> select a, group_concat(b) from foo;
2|10,11,12,13
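A minimal sketch for reproducing the behavior with Python's built-in sqlite3 module. The function name is hypothetical, and it assumes whatever SQLite version Python bundles rather than 3.6.16, so the value picked for a may differ from the shell output above. The script only shows that SQLite accepts the query; it deliberately does not depend on which row supplies a, and group_concat's ordering is not guaranteed either.

```python
import sqlite3


def bare_column_demo():
    """Run the question's queries against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table foo (a int, b int)")
    conn.executemany("insert into foo (a, b) values (?, ?)",
                     [(1, 10), (2, 11), (3, 12)])
    # 'a' is a "bare" column: not aggregated and not in a GROUP BY.
    # SQLite accepts this and takes 'a' from one of the input rows.
    first = conn.execute("select a, group_concat(b) from foo").fetchone()

    conn.execute("insert into foo (a, b) values (2, 13)")
    second = conn.execute("select a, group_concat(b) from foo").fetchone()
    conn.close()
    return first, second


if __name__ == "__main__":
    before, after = bare_column_demo()
    print(before)
    print(after)
```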

This seems like a bug to me, but I'm wondering what our database experts here have to say about it.

(I'm using SQLite version 3.6.16 on Ubuntu.)

A:

This is useful behavior when you're selecting several columns that are all determined by the grouping, but you only need the query engine to actually group on one of them. For example:

Given Orders and OrderDetails tables:

SELECT O.OrderID, O.OrderDate, SUM(OD.Price * OD.Quantity) TotalPrice
FROM Orders O NATURAL JOIN OrderDetails OD
GROUP BY O.OrderID

In many other databases, we would need to include both OrderID and OrderDate in the GROUP BY. The database would then group by both columns, which is redundant in this case, since each OrderID has exactly one OrderDate. By grouping only on OrderID, we get the same results with less code and potentially less work.
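A runnable sketch of this query using Python's sqlite3 module. The column types, sample rows, and function name are hypothetical; the schema assumes OrderID is the primary key of Orders and the only column the two tables share, so NATURAL JOIN matches on OrderID alone. Under those assumptions, grouping by OrderID alone and grouping by OrderID plus OrderDate should return the same rows.

```python
import sqlite3


def order_totals():
    """Compare GROUP BY OrderID with GROUP BY OrderID, OrderDate (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: OrderID is the only shared column, so the
    # NATURAL JOIN below joins on OrderID and nothing else.
    conn.executescript("""
        create table Orders (OrderID integer primary key, OrderDate text);
        create table OrderDetails (OrderID integer, Price real, Quantity integer);
        insert into Orders values (1, '2009-07-01'), (2, '2009-07-02');
        insert into OrderDetails values (1, 2.5, 4), (1, 1.0, 3), (2, 10.0, 1);
    """)
    query = """
        SELECT O.OrderID, O.OrderDate, SUM(OD.Price * OD.Quantity) TotalPrice
        FROM Orders O NATURAL JOIN OrderDetails OD
        GROUP BY {group_cols}
        ORDER BY O.OrderID
    """
    # OrderDate is a bare column here; it is fixed by OrderID anyway.
    by_id = conn.execute(query.format(group_cols="O.OrderID")).fetchall()
    # The form other databases traditionally require.
    by_both = conn.execute(
        query.format(group_cols="O.OrderID, O.OrderDate")).fetchall()
    conn.close()
    return by_id, by_both
```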

Sam
Your example makes sense, and it's sort of a nice feature in that case. But... you also include a GROUP BY. I'm referring to the case where that clause is missing completely.
Alison R.
Remember that SQLite's philosophy is very different from that of other databases. It accepts lots of things it doesn't understand that other databases would reject with errors. For example, you can take just about any DDL script from another database, run it on SQLite, and not get an error. The result is unlikely to be exactly the database you wanted, but the script will run and create one. SQLite is definitely something you want to become familiar with before relying on it, rather than assuming it works like all other databases.
Sam
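One concrete illustration of this leniency, sketched with Python's sqlite3 module (the table, values, and function name are hypothetical): SQLite accepts Oracle-style and even made-up column type names, mapping each to a type affinity instead of raising an error, and it does not enforce declared lengths. This assumes an ordinary (non-STRICT) table.

```python
import sqlite3


def foreign_ddl_demo():
    """Show SQLite accepting unfamiliar type names in DDL (hypothetical table)."""
    conn = sqlite3.connect(":memory:")
    # VARCHAR2 and NUMBER(5,2) are Oracle types; FOOBAR is invented.
    # SQLite maps each to an affinity instead of rejecting the statement.
    conn.execute("create table t (x VARCHAR2(10), y NUMBER(5,2), z FOOBAR)")
    # The declared length (10) is not enforced, and the NUMBER column
    # keeps a non-numeric string as text rather than raising an error.
    conn.execute("insert into t values (?, ?, ?)",
                 ("a string longer than ten characters", "hello", 42))
    row = conn.execute(
        "select length(x), typeof(y), typeof(z) from t").fetchone()
    conn.close()
    return row
```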
A: 

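MySQL (at least with the ONLY_FULL_GROUP_BY SQL mode disabled, which was the default before MySQL 5.7.5) also returns an indeterminate value for a from among the matched rows, usually the first one. SQL Server and PostgreSQL are just guarding against laziness by forcing you to make an explicit decision about how to disambiguate each column.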
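A sketch of two ways to make that decision explicit, using Python's sqlite3 module and the table from the question (the function name and the choice of max(a) are illustrative assumptions). The first query aggregates every selected column, which is the portable approach. The second relies on SQLite-specific behavior documented for SQLite 3.7.11 and later: when min() or max() is the only aggregate, bare columns take their values from the row that holds the minimum or maximum. Other databases do not promise this.

```python
import sqlite3


def disambiguated_queries():
    """Two ways to avoid relying on an arbitrarily chosen row for 'a'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table foo (a int, b int)")
    conn.executemany("insert into foo (a, b) values (?, ?)",
                     [(1, 10), (2, 11), (3, 12), (2, 13)])
    # Portable: aggregate every selected column, so no row is "picked".
    explicit = conn.execute(
        "select max(a), group_concat(b) from foo").fetchone()
    # SQLite-specific: with max(b) as the sole aggregate, bare column 'a'
    # comes from the row where b is largest, i.e. the row (2, 13).
    from_max_row = conn.execute("select a, max(b) from foo").fetchone()
    conn.close()
    return explicit, from_max_row
```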

eswald
Hm. I wish I knew more about the logic of how and why it decides to just "pick one" in that case.
Alison R.
See also: http://stackoverflow.com/questions/1023347/mysql-selecting-a-column-not-in-group-by
eswald