tags:

views:

68

answers:

3

I am trying to think data in terms of sets but have some questions about aggregate functions.

here is the definition from wiki

an aggregate function is a function that returns a single value from a collection of input values such as a set

so for example,

select c.id, c.user_id, c.name, c.created_at, count(c.id) from collections c;

can be think of " count returns a single value from collection c set"

select c.id, c.user_id, c.name, c.created_at, count(c.id) 
from collections c group by c.user_id

can be think of "count returns a single value from each subset(set from group by) of a collection(c) set"

the question i have is, how do i know which 'single value' the count returns from, in this case, collection(c) set or each 'group by' subset.

Consider a sightly more complicated query(TOP N PER GROUP)

select c.id, c.user_id, c.name, c.created_at 
from collections c 
left join collections co on c.user_id = co.user_id and c.name <=co.name 
group by c.user_id, c.name 
having count(*)<=2;

here sets group by(c.user_id) has its own subsets (c.name), and how do i know what count(*) is going to return(a single value of the entire set(which will be just one rol)? or a single value of each subset(c.user_id) or a single value of each subset(c.name)?)

+1  A: 

For any of your queries that perform aggregation to work, you need to group by the correct fields.

The first query should fail because the c.id, c.user_id, c.name, and c.created_at fields are not grouped using GROUP BY.

Similarly, the second query will fail as well because only the first field is grouped.

To get the last query to work, you might need to include the id in GROUP BY as well.

Aggregate functions only work when all of the non-aggregate elements of your SELECT clause (e.g., c.id, c.user_id, etc) represent the group being aggregated (i.e., are included in the GROUP BY clause).

David Andres
I'm pretty sure MySQL is lax about enforcing that requirement, and will actually execute queries 2 and 3.
derobert
good to know, but I thought it stayed close enough to SQL standard on this one
David Andres
thanks. query 1, 2, and 3 will execute in mysql. I tried these queries before posting. mysql does not enforce any of these requirements, though only makes sense when being aggregated.
+1  A: 

GROUP BY doesn't make multiple sets. It makes one; in your case, its grouping by the pair (c.user_id, c.name). Any rows with the same (c.user_id, c.name) are put together in a group, and those are the groups count(*) will be working on.

derobert
A: 

Aggregate functions are computed after a) Joins b) filtering rows based on the where clause.

Picturize your data set once the join is completed and the rows have been filtered based on the where clause in your query. The group by clause will now subdivide your data set into distinct groups based on the columns specified in your where clause. All the rows in a given group will have the same value for all the columns specified in your where clause.

Once the rows in the original data set have been classified into groups, you can only query for (ie select the columns) that are common to a group. In your second example where you have grouped by c.user_id, there will be one group for each distinct user_id in your collections table. If you do not have a 'HAVING' clause in your query, your query will return one row per group. You can think (only think) of each group as a row containing columns. These columns would represent things that are common to the entire group like count(*), SUM, MAX, MIN etc.. The column name on which the groups are formed is also the same for each group!

The having clause is a like a where clause for groups. It filters out entire groups that do not satisfy the predicate in the having clause.

bkm