ansaurus

Question

JOIN on another table after GROUP BY and COUNT

Answer 1

A:

Your query would only work on MySQL, because you group on Person.cityKey but select city.key. All other databases would require you to use an aggregate like min(city.key), or to add City.key to the group by clause.

Because the combination of city name and city key is unique, the following are equivalent:

select    count(person.key), min(city.key), min(city.name)
...
group by  person.citykey

Or:

select    count(person.key), city.key, city.name
...
group by  person.citykey, city.key, city.name

Or:

select    count(person.key), city.key, max(city.name)
...
group by  city.key

All rows in the group will have the same city name and key, so it doesn't matter if you use the max or min aggregate.

P.S. If you'd like to count only different persons, even if they have multiple rows, try:

count(DISTINCT person.key)

instead of

count(person.key)

Andomar 2010-06-20 16:53:11

I think that may be a typo in the question; I believe that he intended his "group by" to include both the non-aggregated columns in his "SELECT" clause. I could be wrong however.

Pointy 2010-06-20 16:58:10

For your second example, you have a bunch of things in the `group by` part; from the page I linked in my post, they were saying that that was the wrong thing to do. However, the `MIN` part works; (I thought that only worked for `int` type columns.)

Craig S 2010-06-20 16:59:49

@Craig S: If I read the post you linked to correctly, they're just saying you should `group by` on columns you select. My second example does that correctly. The post also refers to the "annoying error" you get if you do it incorrectly. MySQL is the only database that would accept the "incorrect" way without an "annoying error" :)

Andomar 2010-06-20 17:03:33

Answer 2

+1 A:

I don't think that it's "wrong" in this case, because you've got a one-to-one relationship between city name and city key. You could rewrite it such that you join to a sub-select to get the count of persons to cities by key, to the city table again for the name, but it's debatable that that'd be better. It's a matter of style and opinion I guess.

select PC.ct, City.key, City.name
  from City
  join (select count(Person.key) ct, cityKey key from Person group by cityKey) PC
    on City.key = PC.key

if my SQL isn't too rusty :-)

Pointy 2010-06-20 16:56:12

Answer 3

A:

However, I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.

It's not wrong. You have to understand how the Query Optimizer sees your query. The order in which it is parsed is what requires you to "throw the last part in." The optimizer sees your query in something akin to this order:

the required tables are joined
the composite dataset is filtered through the WHERE clause
the remaining rows are chopped into groups by the GROUP BY clause, and aggregated
they are then filtered again, through the HAVING clause
finally operated on, by SELECT / ORDER BY, UPDATE or DELETE.

The point here is that it's not that the GROUP BY has to name all the columns in the SELECT, but in fact it is the opposite - the SELECT cannot include any columns not already in the GROUP BY.

ahsteele 2010-06-20 17:00:02

Except on MySQL :)

Andomar 2010-06-20 17:04:13

Is it better to use an aggregate function than to add something to the group by?

Craig S 2010-06-20 17:05:24

@Andomar very true but I thought we were talking about SQL in general here ;)

ahsteele 2010-06-20 17:08:12

@Craig that depends on your need. If you don't need to perform a function on the data then no you shouldn't be putting it into a function and should be grouping on it. The only reason you get the either an aggregate function or the GROUP BY clause error is because the optimizer needs to know what you are doing and it has to be either perform a function on this column or group by it. You shouldn't use a function just to avoid the group by.

ahsteele 2010-06-20 17:11:30

Answer 4

+1 A:

...I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.

You misunderstand, you got it backwards.
Standard SQL requires you to specify in the GROUP BY all the columns mentioned in the SELECT that are not wrapped in aggregate functions. If you don't want certain columns in the GROUP BY, wrap them in aggregate functions. Depending on the database, you could use the analytic/windowing function OVER...

However, MySQL and SQLite provide the "feature" where you can omit these columns from the group by - which leads to no end of "why doesn't this port from MySQL to fill_in_the_blank database?!" Stackoverflow and numerous other sites & forums.

OMG Ponies 2010-06-20 17:00:39

Is adding `City.name` to the `Group By` "better" (i.e. faster, more portable, etc) than using `Min(city.name)` as suggested by Andomar?

Craig S 2010-06-20 17:03:54

@Craig S: Depends on your needs. But using an aggregate function for the sake of not having to do it in the GROUP BY is risky -- it'd require more testing to ensure things are correct. I don't see the point of using an aggregate solely for the sake of not having to define it in the GROUP BY.

OMG Ponies 2010-06-20 17:08:19

@OMG Ponies: Okay; I guess after reading that article, I was worried that I was just "adding the group by to make it work" and that in doing so I was making a big SQL mistake. I'm operating on a relatively huge database (if it were the tables in my example, there would be millions of people, and hundreds of cities)

Craig S 2010-06-20 17:13:24

ansaurus

tags:

views:

answers:

JOIN on another table after GROUP BY and COUNT

related questions