tags:

views:

254

answers:

2

Hi guys, thanks in advance for any help on this topic!

I'm sure this has a very simple answer, but I can't seem to find it (not sure what to search on!). A standard count / group by query may look like this:

SELECT COUNT(`t2`.`name`) 
FROM `table_1` `t1` 
    LEFT JOIN `table_2` `t2` ON `t1`.`key_id` = `t2`.`key_id` 
WHERE `t1`.`another_column` = 123 

and this works as expected, returning 0 if no rows are found. However:

SELECT COUNT(`t2`.`name`) 
FROM `table_1` `t1` 
    LEFT JOIN `table_2` `t2` ON `t1`.`key_id` = `t2`.`key_id` 
WHERE `t1`.`another_column` = 123 
GROUP BY `t1`.`any_col` 

only works if there is at least one row in table_1 and fails miserably returning an empty result set if there are zero rows. I would really like this to return 0! Anyone enlighten me on this? Beer can be provided in exchange if you are in London ;-)

+1  A: 

The reason it returns zero rows is that you are grouping on a value in table_1. SInce there are no values in table_1, there are no rows to return. Said another way, if you returned t1.any_col in your query from the GROUP BY like so:

SELECT `t1`.`any_col`, COUNT(`t2`.`name`) 
FROM `table_1` `t1` 
    LEFT JOIN `table_2` `t2` ON `t1`.`key_id` = `t2`.`key_id` 
WHERE `t1`.`another_column` = 123 
GROUP BY `t1`.`any_col` 

What would display for t1.any_col when there were no rows? The only way to achieve what you want is to union your results with another query that checks for no rows in table_1. In this example, I'm using the INFORMATION_SCHEMA view simply to have something against which I can query.

SELECT COUNT(`t2`.`name`) 
FROM `table_1` `t1` 
    LEFT JOIN `table_2` `t2` ON `t1`.`key_id` = `t2`.`key_id` 
WHERE `t1`.`another_column` = 123 
GROUP BY `t1`.`any_col` 
UNION ALL
SELECT 0
FROM INFORMATION_SCHEMA.TABLES
Where Not Exists( Select 1 From `table_1` )
LIMIT 1
Thomas
Why does the first example work nicely then? All that is missing is the WHERE clause, there can still be no rows and this will work just fine...
Paul Norman
@Paul Norman - The first example works because clearly there are rows in the table_1. The third example does not work because apparently there are no rows in table_1 that have a value 123 for `another_column`. If the query without the grouping returns no rows, then it will not return rows with the grouping.
Thomas
Thanks for the explanation, you are indeed correct, the table had another row in it. This works correctly and with acceptable overhead (only running once a month anyway!). So, I ended up with something like:SELECT COUNT(`t2`.`name`) FROM `table_1` `t1` LEFT JOIN `table_2` `t2` ON `t1`.`key_id` = `t2`.`key_id` WHERE `t1`.`another_column` = 123 GROUP BY `t1`.`any_col`UNION ALLSELECT 0FROM `table_1` WHERE NOT EXISTS(SELECT 1 FROM `table_1` WHERE `another_column` = 123)LIMIT 1
Paul Norman
+2  A: 

Here's the better solution:

SELECT COUNT(*) FROM
(SELECT `t2`.`name`
FROM `table_1` `t1` 
    LEFT JOIN `table_2` `t2` ON `t1`.`key_id` = `t2`.`key_id` 
WHERE `t1`.`another_column` = 123 
GROUP BY `t1`.`any_col`) tmp

You just need to wrap SELECT COUNT(*) around the query.

The problem is that COUNT() changes from meaning "give me a count of the rows in the result set" to an aggregate function meaning "give me a count of the rows for each group that match my criteria" when used with the GROUP BY clause. In your case, since there are no rows for the group where another_column = 123, nothing is returned.

If you wrap that in another SELECT COUNT(*), you're again saying, "give me a count of the rows in the result set".

In the first query, COUNT() is working with GROUP BY. But with no criteria specified, you're getting at least one row returned from table_1. Since there are no corresponding table_2 records, NULL evaluates to 0 for COUNT(). Don't use the first query. Though it returns what you're expecting, it's only by chance. For the query to be meaningful, you should specify criteria for t1.another_column in the WHERE clause.

The second query works because you're asking for a row count of the result set. Since there are no rows where t1.another_column = 123, there are 0 rows returned, and COUNT() evaluates to 0.

Marcus Adams
This sadly only tells me how many unique 'groups' there are. So if there is only one distinct `any_col` value in `table_1`, but 5 rows of data where `any_col` = 5 this would return 1, not 5 as required (because the nested SELECT would only return one row).
Paul Norman
@Paul, please refresh your browser. I had fixed this. Notice that COUNT() is only used in the outer query.
Marcus Adams
I did make that change myself, but I still think (or rather I have tested) that GROUP BY in the nested query means that only one row will be returned if the only data in the table has a single `any_col` value (i.e. if every row has `any_col` = 5, then only one row will be returned by the nested select - as you would expect - and then it follows that counting this data then only produces 1, not the required 5). The solution above works a treat anyway, never thought of that! Thanks for all the help.It's hard to be clear in this tiny box!
Paul Norman