I have the following query:

select column_name, count(column_name)
from table
group by column_name
having count(column_name) > 1;

What would be the difference if I replaced all calls to count(column_name) with count(*)?

This question was inspired by a previous one.


Edit:

To clarify the accepted answer (and maybe my question), using count(*) in this case returns an extra row in the result that contains a null and the count of null values in the column.
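
A minimal sketch of that extra row, run against an in-memory SQLite database through Python's sqlite3 module so it works without a database server. The table name t and its sample values are assumptions made up for illustration; an order by is added only to make the output order predictable.

```python
import sqlite3

# Hypothetical table and data: two 'a' rows, one 'b' row, three NULL rows.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (column_name text)")
conn.executemany(
    "insert into t values (?)",
    [("a",), ("a",), ("b",), (None,), (None,), (None,)],
)

# count(column_name) ignores NULLs, so the NULL group has a count of 0
# and is filtered out by the having clause.
by_column = conn.execute(
    "select column_name, count(column_name) from t "
    "group by column_name having count(column_name) > 1 "
    "order by column_name"
).fetchall()

# count(*) counts rows, so the NULL group has a count of 3 and survives.
by_star = conn.execute(
    "select column_name, count(*) from t "
    "group by column_name having count(*) > 1 "
    "order by column_name"
).fetchall()

print(by_column)  # [('a', 2)]
print(by_star)    # [(None, 3), ('a', 2)]
```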

+73  A: 

count(*) counts every row, including rows where the column is NULL; count(column) counts only the rows where that column is not NULL.

[edit] added this code so that people can run it

create table #bla(id int,id2 int)
insert #bla values(null,null)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,null)

select count(*),count(id),count(id2)
from #bla

Results: 7, 3, 2 (seven rows in total, three non-NULL values in id, two non-NULL values in id2).

SQLMenace
Just curious: if you have a row with _all_ NULLs, would count(*) still count it, or is just count(column) for all columns?
Joel Coehoorn
it would count it
SQLMenace
Is this standard across DBMSs?
Eclipse
DB2 v9 is issuing a warning:SQLSTATE 01003: Null values were eliminated from the argument of a column function
Boune
It's worth mentioning that if you have a non-nullable column such as ID, then count(ID) will significantly improve performance over count(*).
tsilb
@tsilb: The answer posted by @Alan states "count(*) is computed by looking at the indexes on the table in question rather than the actual data rows" which, if true, invalidates your comment. I appreciate that @Alan may be wrong but I'm interested in the source of your information in order to find out which is correct.
Tony
@tsilb: Many modern query optimizers will optimize count(*) to use indexes when it makes sense to.
Shannon Severance
+6  A: 

As explained in the help file:

COUNT(*) returns the number of items in a group, including NULL values and duplicates.

COUNT(expression) evaluates expression for each row in a group and returns the number of nonnull values.

EDIT - This was from Books Online, if you don't have it installed you can find it here.
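
To see what "evaluates expression for each row" means in practice, here is a small sketch using SQLite through Python's sqlite3 module (not the SQL Server setup the quote comes from). The scores table and its rows are hypothetical. nullif(points, 0) turns zeros into NULL, so they drop out of the count too.

```python
import sqlite3

# Hypothetical data: one duplicate row, one zero, one NULL.
conn = sqlite3.connect(":memory:")
conn.execute("create table scores (player text, points int)")
conn.executemany(
    "insert into scores values (?, ?)",
    [("ann", 10), ("ann", 10), ("bob", 0), ("cy", None)],
)

row = conn.execute(
    "select count(*), "                   # all rows, NULLs and duplicates included
    "       count(points), "              # rows where points is not NULL
    "       count(nullif(points, 0)) "    # rows where the expression is not NULL
    "from scores"
).fetchone()

print(row)  # (4, 3, 2)
```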

Cookey
For SQL newbs: To what help file are you referring?
Bill the Lizard
+5  A: 

Another minor difference between using * and a specific column is that with a column you can add the keyword DISTINCT, which restricts the count to distinct values:

select column_a, count(distinct column_b)
from table
group by column_a
having count(distinct column_b) > 1;
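
A runnable sketch of the same idea, using SQLite through Python's sqlite3 module. The sample rows are assumptions chosen so that plain and distinct counts differ.

```python
import sqlite3

# Hypothetical data: group x has a repeated value, group z has a NULL.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (column_a text, column_b int)")
conn.executemany(
    "insert into t values (?, ?)",
    [("x", 1), ("x", 1), ("x", 2), ("y", 5), ("y", 5), ("z", None), ("z", 7)],
)

# Plain count versus distinct count, side by side for each group.
counts = conn.execute(
    "select column_a, count(column_b), count(distinct column_b) "
    "from t group by column_a order by column_a"
).fetchall()

# Only groups holding more than one distinct non-NULL value remain.
filtered = conn.execute(
    "select column_a, count(distinct column_b) "
    "from t group by column_a "
    "having count(distinct column_b) > 1"
).fetchall()

print(counts)    # [('x', 3, 2), ('y', 2, 1), ('z', 1, 1)]
print(filtered)  # [('x', 2)]
```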
Brannon
Should the group by column and the one being counted be different? Otherwise you would get nothing from this query.
steevc
Yes, sorry. I hadn't noticed that they were the same column in the example. I'll update the post.
Brannon
+3  A: 

A further and perhaps subtle difference is that in some database implementations count(*) is computed by looking at the indexes on the table in question rather than the actual data rows. Because no column is named, there is no need to read the rows and their values (as there would be if you counted a specific column). Allowing the database to use the index data can be significantly faster than making it count "real" rows.
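
Whether an index is actually used depends on the engine and its optimizer, so the practical check is to look at the query plan. The sketch below does that for SQLite through Python's sqlite3 module; the table t, the index idx_flag, and the data are hypothetical, and the plan text differs between SQLite versions and other databases, so no particular plan is claimed here.

```python
import sqlite3

# Hypothetical table with a wide column and a narrow indexed column.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, payload text, flag int)")
conn.execute("create index idx_flag on t (flag)")
conn.executemany(
    "insert into t (payload, flag) values (?, ?)",
    [("x" * 100, i % 2) for i in range(1000)],
)

totals = {}
for sql in ("select count(*) from t", "select count(payload) from t"):
    # The last field of each plan row is the human-readable step description.
    plan = [step[-1] for step in conn.execute("explain query plan " + sql)]
    totals[sql] = conn.execute(sql).fetchone()[0]
    print(sql, "->", totals[sql], plan)
```

Both queries return the same total here because payload contains no NULLs; only the plan lines show how each count was obtained.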

Alan
A: 

We can use the Stack Exchange Data Explorer to illustrate the difference with a simple query. The Users table in Stack Overflow's database has columns that are often left blank, like the user's Website URL.

-- count(column_name) vs. count(*)
-- Illustrates the difference between counting a column
-- that can hold null values, a 'not null' column, and count(*)

select count(WebsiteUrl), count(Id), count(*) from Users

If you run the query above in the Data Explorer, you'll see that the count is the same for count(Id) and count(*) because the Id column doesn't allow null values. The WebsiteUrl count is much lower, though, because that column allows null.

Bill the Lizard