I have the following code:

SELECT <column>, count(*)
FROM <table>
GROUP BY <column> HAVING COUNT(*) > 1;

Is there any difference in the results or performance if I replace COUNT(*) with COUNT('x')?

(This question is related to a previous one)

+5  A: 

I believe this one has been answered in: http://stackoverflow.com/questions/59294/in-sql-whats-the-difference-between-countcolumn-and-count

Lars A. Brekken
That's very similar (and may indeed be the same answer), but I wondered if there is a difference between referencing a specific column (i.e. COUNT(column)) compared to referencing an arbitrary string (i.e. COUNT('x')).
Andrew
+3  A: 

The major performance difference is that COUNT(*) can be satisfied by examining the primary key on the table.

i.e. in the simple case below, the query will return immediately, without needing to examine any rows.

select count(*) from table

I'm not sure whether the query optimizer in SQL Server will do so, but for the query in the question, if the column you are grouping on has an index, the server should be able to satisfy the query without hitting the actual table at all.

To clarify: this answer refers specifically to SQL Server. I don't know how other DBMS products handle this.

Brannon
+2  A: 

This question is slightly different from the one referenced. That question asked what the difference is between count(*) and count(SomeColumnName), and SQLMenace's answer was spot on.

To address this question: essentially, there is no difference in the result. count(*), count('x'), and, say, count(1) all return the same number. The difference is that when " * " is used, all columns are returned, just as in a SELECT, and then counted. When a constant is used (e.g. 'x' or 1), a row with one column is returned and then counted. The performance difference would be seen when " * " returns many columns.

Update: The above statement about performance is probably not quite right, as discussed in other answers, but it does apply to subselect queries when using EXISTS and NOT EXISTS.
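For anyone who wants to check the "same result" claim themselves, here is a minimal sketch. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in DBMS, and the table t, the column col, and the helper duplicates() are hypothetical names made up for the illustration. It only compares results and says nothing about performance, which depends on the DBMS and its optimizer.

```python
import sqlite3

# Hypothetical table and data; SQLite is only a stand-in for your DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col TEXT)")
conn.executemany(
    "INSERT INTO t (col) VALUES (?)",
    [("a",), ("a",), ("b",), (None,), (None,)],
)


def duplicates(count_expr):
    """Run the question's query with the given COUNT expression."""
    sql = (
        f"SELECT col, {count_expr} FROM t "
        f"GROUP BY col HAVING {count_expr} > 1 ORDER BY col"
    )
    return conn.execute(sql).fetchall()


star = duplicates("COUNT(*)")
const_x = duplicates("COUNT('x')")
const_1 = duplicates("COUNT(1)")
by_col = duplicates("COUNT(col)")  # counts only non-NULL values of col

print(star)                             # groups found with COUNT(*)
print(const_x == star, const_1 == star)  # do the constants agree with *?
print(by_col)                           # groups found with COUNT(col)
```

With this data, the constant forms return the same groups as COUNT(*), including the group of NULLs. COUNT(col) drops that group because it skips NULL values, which is the difference covered in the referenced question.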

NateSchneider
Does that mean COUNT('x') would be faster if the table had many columns, compared to COUNT(*)?
Andrew
I think this behavior depends on the database and the query optimization applied. It's an obvious optimization to perform when you see COUNT(*). It can only mean one thing, you want the total count of rows, regardless of how many columns the table has.
Brannon
+7  A: 
Matt Rogish
It's not correct to say there's not a difference between select(n) and select(*). If you have a covering index that includes n, you get the data straight from the leaf level of the index and don't have to go back to the table, which is much faster.
Eric Z Beard
The DBMS optimizer *will* realize this, and choose the correct index for the job. Provided there is an index, rare is the day that I've seen a DBMS actually **count** rows on the table. Moreover, the presence of NULLs often causes semantic bugs. When you want the # of rows in a table, use COUNT(*)!!!
Matt Rogish
@Matt Just a note on the tone of your answer... If you want to get excited about someone else's apparent ignorance, the appropriate place might be the "comments", rather than your own answer. Lacing your answer with slights at others is most decidedly "not helpful".
Chris Ammerman
TI: I disagree: if someone is incorrect and has upvotes, I find it unlikely that comments will 1) spur upvoters to change their votes or 2) that potential upvoters will read the comments before voting. The "comments (n)" link is too easily overlooked.
Matt Rogish
@Matt I was thinking more of using comments to tell the answerer that their answer was poorly informed, so they might fix it, rather than to sway other voters. Furthermore, there's a reason downvoting isn't as impactful as upvoting: to encourage spotlighting good answers over burying bad ones.
Chris Ammerman
@Matt To put it simply, if your answer is a good one, it will continue to gain upvotes and hence prominence on the page, which will in turn push the bad ones out of prominence, and pinch off any continuing erroneous upvotes they might have gotten otherwise. Harsh language is completely unnecessary.
Chris Ammerman
@TI: I agree that commenting is perfectly suited to nudge answers in the right direction. A wrong, upvoted answer that shows no evidence of any investigation ought to be called out. We want this site to be the arbiter of the "correct" answer. Wouldn't an upvoted, wrong answer be exactly opposite?
Matt Rogish
@Matt Surely it would if it somehow managed to get to the top of the page. But if it's already at the bottom, or if there haven't been many answers yet, beating the bad answer into submission seems a prematurely strong reaction, unlikely to be necessary in the end.
Chris Ammerman
@Matt I guess the crux of my point is that the disparity in score between the good answer and the bad is the real indicator of quality. Not whether the score is positive or negative. And I would prefer, for myself, to use the goodwill of upvotes for good answers to create that disparity. YMMV.
Chris Ammerman
@TI: I understand and can see how you'd feel it's "[unnecessarily] harsh". I don't fully agree (it's a matter of style, I think) but I do agree with your assessment of the situation. I'll avoid pulling that trigger so quickly. Hopefully never! :) Thanks for your feedback.
Matt Rogish
Chris Ammerman
@Matt The other thing that hit me was a concern over the future relevance of remarks on other people's answers, in an environment where those answers can be deleted.
Chris Ammerman
@TI: The next /.? THE HORROR!!! :)
Matt Rogish
@TI: Indeed, as one of my comments was to a posting that now no longer exists. :(
Matt Rogish
It's unfortunate that users like TI want to turn this into Gym Class where A's are awarded for participation, not actual skill. Sadly, our generation did this to him. Told him that the answers to tests were less important than how he FELT about them.
+1  A: 

MySQL: According to the MySQL website, COUNT(*) is faster for single table queries when using MyISAM:

http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_count

I'm guessing that a HAVING clause with a count in it may change things.

Darryl Hein