views:

73

answers:

2

Right now, I'm debating whether or not to use COUNT(id) or "count" columns. I heard that InnoDB COUNT is very slow without a WHERE clause because it needs to lock the table and do a full index scan. Is that the same behavior when using a WHERE clause?

For example, if I have a table with 1 million records. Doing a COUNT without a WHERE clause will require looking up 1 million records using an index. Will the query become significantly faster if adding a WHERE clause decreases the number of rows that match the criteria from 1 million to 500,000?

Consider the "Badges" page on SO, would adding a column in the badges table called count and incrementing it whenever a user earned that particular badge be faster than doing a SELECT COUNT(id) FROM user_badges WHERE user_id = 111?

Using MyIASM is not an option because I need the features of InnoDB to maintain data integrity.

+2  A: 

Performance of COUNT() is fine as long as you have an index that's used.

If you have a million records and the column in question is NON NULL then a COUNT() will be a million quite easily. If NULL values are allowed, those aren't indexed so the number of records is easily obtained by looking at the index size.

If you're not specifying a WHERE clause, then the worst case is the primary key index will be used.

If you specify a WHERE clause, just make sure the column(s) are indexed.

cletus
Really? According to http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/ doing COUNT(id) would still require a full table scan, even if there was an index on `id`.
Exception
@Exception that post is from December 2006. Too long ago to be taken on face value.
cletus
@cletus it's an index scan, not a table scan. But nothing has changed since December 2006.The reason for the index scan is because InnoDB has a feature called Multiversion Concurrency Control <http://en.wikipedia.org/wiki/Multiversion_concurrency_control>. The number of rows depends on the context of your transaction.
Morgan Tocker
@Morgan Tocker: [MySQL 5.1 was released on Nov 27th, **2008**](http://en.wikipedia.org/wiki/Mysql)
OMG Ponies
@OMG Ponies: I'm aware, I was working for MySQL/Sun Microsytems in Nov 2008 ;) Nothing in 5.1 (or upcoming 5.5) changes this advice.
Morgan Tocker
+1  A: 

I wouldn't say avoid, but it depends on what you are trying to do:

  • If you only need to provide an estimate, you could do SELECT MAX(id) FROM table. This is much cheaper, since it just needs to read the max value in the index.

  • If we consider the badges example you gave, InnoDB only needs to count up the number of badges that user has (assuming an index on user_id). I'd say in most case that's not going to be more than 10-20, and it's not much harm at all.

It really depends on the situation. I probably would keep the count of the number of badges someone has on the main user table as a column (count_badges_awarded) simply because every time an avatar is shown, so is that number. It saves me having to do 2 queries.

Morgan Tocker
In the badges example I meant the `/badges` page that shows how many people earned a certain badge. Some badges have been awarded to 100,000 people! Counting 100,000 values is something I would not want to do everything someone views my page.
Exception
If you meant counting how many badges have been issued, doing that live is about as much fun as being stabbed in the eye :)You want to keep a summary somewhere on the 'badges' table or in memcached. It's not practical to ask MySQL each time.
Morgan Tocker