views: 413 · answers: 7

I have a page with 4 tabs, each displaying a different report based on a different table.

I obtain the row count of each table using a SELECT COUNT(*) FROM <table> query and display the number of rows available in each table on its tab. As a result, each page postback executes 5 COUNT(*) queries (4 to get the tab counts and 1 for pagination) plus 1 query to fetch the report content.

Now my question is: are COUNT(*) queries really expensive? Should I keep the row counts (at least those displayed on the tabs) in the page's view state instead of querying for them on every postback?

How expensive are COUNT(*) queries?

+6  A: 

In general, the cost of COUNT(*) is proportional to the number of records satisfying the query conditions, plus the time required to prepare those records (which depends on the underlying query's complexity).

In simple cases where you're dealing with a single table, there are often specific optimisations in place that make such an operation cheap. For example, a COUNT(*) with no WHERE clause on a single MyISAM table in MySQL is instantaneous, since the row count is stored in the table's metadata.

Let's consider two queries:

SELECT  COUNT(*)
FROM    largeTableA a

Since every record satisfies the query, the COUNT(*) cost is proportional to the number of records in the table, i.e. proportional to what it returns (assuming the engine needs to visit the rows and there isn't a specific optimisation in place to handle it).

SELECT  COUNT(*)
FROM    largeTableA a
JOIN    largeTableB b
ON      a.id = b.id

In this case, the engine will most probably use a HASH JOIN, and the execution plan will be something like this:

  1. Build a hash table on the smaller of the two tables
  2. Scan the larger table, looking up each record in the hash table
  3. Count the matches as it goes.

In this case, the COUNT(*) overhead (step 3) will be negligible and the query time will be completely defined by steps 1 and 2, that is building the hash table and looking it up. For such a query, the time will be O(a + b): it does not really depend on the number of matches.
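The plan above can be sketched in a few lines of Python, assuming `id` is unique on both sides (as with a primary key) and modelling the tables as plain lists of ids for illustration:

```python
# Sketch of the hash-join counting plan: build a hash table on the
# smaller input, then probe it while scanning the larger one.
# Runtime is O(a + b), independent of the number of matches.
def hash_join_count(a_ids, b_ids):
    # Step 1: build a hash table on the smaller of the two inputs
    small, large = (a_ids, b_ids) if len(a_ids) <= len(b_ids) else (b_ids, a_ids)
    hash_table = set(small)
    # Steps 2 and 3: scan the larger input, probing the hash table
    # and counting matches as we go
    matches = 0
    for record_id in large:
        if record_id in hash_table:
            matches += 1
    return matches

print(hash_join_count([1, 2, 3, 4], [3, 4, 5]))  # 2
```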

However, if there are indexes on both a.id and b.id, a MERGE JOIN may be chosen instead, and the COUNT(*) time will again be proportional to the number of matches, since an index seek is performed after each match.

Quassnoi
Surely what was meant was "COUNT(*) cost is proportional to the number of records plus the time required to contact the database"? That doesn't mean the COUNT(*) operation couldn't be proportional to the total number of records, does it?
Alison
@Alison: see the post update.
Quassnoi
@Quassnoi: my comment doesn't make much sense now - it was a reply to an earlier comment.
Alison
A: 

COUNT(*) can be particularly expensive, as it may result in loading (and paging through) an entire table when you may only need a count on a primary key (in some implementations it is optimised).

From the sound of it, you are causing a table load operation each time, which is slow, but unless it is running noticeably slowly, or causing some sort of problem, don't optimise: premature and unnecessary optimisation can cause a great deal of trouble!

A count on an indexed primary key will be much faster, but given the cost of maintaining the index, this may provide no net benefit.

penguat
This is not true for SQL Server (T-SQL) - count(*) is optimised and is the preferred method for counting every row.
Dan Diplo
-1 You need to provide references before speaking with such authority and without provisos
Ruben Bartelink
A reference: http://thehobt.blogspot.com/2008/12/debunking-myth-select-count-vs-select.html
Dan Diplo
@Dan Diplo: Sorry if you felt my request for a citation was aimed at you (I upvoted your comment). Good citation.
Ruben Bartelink
I wasn't sure if it was, but thought it a good idea anyway.
Dan Diplo
+6  A: 

You need to attach SQL Profiler or an app-level profiler like L2SProf and look at the real query costs in your context before:

  • guessing what the problem is and trying to determine the likely benefits of a potential solution

  • allowing others to guess for you on da interwebs - there's lots of misinformation without citations about, including in this thread (but not in this post :P)

When you've done that, it'll be clear what the best approach is - i.e., whether the SELECT COUNT is dominating things or not, etc.

And having done that, you'll also know whether any changes you choose to do have had a positive or a negative impact.

Ruben Bartelink
Thanks for the accept! Care to briefly tell us what you decided to do in the end and/or discovered on the way?
Ruben Bartelink
A: 

All I/O is expensive and if you can accomplish the task without it, you should. But if it's needed, I wouldn't worry about it.

You mention storing the counts in view state; that's certainly an option, as long as the code behaves acceptably when the cached count is wrong because the underlying records have been deleted or added.
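The trade-off described above can be sketched as a small TTL cache (using Python and an in-memory SQLite table purely for illustration; all names are made up): fewer COUNT(*) queries, at the price of possibly stale counts until the cache expires.

```python
import sqlite3
import time

# Hedged sketch: cache a COUNT(*) result for a short TTL, accepting
# that the value may be stale while the underlying rows change.
class CachedCount:
    def __init__(self, conn, table, ttl_seconds=30.0):
        self.conn = conn
        self.table = table
        self.ttl = ttl_seconds
        self._value = None
        self._fetched_at = 0.0

    def get(self):
        now = time.monotonic()
        if self._value is None or now - self._fetched_at > self.ttl:
            # Cache miss or expired: re-run the real COUNT(*)
            self._value = self.conn.execute(
                f"SELECT COUNT(*) FROM {self.table}").fetchone()[0]
            self._fetched_at = now
        return self._value

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO reports VALUES (?)", [(1,), (2,)])
counter = CachedCount(conn, "reports", ttl_seconds=60.0)
first = counter.get()             # hits the database: 2
conn.execute("INSERT INTO reports VALUES (3)")
second = counter.get()            # served from cache, stale: still 2
```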

SqlACID
+2  A: 

As others have said, COUNT(*) always physically counts rows, so if you can do that once and cache the result, that's certainly preferable.

If you benchmark and determine that the cost is negligible, you don't (currently) have a problem.
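A minimal benchmark sketch of that idea (using Python and an in-memory SQLite table for illustration; real numbers depend entirely on your server and schema):

```python
import sqlite3
import time

# Measure what COUNT(*) actually costs before deciding to optimise.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO sometable VALUES (?)",
                 ((i,) for i in range(100_000)))

start = time.perf_counter()
n = conn.execute("SELECT COUNT(*) FROM sometable").fetchone()[0]
elapsed = time.perf_counter() - start
print(f"counted {n} rows in {elapsed:.6f}s")
```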

If it turns out to be too expensive for your scenario you could make your pagination 'fuzzy' as in "Showing 1 to 500 of approx 30,000" by using

SELECT rows FROM sysindexes WHERE id = OBJECT_ID('sometable') AND indid < 2

which will return an approximation of the number of rows (it's approximate because it's not updated until a CHECKPOINT).

Alex K.
A: 

This depends on what you are doing with the data in this table. If it changes very often and you always need the counts, you could create a trigger that maintains another table holding just the counts for this table. If you need to show the data separately, you could execute "select count(*)..." for only one particular table. This is just what came to mind immediately; there are other ways to speed this up, I'm sure. Caching the data, maybe? :)
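The trigger idea can be sketched with SQLite (all table, column, and trigger names here are made up for illustration): a side table of row counts is maintained on insert and delete, so reading a count never scans the base table.

```python
import sqlite3

# Keep a row count in a side table, maintained by triggers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE report (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE row_counts (tbl TEXT PRIMARY KEY, n INTEGER);
    INSERT INTO row_counts VALUES ('report', 0);

    CREATE TRIGGER report_insert AFTER INSERT ON report
    BEGIN
        UPDATE row_counts SET n = n + 1 WHERE tbl = 'report';
    END;

    CREATE TRIGGER report_delete AFTER DELETE ON report
    BEGIN
        UPDATE row_counts SET n = n - 1 WHERE tbl = 'report';
    END;
""")
conn.executemany("INSERT INTO report (payload) VALUES (?)",
                 [("a",), ("b",), ("c",)])
conn.execute("DELETE FROM report WHERE payload = 'a'")
cached = conn.execute(
    "SELECT n FROM row_counts WHERE tbl = 'report'").fetchone()[0]
print(cached)  # 2
```

Note that the triggers add a small cost to every write, which is the usual trade for cheap reads.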

Deveti Putnik
+1  A: 

If the page gets slow, one thing you can look at is minimizing the number of database roundtrips, if at all possible. Even if your COUNT(*) queries are O(1), if you're doing enough of them, that could certainly slow things down.

Instead of setting up and executing 5 separate queries one at a time, run the SELECT statements in a single batch and process the 5 results at once.

I.e., if you're using ADO.NET, do something like this (error checking omitted for brevity; non-looped/non-dynamic for clarity):

string sql = "SELECT COUNT(*) FROM Table1; SELECT COUNT(*) FROM Table2;";

SqlCommand cmd = new SqlCommand(sql, connection);
SqlDataReader dr = cmd.ExecuteReader();

// Defaults to first result set
dr.Read();
int table1Count = (int)dr[0];

// Move to second result set
dr.NextResult();
dr.Read();
int table2Count = (int)dr[0];

If you're using an ORM of some sort, such as NHibernate, there should be a way to enable automatic query batching.
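The same single-roundtrip idea can be sketched in other stacks too; here with Python and SQLite (table names illustrative), using scalar subqueries so one statement returns all the counts at once:

```python
import sqlite3

# Fetch several counts in a single statement instead of one query each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE Table2 (id INTEGER PRIMARY KEY);
    INSERT INTO Table1 VALUES (1), (2), (3);
    INSERT INTO Table2 VALUES (1), (2);
""")
table1_count, table2_count = conn.execute(
    "SELECT (SELECT COUNT(*) FROM Table1), (SELECT COUNT(*) FROM Table2)"
).fetchone()
print(table1_count, table2_count)  # 3 2
```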

Jon Seigel