I have a SQL statement to select results from a table. I need to know the total number of records found, and then list a sub-set of them (pagination).

Normally, I would make 2 SQL calls:

  1. one for counting the total number of records (using COUNT),
  2. the other for returning the sub-set (using LIMIT).

But this way you are really duplicating the same operation on MySQL: the WHERE clauses are the same in both calls.

Isn't there a way to gain speed by NOT duplicating the SELECT on MySQL?
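
For concreteness, here is a sketch of the two calls I mean; the table and column names are hypothetical:

    -- Call 1: count every matching record
    SELECT COUNT(*)
    FROM articles
    WHERE category_id = 7;

    -- Call 2: fetch one page of matches (rows 21-40)
    SELECT id, title, created_at
    FROM articles
    WHERE category_id = 7
    ORDER BY created_at DESC
    LIMIT 20, 20;  -- skip 20 rows, return the next 20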

+2  A: 

That first query is going to result in the data being pulled into the cache, so presumably the second query should be fast. I wouldn't be too worried about this.

psychotik
MySQL built-in cache?
sopppas
Right - make your WHERE clause as complete as you can and then don't worry about doing that second query. Since it runs immediately after the first, the results will come from the cache, like I said.
psychotik
A: 

You should just run the COUNT a single time and then cache it somewhere. Then you can just run the pagination query as needed.

NickLarsen
no comment for downvote?
NickLarsen
wasn't me. ;) But, your solution is exactly what I do: two SQL queries with the same WHERE clause.
sopppas
I read your question as "I am running 2 queries every time someone wants to view a new page." My suggestion is to run the query that grabs the count a single time and cache the result in a variable, then run the pagination query for each new set of values requested.
NickLarsen
+1  A: 

You have to make both SQL queries, and the COUNT is very fast with no WHERE clause. Cache the data where possible.

David Caunt
This is the only reliable answer when determining the number of potential rows for paging.
Chris Lively
The COUNT query has the same WHERE clause as the LIMIT query.
sopppas
A: 

If you really don't want to run the COUNT() query - and, as others have stated, it's not something that slows things down appreciably - then you have to decide on your chunk size (i.e. the LIMIT number) up front. This will save you the COUNT() query, but you may end up with unfortunate pagination results (like 2 pages where the 2nd page has only 1 result).

So: either a quick COUNT() and a sensible LIMIT set-up, or no COUNT() and an arbitrary LIMIT that may increase the number of more expensive queries you have to run.

dnagirl
A: 

You could try selecting just one field (say, the IDs) and see if that helps, but I don't think it will - I imagine the biggest overhead is MySQL finding the correct rows in the first place.

If you simply want to count the total number of rows in the entire table (i.e. without a WHERE clause) then I believe SELECT COUNT(*) FROM table is fairly efficient.

Otherwise, if you need the total number visible, the only solution is to count all the matching rows. However, you can cache this count in another table. If you are selecting from a category, say, store the category UID together with the total rows matched; then, whenever you add or delete rows, update the total.
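
A minimal sketch of such a cached-count table (all names are made up for illustration); the total is refreshed whenever rows are added or deleted:

    -- Summary table holding a precomputed total per category
    CREATE TABLE category_counts (
        category_id INT NOT NULL PRIMARY KEY,
        total       INT NOT NULL
    );

    -- Recount after inserting into or deleting from `articles`
    REPLACE INTO category_counts (category_id, total)
    SELECT 7, COUNT(*)
    FROM articles
    WHERE category_id = 7;

The pagination page then reads the total with a cheap primary-key lookup instead of re-counting every time.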

Another option - though it may sacrifice usability a little - is to only select the rows needed for the current page and next page. If there are some rows available for the next page, add a "Next" link. Do the same for the previous page. If you have 20 rows per page, you're selecting at most 60 rows on each page load, and you don't need to count all the rows available.
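
A hedged sketch of that idea (hypothetical names; 20 rows per page, viewing page 2):

    -- Fetch the current page plus the next one in a single query
    SELECT id, title
    FROM articles
    WHERE category_id = 7
    ORDER BY created_at DESC
    LIMIT 20, 40;  -- skip page 1, fetch up to 40 rows (pages 2 and 3)

If more than 20 rows come back, display only the first 20 and render a "Next" link; since the OFFSET is non-zero, you also know a "Previous" link is needed.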

DisgruntledGoat
I'd like to show the user how many results there are (the total number), even though I'm only showing a sub-set of the actual results.
sopppas
A: 

If you write your query so that one column contains the count (repeated in every row), alongside the rest of the columns from your second query, you can:

  1. Avoid the second database round-trip (which is probably more expensive than the query itself anyway).
  2. Increase the likelihood that MySQL's parser will generate an optimized execution plan that reuses the base query.
  3. Make the operation atomic.

Unfortunately, it also creates a little repetition, returning more data than you really need. But I would expect it to be much more efficient anyway. This is the sort of strategy used by a lot of ORM products when they eagerly load objects from connected tables with many-to-one or many-to-many relationships.
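
On MySQL, this shape can be sketched with a derived table so the total rides along in every row (names are hypothetical):

    SELECT a.id, a.title, c.total
    FROM articles a
    CROSS JOIN (SELECT COUNT(*) AS total
                FROM articles
                WHERE category_id = 7) c
    WHERE a.category_id = 7
    ORDER BY a.created_at DESC
    LIMIT 0, 20;  -- first page; `total` is repeated in each row

On MySQL 8+, a window function (COUNT(*) OVER ()) can express the same thing without the derived table.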

StriplingWarrior
A: 

As others have already pointed out, it's probably not worth much concern in this case -- as long as the column in your WHERE clause is indexed, both SELECTs will be extremely fast.

If you have (for whatever reason) a situation where that's not enough, you could create a memory-based temporary table (i.e. a temporary table backed by the MEMORY storage engine) and select your records into it. Then you could run your selects against the temporary table and be quite well assured they'll be fast. This can use a lot of memory, though (it forces all that data to stay in memory for the duration), so it's pretty unfriendly unless you're sure that:

  1. The amount of data is really small;
  2. You have so much memory it doesn't matter; or
  3. The machine will be nearly idle otherwise anyway.

The main time this comes in handy is if you have a really complex select that can't avoid scanning all of a large table (or more than one) but yields only a tiny amount of data.
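
A minimal sketch of the idea, with a simple stand-in for the expensive SELECT (all names hypothetical):

    -- Materialize the result set once into a MEMORY-backed temp table
    CREATE TEMPORARY TABLE tmp_results ENGINE=MEMORY
        SELECT id, title, created_at
        FROM articles
        WHERE category_id = 7;  -- stand-in for the expensive SELECT

    -- Both of these are now cheap, repeatable reads
    SELECT COUNT(*) FROM tmp_results;

    SELECT id, title
    FROM tmp_results
    ORDER BY created_at DESC
    LIMIT 0, 20;

    DROP TEMPORARY TABLE tmp_results;

Note that the MEMORY engine does not support BLOB or TEXT columns, so the selected columns have to use other types.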

Jerry Coffin