I've read like 10 or so "tutorials", and they all involve the same thing:

  • Pull a count of the data set
  • Pull the relevant data set (LIMIT, OFFSET)

For example:

SELECT COUNT(*) 
  FROM table 
 WHERE something = ?

SELECT * 
  FROM table 
 WHERE something = ? 
 LIMIT ? OFFSET ?

Two very similar queries, no? There has to be a better way to do this. My dataset has 600,000+ rows and is already sluggish (results are determined by over 30 WHERE clauses and vary from user to user, but everything is properly indexed, of course).

+1  A: 

Use the statistics for a count estimate. That will do for pagination and won't add much overhead.

See http://wiki.postgresql.org/wiki/Count_estimate
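
A minimal sketch of the idea, assuming you have already run EXPLAIN (without ANALYZE) on the filtered query and have its output as text lines: the planner's estimate is the rows= figure on the top plan node. The helper name estimated_rows and the sample plan text are hypothetical; the numbers in the sample are made up for illustration.

```python
import re

# Matches the "rows=<n>" estimate printed on a plan node line of EXPLAIN output.
_ROWS_RE = re.compile(r"\brows=(\d+)")


def estimated_rows(explain_lines):
    """Return the planner's row estimate from the first plan node that has one.

    explain_lines: the text lines returned by EXPLAIN <query>; the first
    line describes the top plan node.
    """
    for line in explain_lines:
        match = _ROWS_RE.search(line)
        if match:
            return int(match.group(1))
    raise ValueError("no row estimate found in EXPLAIN output")


# Illustrative plan text in the shape PostgreSQL prints; the figures are invented.
sample_plan = [
    "Seq Scan on items  (cost=0.00..483.00 rows=7001 width=244)",
    "  Filter: (something = 42)",
]
estimate = estimated_rows(sample_plan)
```

Run EXPLAIN on the query without LIMIT/OFFSET; otherwise the top node is the Limit node and its estimate is just the page size.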

Frank Heikens
I've done more independent research; even the MySQL way runs a separate count. Turns out we're screwed no matter what! (http://archives.postgresql.org/pgsql-performance/2006-12/msg00202.php) I'm going to cache the total count and refresh it every 2 hours or so.
Mohamed Ikal Al-Jabir
Or use an extra table to store the COUNT result for a certain query. Triggers will keep everything in sync.
Frank Heikens
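
A rough sketch of the counter-table idea from the comment above. SQLite (via Python's sqlite3) is used here only so the example runs without a server; in PostgreSQL the triggers would call a trigger function instead of containing the statements inline. The table names items and item_counts and the helper cached_count are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, something INTEGER NOT NULL);
-- One row per filter value, holding the current number of matching items.
CREATE TABLE item_counts (something INTEGER PRIMARY KEY, n INTEGER NOT NULL);

CREATE TRIGGER items_ins AFTER INSERT ON items
BEGIN
    INSERT OR IGNORE INTO item_counts (something, n) VALUES (NEW.something, 0);
    UPDATE item_counts SET n = n + 1 WHERE something = NEW.something;
END;

CREATE TRIGGER items_del AFTER DELETE ON items
BEGIN
    UPDATE item_counts SET n = n - 1 WHERE something = OLD.something;
END;

CREATE TRIGGER items_upd AFTER UPDATE OF something ON items
BEGIN
    UPDATE item_counts SET n = n - 1 WHERE something = OLD.something;
    INSERT OR IGNORE INTO item_counts (something, n) VALUES (NEW.something, 0);
    UPDATE item_counts SET n = n + 1 WHERE something = NEW.something;
END;
""")


def cached_count(conn, something):
    """Read the trigger-maintained count instead of running COUNT(*)."""
    row = conn.execute(
        "SELECT n FROM item_counts WHERE something = ?", (something,)
    ).fetchone()
    return row[0] if row else 0


conn.executemany("INSERT INTO items (something) VALUES (?)", [(1,), (1,), (2,)])
conn.execute("DELETE FROM items WHERE id = 1")
conn.execute("UPDATE items SET something = 2 WHERE id = 2")
```

This only pays off when the set of counted queries is fixed and small; with filters that vary per user, as in the question, one counter per possible filter combination is unlikely to be practical.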
+2  A: 

Unfortunately, to get the exact count as of the moment of the query, PostgreSQL has to go through all the rows that match the criteria and check whether each one is visible to your transaction. But you probably don't need the exact count, because the results are stale anyway as soon as you send them to the user. So there are a few things you may try:

  1. Cache the count for subsequent queries so the cost is paid only for the first page (this probably doesn't help much, since most people only look at the first page anyway).
  2. If the queries map well, use a specialized inverted-index search engine for the searches. Lucene/Solr is a good choice.
  3. If having the counts occasionally wildly off isn't a problem, use PostgreSQL's built-in statistics to estimate the number of rows that might match. You can get at the numbers by running EXPLAIN on the query. Increase the statistics target, at least for the relevant tables, to get more accurate numbers. The numbers might still be significantly off with multiple predicates, because the planner doesn't know the correlation between predicates and assumes they are independent. So something like WHERE sex='male' AND has_breasts=true will be assumed to match 25% of the rows, which is probably an order of magnitude off. If you run EXPLAIN with ANALYZE, you can check how many rows the planner expected to go through to get the first page of results and how many it actually went through, and scale the estimate accordingly. This is probably somewhat similar to what Google uses to estimate how many pages match your query. If I remember correctly, Lucene should support similar estimation.
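
Option 1 could be sketched roughly as below: a small in-process cache keyed by the query text and its parameters, with an expiry time. The class name CountCache, the 300-second lifetime, and the stand-in run_count function are all assumptions for illustration; in a real application compute would execute the COUNT(*) query.

```python
import time


class CountCache:
    """Caches count results per key for a limited time."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be tested
        self._entries = {}      # key -> (expires_at, count)

    def get(self, key, compute):
        """Return the cached count for key; call compute() if missing or expired."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        count = compute()
        self._entries[key] = (now + self.ttl, count)
        return count


calls = []


def run_count():
    """Stand-in for executing SELECT COUNT(*) ... against the database."""
    calls.append(1)
    return 600000


cache = CountCache(ttl_seconds=300)
key = ("SELECT COUNT(*) FROM table WHERE something = ?", (42,))
first = cache.get(key, run_count)   # runs the count
second = cache.get(key, run_count)  # served from the cache
```

Because the key includes the parameters, each user's filter combination gets its own entry, so only the first page of each distinct search pays for the count.
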
Ants Aasma
A: 

You may want to consider using a cursor.
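
As a rough illustration only: the sketch below runs the query once and pulls pages from the open result set with fetchmany. It uses Python's sqlite3 cursor as a stand-in; in PostgreSQL the equivalent would be a server-side cursor (DECLARE ... CURSOR FOR ... followed by FETCH). The helper name pages and the items table are hypothetical.

```python
import sqlite3


def pages(conn, sql, params, page_size):
    """Yield successive pages (lists of rows) from a single execution of sql."""
    cur = conn.cursor()
    cur.execute(sql, params)
    try:
        while True:
            rows = cur.fetchmany(page_size)
            if not rows:
                break
            yield rows
    finally:
        cur.close()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, something INTEGER)")
conn.executemany("INSERT INTO items (something) VALUES (?)", [(7,)] * 5)

result = list(
    pages(conn, "SELECT id FROM items WHERE something = ? ORDER BY id", (7,), 2)
)
```

Note that a PostgreSQL cursor declared without WITH HOLD only lives as long as its transaction, so this fits a long-lived session better than stateless web requests.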

Joshua D. Drake
A: 

You could use CREATE TABLE AS and place all the results in a new table. You do have to manage the created tables yourself, though, if TEMP tables are not an option.
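
A minimal sketch of this approach, again using SQLite through Python's sqlite3 so it runs standalone: the expensive filter runs once into a temporary table, and both the total and each page are then read from that much smaller table. The names items, search_results, total and page are hypothetical, and the filter value is written as a literal only to keep the demo short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, something INTEGER)")
conn.executemany(
    "INSERT INTO items (something) VALUES (?)", [(i % 3,) for i in range(30)]
)

# Run the expensive filter once and keep only the matching rows.
conn.execute(
    "CREATE TEMP TABLE search_results AS "
    "SELECT id, something FROM items WHERE something = 1"
)


def total(conn):
    """Count the stored results; cheap because the filter has already been applied."""
    return conn.execute("SELECT COUNT(*) FROM search_results").fetchone()[0]


def page(conn, limit, offset):
    """Return one page of ids from the stored results."""
    rows = conn.execute(
        "SELECT id FROM search_results ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    )
    return [row[0] for row in rows]


count = total(conn)
second_page = page(conn, 4, 4)
```

A TEMP table disappears when the connection closes; a regular table would need an explicit DROP TABLE once the user is done paging.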

Andrew E. Falcon