I've got a modest table of about 10k rows that is often sorted by a column called 'name', so I added an index on that column. Now selects ordered by name are fast:
EXPLAIN ANALYZE SELECT * FROM crm_venue ORDER BY name ASC LIMIT 10;
Limit  (cost=0.00..1.22 rows=10 width=154) (actual time=0.029..0.065 rows=10 loops=1)
  ->  Index Scan using crm_venue_name on crm_venue  (cost=0.00..1317.73 rows=10768 width=154) (actual time=0.026..0.050 rows=10 loops=1)
Total runtime: 0.130 ms
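For reference, crm_venue_name is just a plain index on the name column, something like this (the exact original statement isn't reproduced here; the index name is taken from the plan above):

CREATE INDEX crm_venue_name ON crm_venue (name);  -- assumed form of the existing plain index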
If I increase the LIMIT to 60 (which is roughly what I use in the application), the total runtime doesn't increase much further.
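In other words, the statement I actually run is just the same query with a bigger limit:

EXPLAIN ANALYZE SELECT * FROM crm_venue ORDER BY name ASC LIMIT 60;  -- 60 is roughly what the application requests; output omitted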
Since I'm using a "logical delete" pattern on this table, I only consider entries where delete_date is NULL. So this is a common select I make:
SELECT * FROM crm_venue WHERE delete_date IS NULL ORDER BY name ASC LIMIT 10;
To make that query snappy as well, I created a partial index on the name column, restricted to rows where delete_date is NULL:
CREATE INDEX name_delete_date_null ON crm_venue (name) WHERE delete_date IS NULL;
Now the ordering is also fast with the logical-delete condition in place:
EXPLAIN ANALYZE SELECT * FROM crm_venue WHERE delete_date IS NULL ORDER BY name ASC LIMIT 10;
Limit  (cost=0.00..84.93 rows=10 width=154) (actual time=0.020..0.039 rows=10 loops=1)
  ->  Index Scan using name_delete_date_null on crm_venue  (cost=0.00..458.62 rows=54 width=154) (actual time=0.018..0.033 rows=10 loops=1)
Total runtime: 0.076 ms
Awesome! But this is where I get myself into trouble. The application rarely calls for just the first 10 rows. So, let's select some more rows:
EXPLAIN ANALYZE SELECT * FROM crm_venue WHERE delete_date IS NULL ORDER BY name ASC LIMIT 20;
Limit  (cost=135.81..135.86 rows=20 width=154) (actual time=18.171..18.189 rows=20 loops=1)
  ->  Sort  (cost=135.81..135.94 rows=54 width=154) (actual time=18.168..18.173 rows=20 loops=1)
        Sort Key: name
        Sort Method: top-N heapsort  Memory: 21kB
        ->  Bitmap Heap Scan on crm_venue  (cost=4.67..134.37 rows=54 width=154) (actual time=2.355..8.126 rows=10768 loops=1)
              Recheck Cond: (delete_date IS NULL)
              ->  Bitmap Index Scan on crm_venue_delete_date_null_idx  (cost=0.00..4.66 rows=54 width=0) (actual time=2.270..2.270 rows=10768 loops=1)
                    Index Cond: (delete_date IS NULL)
Total runtime: 18.278 ms
As you can see, the runtime goes from 0.1 ms to 18 ms!
Clearly there's a point at which the ordering can no longer use the index to run the sort. I noticed that as I increase the LIMIT from 20 to higher values, the query always takes around 20-25 ms.
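The statements I'm timing there are simply the same query with a larger limit, for example:

EXPLAIN ANALYZE SELECT * FROM crm_venue WHERE delete_date IS NULL ORDER BY name ASC LIMIT 100;  -- 100 is just an example value; anything above 20 lands around 20-25 ms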
Am I doing something wrong, or is this a limitation of PostgreSQL? What is the best way to set up indexes for this type of query?