
Hi!

I have two tables:

table1 (about 200000 records)
 number varchar(8) 

table2 (about 2000000 records)
 number varchar(8)

The 'number' columns in both tables have standard indexes. For each record in table1 there are about 10 matching records in table2.

I execute query:

explain select table1.number from table1, table2 where table1.number = table2.number;

The query plan shows that the indexes won't be used, Seq Scans all over ;)

But if I reduce the number of records in table1 to ~2000, the query plan starts showing that the index will be used.

Can somebody tell me why PostgreSQL behaves this way?

A: 

It could depend on the way your indexes were created. If "number" is actually a number, you should think about changing the column type to bigint. Again, not 100% sure, but I think indexing on character columns works differently than on numeric columns... I could, however, be talking out of my butt.

gonzofish
Due to past design decisions it isn't always a number
Adrian Serafin
+1  A: 

Yes, the PostgreSQL docs can tell you!

Here are some highlights:

When indexes are not used, it can be useful for testing to force their use. There are run-time parameters that can turn off various plan types (see Section 18.6.1). For instance, turning off sequential scans (enable_seqscan) and nested-loop joins (enable_nestloop), which are the most basic plans, will force the system to use a different plan. If the system still chooses a sequential scan or nested-loop join then there is probably a more fundamental reason why the index is not being used; for example, the query condition does not match the index. (What kind of query can use what kind of index is explained in the previous sections.)

If forcing index usage does use the index, then there are two possibilities: Either the system is right and using the index is indeed not appropriate, or the cost estimates of the query plans are not reflecting reality. So you should time your query with and without indexes. The EXPLAIN ANALYZE command can be useful here.
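To make the advice above concrete, a session-local experiment might look like the following (a sketch only, using the table and column names from the question):

```sql
-- Discourage sequential scans for this session only.
-- Note: this doesn't disable them outright; it just makes the
-- planner consider them very expensive.
SET enable_seqscan = off;

-- Compare actual run times, not just the planner's estimates.
EXPLAIN ANALYZE
SELECT table1.number
FROM table1, table2
WHERE table1.number = table2.number;

-- Restore the default before running anything else.
SET enable_seqscan = on;
```

If the forced index plan is slower than the original sequential-scan plan, the planner was right all along.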

Jonathan Feinberg
+3  A: 

Sequential scans are normal (and optimal) for queries with very low selectivity - that is, for queries that traverse whole tables.

When you deleted most rows from table1, it no longer covered all the distinct values in table2 - that's why the index scan became worthwhile.

For starters, I'd recommend trying this query:

select * from pg_stats where tablename in ('table1','table2');

That's the information that PostgreSQL uses to build a query plan.
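For instance, a more targeted look at the statistics (a sketch; the column names come from the standard pg_stats view, the table names from the question):

```sql
-- n_distinct and correlation feed directly into the planner's
-- selectivity and cost estimates for the join.
SELECT tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE tablename IN ('table1', 'table2')
  AND attname = 'number';

-- If the statistics look stale or missing, refresh them:
ANALYZE table1;
ANALYZE table2;
```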

The planner itself is quite complicated - consult the docs (mentioned by Jonathan) and the sources [http://doxygen.postgresql.org/ -> src/backend/optimizer] if you are curious.

filiprem
+1 It's all about the cardinality of the values.
Trey