The SQL index lets me quickly find a string which matches my query. Now I have to search a big table for the strings which do not match. Of course, the normal index does not help, and I have to do a slow sequential scan:

essais=> \d phone_idx
Index "public.phone_idx"
 Column | Type 
--------+------
 phone  | text
btree, for table "public.phonespersons"

essais=> EXPLAIN SELECT person FROM PhonesPersons WHERE phone = '+33 1234567';
                                  QUERY PLAN                                   
-------------------------------------------------------------------------------
 Index Scan using phone_idx on phonespersons  (cost=0.00..8.41 rows=1 width=4)
   Index Cond: (phone = '+33 1234567'::text)
(2 rows)

essais=> EXPLAIN SELECT person FROM PhonesPersons WHERE phone != '+33 1234567';
                              QUERY PLAN                              
----------------------------------------------------------------------
 Seq Scan on phonespersons  (cost=0.00..18621.00 rows=999999 width=4)
   Filter: (phone <> '+33 1234567'::text)
(2 rows)

I understand (see Mark Byers' very good explanations) that PostgreSQL can decide not to use an index when it sees that a sequential scan would be faster (for instance if almost all the tuples match). But, here, the "not equal" searches really are slow.

Any way to make these "is not equal to" searches faster?

Here is another example, to address Mark Byers' excellent remarks. The index is used for the '=' query (which returns the vast majority of tuples) but not for the '!=' query:

essais=> \d tld_idx
 Index "public.tld_idx"
     Column      | Type 
-----------------+------
 pg_expression_1 | text
btree, for table "public.emailspersons"

essais=> EXPLAIN ANALYZE SELECT person FROM EmailsPersons WHERE tld(email) = 'fr';
                             QUERY PLAN                                                             
------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using tld_idx on emailspersons  (cost=0.25..4010.79 rows=97033 width=4) (actual time=0.137..261.123 rows=97110 loops=1)
   Index Cond: (tld(email) = 'fr'::text)
 Total runtime: 444.800 ms
(3 rows)

essais=> EXPLAIN ANALYZE SELECT person FROM EmailsPersons WHERE tld(email) != 'fr';
                         QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Seq Scan on emailspersons  (cost=0.00..27129.00 rows=2967 width=4) (actual time=1.004..1031.224 rows=2890 loops=1)
   Filter: (tld(email) <> 'fr'::text)
 Total runtime: 1037.278 ms
(3 rows)

The DBMS is PostgreSQL 8.3 (but I can upgrade to 8.4).

+4  A: 

The database is able to use the index for this query, but it chooses not to because it would be slower. Update: this is not quite right; you have to rewrite the query slightly. See Araqnid's answer.

Your WHERE clause selects almost all rows in your table (rows = 999999). The database can see that a table scan would be faster in this case and therefore ignores the index. Using the index would be slower because the column person is not in it, so the database would have to make two lookups for each row: once in the index to check the WHERE clause, and then again in the main table to fetch the person column.

If you had a different data distribution, where most values were foo and just a few were bar, and you said WHERE col <> 'foo', then it probably would use the index.

Any way to make these "is not equal to" searches faster?

Any query that selects almost 1 million rows is going to be slow. Try adding a limit clause.
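For illustration, a minimal sketch using the table from the question (the LIMIT value is arbitrary):

SELECT person FROM PhonesPersons
 WHERE phone <> '+33 1234567'
 LIMIT 100;  -- stop after the first 100 matches instead of returning ~1M rows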

Mark Byers
OK, I keep forgetting that the DBMS is cleverer than me and sometimes deliberately decides NOT to use indexes. It even depends on the actual values in the query. However, I have not yet been able to get a NOT query that uses the index, even with specially populated databases. Even when only 80 rows are selected, PostgreSQL uses a Seq Scan.
bortzmeyer
In the end, I used araqnid's approach (rewriting the "<>" as "< OR >") and accepted his answer. Thanks.
bortzmeyer
@bortzmeyer: OK, thanks for letting me know.
Mark Byers
+2  A: 

Possibly it would help to write:

SELECT person FROM PhonesPersons WHERE phone < '+33 1234567'
UNION ALL
SELECT person FROM PhonesPersons WHERE phone > '+33 1234567'

or simply

SELECT person FROM PhonesPersons WHERE phone > '+33 1234567'
                                       OR phone < '+33 1234567'

PostgreSQL should be able to determine that the selectivity of the range operation is very high and consider using an index for it.

I don't think it can use an index directly to satisfy a not-equals predicate, although it would be nice if it could try re-writing the not-equals as above (if it helps) during planning. If it works, suggest it to the developers ;)

Rationale: searching an index for all values not equal to a certain one requires scanning the full index. By contrast, searching for all elements less than a certain key means finding the greatest non-matching item in the tree and scanning backwards; similarly, searching for all elements greater than a certain key scans in the opposite direction. These operations are easy to fulfil with b-tree structures.

Also, the statistics that PostgreSQL collects should point out that "+33 1234567" is a known frequent value: by subtracting the frequency of that value and of nulls from 1, we get the proportion of rows left to select, and the histogram bounds will indicate whether those are skewed to one side or not. If the exclusion of nulls and that frequent value pushes the proportion of rows remaining low enough (ISTR about 20%), an index scan should be appropriate. Check the statistics for the column in pg_stats to see what proportion it has actually calculated.
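For example, a query along these lines (hypothetical, but using only standard pg_stats columns) shows what the planner knows about the value distribution:

-- null_frac, most_common_vals and most_common_freqs are gathered by ANALYZE;
-- together they let the planner estimate how many rows a <> predicate keeps.
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats
 WHERE tablename = 'phonespersons' AND attname = 'phone';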

Update: I tried this on a local table with a vaguely similar distribution, and both forms of the above produced something other than a plain seq scan. The latter (using "OR") was a bitmap scan that may actually devolve to just being a seq scan if the bias towards your common value is particularly extreme... although the planner can see that, I don't think it will automatically rewrite to an "Append(Index Scan,Index Scan)" internally. Turning "enable_bitmapscan" off just made it revert to a seq scan.
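For reference, that sort of experiment can be done per session with the standard planner settings, e.g. (a sketch, reusing the rewritten query from above):

SET enable_bitmapscan = off;   -- discourage bitmap scans in this session only
EXPLAIN SELECT person FROM PhonesPersons
 WHERE phone > '+33 1234567' OR phone < '+33 1234567';
RESET enable_bitmapscan;       -- restore the default afterwards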

PS: indexing a text column and using the inequality operators can be an issue if your database locale is not C. You may need to add an extra index that uses text_pattern_ops or varchar_pattern_ops; this is similar to the problem of indexing for column LIKE 'prefix%' predicates.
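A sketch of what such an index could look like (the index name is made up):

-- The text_pattern_ops operator class compares strings character by character,
-- which lets the planner use this index for LIKE 'prefix%' style predicates
-- even when the database locale is not C.
CREATE INDEX phone_pattern_idx ON PhonesPersons (phone text_pattern_ops);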

Alternative: you could create a partial index:

CREATE INDEX PhonesPersonsOthers ON PhonesPersons(phone) WHERE phone <> '+33 1234567'

This will make the SELECT statement that uses <> scan just that partial index; since it excludes most of the entries in the table, it should be small.
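For what it's worth, the planner only considers a partial index when it can prove that the query's WHERE clause implies the index predicate; with the predicate written exactly as in the index, you can verify it like this:

-- The filter matches the partial index predicate, so the planner is free to
-- use the (hopefully small) partial index instead of a full sequential scan.
EXPLAIN SELECT person FROM PhonesPersons WHERE phone <> '+33 1234567';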

araqnid
I just tested your idea of rewriting "<>" as "< OR >" and it works. EXPLAIN shows that the index is used and performance improves a lot. I'll run more tests and then I'll accept your answer. Question: why can't PostgreSQL do this rewriting itself?
bortzmeyer
@bortzmeyer Possibly because the operator system is so general: it would need some way of relating the "="/"<>" pair of operators to "<" and ">". It may be worth suggesting to the postgresql list as a feature.
araqnid
OK, it works fine, thanks. A small warning: not all PostgreSQL index types have ordering: http://www.postgresql.org/docs/current/interactive/indexes-types.html
bortzmeyer
@bortzmeyer Just realised you can use partial indices for this too; see the update.
araqnid