views:

317

answers:

6

I'm performing a regular expression match on a column of type "character varying(256)" in postresql 8.3.3. The column currently has no indices. I'd like to improve the performance of this query if I can.

Will adding an index help? Are there other things I can try to help improve performance?

+3  A: 

An index can't do anything with a regular expression. You're going to have to do a full table scan.

If at all possible, like if you're querying for the same regex all the time, you could add a column that specifies whether this row matches that regex and maintain that on inserts and updates.

Paul Tomblin
+1  A: 

Regex matches do not perform well on fairly big text columns. Try to accomplish this without the regex, or do the match in code if the dataset is not large.

Mike Miller
+1  A: 

This may be one of those times when you don't want to use RegEx. What does your reg-ex code look like? Maybe that's a way to speed it up.

Keng
+4  A: 

You cannot create an index that will speed up any generic regular expression; however, if you have one or a limited number of regular expressions that you are matching against, you have a few options.

As Paul Tomblin mentions, you can use an extra column or columns to indicate whether or not a given row matches that regex or regexes. That column can be indexed, and queried efficiently.

If you want to go further than that, this paper discusses an interesting sounding technique for indexing against regular expressions, which involves looking for long substrings in the regex and indexing based on whether those are present in the text to generate candidate matches. That filters down the number of rows that you actually need to check the regex against. You could probably implement this using GiST indexes, though that would be a non-trivial amount of work.

Brian Campbell
A: 

If you have a limited set of regexes to match against you could create a table with the primary key of your table and a field indicating if it matches that regex, which you would update on a trigger and then index your tables key in that table. This trades a small decrease in update and insert speed for a probably large speed increase in select.

Alternatively, you could write a function which compares your field to that regex (or even pass the regex along with the field you are matching to the function), then create a functional index on your table against that function. This also assumes a fixed set of regexes (but you can add new regex matches more easily this way).

If the regex is dynamically created from user input you might have to live with the table scan or change the user app to produce a more simple search like field like 'value%', which would use an index on field ('%value%' wouldn't).

MkV
A: 

If you do manage to reduce your needs to a simple LIKE query, look up indexes with text_pattern_ops to speed those up.