I have a requirement to find rows in a table containing 200,000 entries. Some may not consider this 'large', but it is large enough to warrant performance considerations.

The table contains strings consisting of digits only. For instance, the user can enter something like '12340-0560-78', or a portion of it, e.g. '0560', and I need to match stored values like

12345678, 123405678, 0123456780, etc.

These are NDCs, national drug codes, and in spite of the standard, manufacturers format them in a variety of ways by adding or omitting zeros in various positions in their barcodes.

I started out by letting LINQ do the work: I removed zeros and non-numeric characters from the search string, and used Contains() on the column after removing all zeros from it as well. This is too slow.
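For context, Contains() on a string column translates to a LIKE with wildcards on both sides, and wrapping the column in a function defeats any index as well. Roughly (table and parameter names here are illustrative, not from the question):

```sql
-- Roughly what the LINQ query translates to (names are hypothetical).
-- The replace() on the column and the leading '%' each force a scan.
SELECT [code]
FROM [dbo].[ndc]
WHERE replace([code], '0', '') LIKE '%' + @search + '%';
```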

So I added a computed column to the table, containing the search column minus all zeros. This is better, but I am still getting a table scan because of the Contains().
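A sketch of that computed-column step, again with hypothetical table and column names; note how differently formatted NDCs collapse to the same value once zeros (and any separators) are stripped:

```sql
-- Hypothetical table; [code_nozeros] holds the code with zeros removed.
ALTER TABLE [dbo].[ndc]
ADD [code_nozeros] AS (replace(replace([code], '-', ''), '0', ''));

-- '12340-0560-78', '123405678' and '0123456780' all reduce to '12345678',
-- so a single predicate against [code_nozeros] covers every format.
```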

I then created a full-text index, but realized that with a full-text index I cannot search for substrings, only for words, phrases, and prefixes. That seems odd, but either way it does not solve this problem.

Are there other options?

+1  A: 

How about just creating a simple clustered index on the computed column? Is the performance OK then?

e.g.

CREATE TABLE [dbo].[foo](
    [code] [varchar](20) NULL,
    [ccol] AS (replace(replace([code],'-',''),' ',''))
) ON [PRIMARY]

and

CREATE CLUSTERED INDEX [IX_foo] ON [dbo].[foo] 
(
    [ccol] ASC
)
...
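One caveat: the index can only be seeked when the predicate is sargable. A Contains()-style leading wildcard still forces a scan, so the win comes from prefix searches (reusing the foo/ccol names above):

```sql
-- Can seek IX_foo: no leading wildcard.
SELECT [code] FROM [dbo].[foo] WHERE [ccol] LIKE '123456%';

-- Still scans: no B-tree index can seek an infix pattern.
SELECT [code] FROM [dbo].[foo] WHERE [ccol] LIKE '%3456%';
```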
Sebastian Seifert
I forgot to mention: I did create an index on the computed column, otherwise it would not really provide any advantage. It works OK for now, and unless a better answer comes along, I'll accept yours.
cdonner