ansaurus

Question

Matching a field with variable number of leading zeroes in SQL Server table

Answer 1

A:

1) update all existing data to not have any leading zeros, possibly use a BIGINT datatype
2) always strip the leading zeros from the input before saving and searching
3) never worry about leading zeros again, and you can actually use an index!
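As a rough illustration of step 2, here is a minimal Python sketch of stripping leading zeros on the application side before a value is saved or searched. The function name `strip_leading_zeros` is made up for this example, and it assumes the code arrives as a string; it is not taken from the original app.

```python
def strip_leading_zeros(code: str) -> str:
    """Return the code without leading zeros.

    Surrounding whitespace is removed first. An empty input stays empty,
    and a code made only of zeros collapses to a single '0'.
    """
    code = code.strip()
    if not code:
        return code
    return code.lstrip("0") or "0"


if __name__ == "__main__":
    for raw in ["0001234567", "1234567", "000", " 0120 "]:
        print(repr(raw), "->", repr(strip_leading_zeros(raw)))
```

If the same helper is applied on every write and every lookup, both sides of the comparison end up in one format, which is the point of steps 1 to 3.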

EDIT after OP's Comment:

wouldn't it be nice, but it's not reality. i suppose i should have mentioned this is a legacy app. the upc codes can be input in a bunch of different places. changing the data type would require massive refactoring. additionally, the zeroes ARE sometimes needed - there is a good reason for the database to be the way it is. – Don Dickinson

You could use a persisted computed column where you REVERSE() the column and then index it. You can then query on:

WHERE ReversedYourString LIKE REVERSE('1234567')+'%' --can use the persisted computed column's index

To add a persisted computed column (that reverses the string) and an index on it, use this code:

ALTER TABLE YourTable ADD ReversedYourString AS REVERSE(YourString) PERSISTED

CREATE NONCLUSTERED INDEX IX_YourTable_ReversedYourString 
ON YourTable (ReversedYourString)
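One thing worth checking with this approach: the reversed LIKE is really an "ends with" match, so it would also return a value like 91234567, not only zero-padded ones. The Python sketch below models that logic only (it is not T-SQL, it ignores LIKE wildcards and collation, and both function names are hypothetical). It shows the suffix match plus an extra check that whatever precedes the suffix is all zeros. It also assumes the search value has already had its own leading zeros stripped.

```python
def reversed_prefix_match(stored: str, search: str) -> bool:
    """Model of: REVERSE(stored) LIKE REVERSE(search) + '%'.

    A prefix match on the reversed strings is the same as a suffix
    match on the original strings.
    """
    return stored[::-1].startswith(search[::-1])


def matches_ignoring_leading_zeros(stored: str, search: str) -> bool:
    """Suffix match, plus a check that the leftover prefix is only zeros."""
    if not reversed_prefix_match(stored, search):
        return False
    prefix = stored[: len(stored) - len(search)]
    return set(prefix) <= {"0"}


if __name__ == "__main__":
    for stored in ["0001234567", "1234567", "91234567"]:
        print(stored,
              reversed_prefix_match(stored, "1234567"),
              matches_ignoring_leading_zeros(stored, "1234567"))
```

In the query itself the indexed reversed column would narrow the candidate rows, and a second predicate of this kind would discard the false positives.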

KM 2010-02-05 18:34:06

+1 - beat me to it. Everything I was going to suggest was also to do with getting the data into the ideal/optimal format for querying

AdaTheDev 2010-02-05 18:37:10

@AdaTheDev forgot one: ...getting the data into the ideal / optimal / **consistent** format...

KM 2010-02-05 18:39:38

@KM - ah yes! Consistency is the stuff of champions :)

AdaTheDev 2010-02-05 18:48:52

wouldn't it be nice, but it's not reality. i suppose i should have mentioned this is a legacy app. the upc codes can be input in a bunch of different places. changing the data type would require massive refactoring. additionally, the zeroes ARE sometimes needed - there is a good reason for the database to be the way it is.

Don Dickinson 2010-02-05 18:55:13

@Don Dickinson, thanks for putting that in the question!

KM 2010-02-05 19:06:44

I like the opposite of your first notion: DO pad all the original data with the right number of leading zeros and store that data as *text*. Never introduce variable length for a code (this is a code, not a number) and the problem ALSO goes away, indexes work, etc. The same is true for zip codes, SSNs, serial numbers, and so on.

onupdatecascade 2010-02-05 20:07:49

Answer 2

+1 A:

If you aren't able to change existing data to strip the leading zeroes / convert to INT, it might be faster to just do something like so:

WHERE prod_upc IN (@tryUPC, '0' + @tryUPC, '00' + @tryUPC, '000' + @tryUPC [...])

That's about as elegant as my foot, but it would be more static & legible, and (probably) more likely to get at any relevant index.

That's assuming there's a finite limit on how many leading zeroes you have, mind. Converting the data to INT (or adding a new INT column and calculating it on insert) would probably be the best fix for the problem.
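For what it's worth, the list of candidates for the IN clause could be generated rather than typed out. A minimal Python sketch, assuming the column has a known maximum length (`max_len` here is an assumption, as is the function name):

```python
def zero_padded_candidates(code: str, max_len: int) -> list[str]:
    """Return the code with 0, 1, 2, ... leading zeros, up to max_len characters.

    The unpadded code is always included, even if it is already
    max_len characters or longer.
    """
    extra = max(0, max_len - len(code))
    return ["0" * n + code for n in range(extra + 1)]


if __name__ == "__main__":
    print(zero_padded_candidates("123", 5))
```

Each element would then be passed as its own parameter to the IN list, so the comparison stays a plain equality on prod_upc.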

tadamson 2010-02-05 19:04:35

i suppose that might be better. i'll have to look at the number of leading zeroes that are allowed. going out of my way to make it handle so many zeroes probably isn't necessary. thanks

Don Dickinson 2010-02-05 19:10:45

As long as the data isn't segmented I'd definitely recommend this (segmented meaning <paddednumber>.<paddednumber> or <paddednumber><paddednumber>). Alternatively you can pad all to a specific number of zeros (123->00123, 0123->00123, 00123->00123, etc)... The key, as mentioned, is consistency between the two you are comparing.

KSimons 2010-02-05 19:17:28

Answer 3

+2 A:

Just another slant (correcting the data would be best, but the accepted answer is a decent workaround too): add a persisted, indexed computed column "actualUPC" that is a character type, computed with the correct number of leading zeros. Example:

If the "real" code is supposed to be 12 digits, make a computed column like

 RIGHT('000000000000' + originalColumn, 12)

That way the input data is actually corrected, then indexed properly and can be searched with the index.

When you query, also pad out the input to match, as a constant in the query.

Check the restrictions on indexed computed columns, though, before going too crazy.
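To pad the search input the same way before it goes into the query, something along these lines might do on the application side. This is a Python sketch, not T-SQL; `pad_code` is a hypothetical name and the width of 12 just follows the example above. It mirrors the RIGHT(...) expression, including the fact that input longer than the width is cut down to its rightmost characters.

```python
def pad_code(code: str, width: int = 12) -> str:
    """Left-pad with zeros to a fixed width, keeping the rightmost characters.

    Mirrors RIGHT('000000000000' + code, 12) when width is 12.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    return ("0" * width + code)[-width:]


if __name__ == "__main__":
    for raw in ["1234567", "0001234567", "000001234567"]:
        print(raw, "->", pad_code(raw))
```

All three inputs in the usage example come out as the same 12-character string, so an equality test against the computed column can use its index.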

BTW codes like this (postal codes, serial numbers, SSNs, etc.) should ALWAYS be stored as text data, with the leading zeros, and NEVER as an integer or numeric type. Take it from a guy who grew up in zip code 01033.

onupdatecascade 2010-02-05 20:04:35
