I have a situation where I have an incoming data value that may or may not have leading zeroes. I need to match that to a field/row in a SQL Server table. The field value in the SQL Server database may or may not have leading zeroes as well.

So, I might have:

  • incoming = 5042800138
    and the value in db can be any of 5042800138, 05042800138, 005042800138, 0005042800138

  • or the incoming might be 005042800138
    and the value in db can be any of 5042800138, 05042800138, 005042800138, 0005042800138

The solution I came up with was to strip off the leading zeroes (always) on the incoming data and use SQL like the following example:

-- this simulates the incoming value to check
-- i strip out the leading zeroes.
declare @tryUPC as varchar(40)
set @tryUPC = '5042800138'

-- try to find it in the database and ignore leading zeroes
select prod_uid, prod_partno, prod_upc
from products as p
where (prod_upc = @tryUPC) or 
   (
   len(prod_upc) > len(@tryUPC)
   and right(prod_upc, len(@tryUPC)) = @tryUPC
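   -- the extra prefix must be all zeroes: overwrite it with the same number of '0's and compare
   and stuff(prod_upc, 1, len(prod_upc) - len(@tryUPC),
             replicate('0', len(prod_upc) - len(@tryUPC))) = prod_upc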
   )

This seems to work. My question is, am I missing something? Does SQL Server have a better way of dealing with this? I am using SQL Server 2005.

tia,

don

A: 

1) Update all existing data to remove the leading zeros, and possibly use a BIGINT datatype
2) Always strip the leading zeros from the input before saving and searching
3) Never worry about leading zeros again, and you can actually use an index!
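
Step 2 can be done inline in T-SQL. The following is a minimal sketch, assuming the value contains only digits; the variable name @incoming is made up for illustration. PATINDEX locates the first character that is not a zero, and SUBSTRING keeps everything from that position on:

```sql
-- Hypothetical incoming value with leading zeros
DECLARE @incoming varchar(40)
SET @incoming = '005042800138'

-- The appended '.' guarantees PATINDEX always finds a non-zero character,
-- so an all-zero input yields an empty string rather than a partial value.
SELECT SUBSTRING(@incoming,
                 PATINDEX('%[^0]%', @incoming + '.'),
                 LEN(@incoming)) AS stripped
```

For the value above this returns 5042800138. The same expression could be applied to the column in a one-off UPDATE when cleaning the existing data.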

EDIT after OP's Comment:

wouldn't it be nice, but it's not reality. i suppose i should have mentioned this is a legacy app. the upc codes can be input in a bunch of different places. changing the data type would require massive refactoring. additionally, the zeroes ARE sometimes needed - there is a good reason for the database to be the way it is. – Don Dickinson

You could use a persisted computed column where you REVERSE() the column and then index it. You can then query on:

WHERE ReversedYourString LIKE REVERSE('1234567')+'%' --can use the persisted computed column's index

To add a persisted computed column (that reverses the string) and an index on it, use this code:

ALTER TABLE YourTable ADD ReversedYourString AS REVERSE(YourString) PERSISTED

CREATE NONCLUSTERED INDEX IX_YourTable_ReversedYourString 
ON YourTable (ReversedYourString) 
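
Note that the LIKE on the reversed column matches every value that ends with the search digits, so a stored value such as 915042800138 would also match 5042800138. Below is a sketch of a complete query that also rejects any non-zero character in the remaining prefix. It assumes the column and index created above already exist, and that the incoming value holds digits only (no LIKE wildcard characters):

```sql
DECLARE @tryUPC varchar(40)
SET @tryUPC = '5042800138'  -- incoming value, leading zeroes already stripped

SELECT YourString
FROM YourTable
WHERE ReversedYourString LIKE REVERSE(@tryUPC) + '%'            -- stored value ends with the incoming digits
  AND ReversedYourString NOT LIKE REVERSE(@tryUPC) + '%[^0]%'   -- whatever precedes them is zeroes only
```

The first predicate has a fixed prefix, so it is the one that can typically use the index; the second only filters the rows that the first one finds.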
KM
+1 - beat me to it. Everything I was going to suggest was also to do with getting the data into the ideal/optimal format for querying
AdaTheDev
@AdaTheDev forgot one: ...getting the data into the ideal / optimal / **consistent** format...
KM
@KM - ah yes! Consistency is the stuff of champions :)
AdaTheDev
wouldn't it be nice, but it's not reality. i suppose i should have mentioned this is a legacy app. the upc codes can be input in a bunch of different places. changing the data type would require massive refactoring. additionally, the zeroes ARE sometimes needed - there is a good reason for the database to be the way it is.
Don Dickinson
@Don Dickinson, thanks for putting that in the question!
KM
I like the (opposite) of your first notion: DO pad all the original data with the right number of leading zeros and store that data as *text*. Never introduce variable length for a code (this is a code, not a number) and the problem ALSO goes away, indexes work, etc. Same is true for zip codes, ssn's, serial numbers, and so on.
onupdatecascade
+1  A: 

If you aren't able to change the existing data to strip the leading zeroes or convert it to an integer type, it might be faster to just do something like this:

WHERE prod_upc IN (@tryUPC, '0' + @tryUPC, '00' + @tryUPC, '000' + @tryUPC [...])

That's about as elegant as my foot, but it would be more static & legible, and (probably) more likely to get at any relevant index.

That's assuming there's a finite limit on how many leading zeroes you have, mind. Converting the data to an integer type (BIGINT rather than INT, since a value like 5042800138 overflows INT), or adding a new integer column and calculating it on insert, would probably be the best fix for the problem.
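
For instance, if the stored values never carry more than three leading zeroes (an assumption taken from the examples in the question, not a known limit), the list can be spelled out in full:

```sql
DECLARE @tryUPC varchar(40)
SET @tryUPC = '5042800138'  -- incoming value, leading zeroes already stripped

SELECT prod_uid, prod_partno, prod_upc
FROM products
WHERE prod_upc IN (@tryUPC,
                   '0' + @tryUPC,
                   '00' + @tryUPC,
                   '000' + @tryUPC)
```

Each entry is a plain equality comparison, so an index on prod_upc (if one exists) can be used for every candidate.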

tadamson
i suppose that might be better. i'll have to look at the number of leading zeroes that are allowed. going out of my way to make it handle so many zeroes probably isn't necessary. thanks
Don Dickinson
As long as the data isn't segmented I'd definitely recommend this (segmented meaning <paddednumber>.<paddednumber> or <paddednumber><paddednumber>). Alternatively you can pad all to a specific number of zeros (123->00123, 0123->00123, 00123->00123, etc)... The key, as mentioned, is consistency between the two you are comparing.
KSimons
+2  A: 

Just another slant (correcting the data would be best, but the accepted answer is a decent workaround too): add a persisted, indexed computed column "actualUPC" that is a character type, computed with the correct number of leading zeros. Example:

If the "real" code is supposed to be 12 digits, make a computed column like

 right( '000000000000' + originalColumn, 12 )

That way the input data is actually corrected, then indexed properly and can be searched with the index.

When you query, also pad out the input to match, as a constant in the query.

Check the restrictions on indexed computed columns, though, before going too crazy.
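
Put together, this could look roughly like the sketch below. It assumes the products table and prod_upc column from the question, that prod_upc is a varchar without trailing spaces, and that no real code is longer than 12 digits; the index name is made up. GO is the batch separator used by tools such as Management Studio and sqlcmd, needed here because the new column must exist before a later statement can reference it:

```sql
-- Persisted computed column holding the code left-padded with zeroes to 12 characters
ALTER TABLE products
    ADD actualUPC AS RIGHT('000000000000' + prod_upc, 12) PERSISTED
GO

CREATE NONCLUSTERED INDEX IX_products_actualUPC
ON products (actualUPC)
GO

DECLARE @tryUPC varchar(40)
SET @tryUPC = '5042800138'  -- incoming value, with or without leading zeroes

-- Pad the input the same way so both sides of the comparison have the same shape
SELECT prod_uid, prod_partno, prod_upc
FROM products
WHERE actualUPC = RIGHT('000000000000' + @tryUPC, 12)
```

Because RIGHT keeps only the last 12 characters, a longer stored value would be silently truncated, so the pad width should be at least the longest code actually in use.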

BTW codes like this (postal codes, serial numbers, ssn's, etc.) should ALWAYS be stored as text data, with the leading zeros, and NEVER as an integer or numeric type. Take it from a guy who grew up in zip code 01033.

onupdatecascade