views:

340

answers:

6

A few years ago I worked on a system where a numeric primary key was stored in a [SQL Server] varchar column, so I quickly came unstuck when querying with a BETWEEN operator:

SELECT ID FROM MyTable WHERE ID BETWEEN 100 AND 110;

Results:

100
102
103
109
110
11

This was simply bad design. However, I'm working on an 3rd-party ERP system, which as you can imagine needs to be generic and flexible; thus we have various tables where alphanumeric fields are provided where the business only uses numerics - so similar problems can occur.

I'm guessing that this is a common enough issue; I have a simple enough solution, but I'm curious as to how others approach such problems.

My simple solution is:

SELECT ID FROM MyTable 
WHERE ID BETWEEN iStartValue AND iEndValue 
AND (LENGTH(ID) = LENGTH(iStartValue)
 OR LENGTH(ID) = LENGTH(iEndValue));

As you can possibly tell, this is an Oracle system, but I'm usually working in SQL Server - so perhaps database-agnostic solutions are preferable.

Edit 1: Scratch that - I don't see why proprietary solutions aren't welcomed as well.

Edit 2: Thanks for all the responses. I'm not sure whether I'm disappointed there is not an obvious, sophisticated solution, but I'm correspondingly glad that it doesn't appear that I've missed anything obvious!

I think I still prefer my own solution; it's simple and it works - is there any reason why I shouldn't use it? I can't believe it is much, if any, less efficient that the other solutions offered.

I realise that in an ideal world, this problem wouldn't exist; but unfortunately, I don't work in an ideal world, and often it's a case of making the best of a bad situation.

+6  A: 

If you're sure that the values in ID are numeric only, why not just CAST them

WHERE CAST(ID as int) BETWEEN iStartValue AND iEndValue

EDIT 1: An extension to the casting method which should work is to use a sub-query to pull out all numeric records. Please note - I do not think this method is better than the one suggested above, I include it as it answers the problem!!!

SELECT ID 
FROM    (
    SELECT  ID
    FROM MyTable 
    WHERE ISNUMERIC(ID) = 1
    AND CHARINDEX ('.', ID) = 0
    AND CHARINDEX ('-', ID) = 0
    ) a
WHERE   CONVERT(bigint, ID)  BETWEEN 0 AND 12000
ORDER BY LENGTH(ID) ASC, ID

The check for "-" and "." characters isn't really required. I'm assuming your IDs can't be negative or decimal.

RB
In the first example, there was historic data that prevented this - the DB was used by more than one appication, and we didn't have the power to change it. In the second, many of the tables are generic/multi-use, so in some instance alphas were allowed/used in other parts of the application.
CJM
Casting in this way will prevent the use of indexes. Welcome to TableScan.
David B
[In case that wasn't clear - CASTing when there is alphanumeric data in the tables caused an error (SQL Error: ORA-01722: invalid number)]
CJM
I've updated it to show a method which should work. I'm not recommending it however!
RB
Re: David B's point -- you could use a function-based index in this case
Dave Costa
Re: EDIT 1 - thanks for the extra effort there.
CJM
+1  A: 

How about a cast instead.

SELECT ID FROM MyTable 
WHERE cast(ID as signed) BETWEEN cast(iStartValue as signed) AND cast(iEndValue as signed)

This syntax given is MySQL but there are similar CAST operators for T-SQl.

Vincent Ramdhanie
Thanks Vincent. Yes, I considered this, but it wasn't appropriate (see comments to RB)
CJM
+2  A: 

I don't know if this might work in your situation, but ...

How about adding an actual numeric column to the table, populated with the value, (SQL Server you could use a calculated column with a persisted index established on it)

In other vendors DBs use some other mechanism to populate (trigger, materialized view, etc)

and then use that column instead of the varchar one...

Charles Bretana
That's what I was going to suggest.
P Daddy
It's a good idea, but in both cases I can't alter the schema. In the second example, I'm building bolt-on functionality to an ERP system - so I have to owrk within the boundaries set by the ERP developers unfortunately.
CJM
Can you add a view? You could put your calculated column (and a clustered index) there.
P Daddy
+1  A: 

Perhaps LPAD(id, 12,' ') would work for you. It should make all column values 12 wide, with spaces padded to the left.

Also I would be a bit concerned about numerics in varchar2 columns.

if you do any thing numerical, like analytics, you might get an exception on non numeric data.

EvilTeach
+1  A: 

Another option would be to left pad your numbers with zero's and use the between operator on that. For sargability reasons, it's probably better to include this as a second where condition (so that possible indexes can still be used). Something like this...

SELECT ID FROM MyTable 
WHERE  ID BETWEEN iStartValue AND iEndValue 
       And Right('0000000000' + ID, 10) Between iStartValue and iEndValue

I tested this in SQL Server and it returns the right values. You may need to modify this to work with Oracle.

G Mastros
Nice idea. [Credit also to EvilTeach who suggested similar].
CJM
A: 

In the end I've stuck with my own solution (see OP). Thanks for your efforts anyway.

CJM