views:

265

answers:

6

I am attempting to use a LIKE clause in a SQL statement to match a certain pattern within Oracle.

I would like to do something as such:

LIKE '[A-Z][A-Z][1-4]%'

..but I can't use a regex because this is on Oracle9i (regex support came in 10g).

I am attempting to match something that has two characters before it, then a number between 1 and 4 and that whatever beyond that. I have attempted this, but it doesn't seem to work. The only way I have been able to get it to work is by doing:

WHERE ...
LIKE '%1__' OR 
LIKE '%2__' OR 
LIKE '%3__' OR 
LIKE '%4__'

I am not sure if the way I would like to do it is possible or the correct way as I have never attempted patterns with the LIKE clause.

Any help you could give would be greatly appreciated.

A: 

I think you want to use REGEXP_LIKE instead of like.

WHERE REGEXP_LIKE(fieldName, '[A-Z]{2}[1-4]?.+','i');

Chad
And how would you do it on pre-10g, when regexes weren't supported?
OMG Ponies
Yeah, I would easily use REGEXP_LIKE, but as rexem mentioned, no support pre-10g.
statikfx
Roll your own REGEXP_LIKE, I suppose...
DCookie
Apparently I can't read...I didn't see the Oracle version in the post when I first read it. Sorry.
Chad
@Chad: The version wasn't present when you answered. Just have to be careful about answers when pertinent information isn't immediately available.
OMG Ponies
@Chad: As rexem said, you didn't miss it, it was my fault for not including it. My apologies.
statikfx
+6  A: 

Clunky, but perhaps:

select *
  from <your_table>
 where TRANSLATE(SUBSTR(<blah>,1,3),'ABCDEFGHIJKLMNOPQRSTUVWXYZ1234',
                                    'AAAAAAAAAAAAAAAAAAAAAAAAAA1111') = 'AA1';

might suit your needs....

EDIT: Incorporated xlnt suggestion by @Hobo to translate the substring, rather than taking the substring of the translated string...

DCookie
I was thinking along similar lines but using REPLACE...
OMG Ponies
+1. Though if <blah> is long, would you be better off doing translate(substr(<blah>,1,3), 'ABC...', 'AAA...')? I'm assuming translate is cheaper on a shorter string, and the time to get the substring would be the same in both cases.
Hobo
Good point...........
DCookie
+1  A: 

I recommend using INSTR:

INSTR(t.column, '1', 3, 1) > 0 OR 
INSTR(t.column, '2', 3, 1) > 0 OR
INSTR(t.column, '3', 3, 1) > 0 OR
INSTR(t.column, '4', 3, 1) > 0

The problem with using a wildcard in a LIKE is there's no control over where in column value the 1/2/3/4/etc is going to turn up - it could be at the end.

DCookie is right - this answer doesn't have a way of handling if there's numeric data in that place. But it's still better than using LIKE with wildcards.

OMG Ponies
What about the leading [A-Z][A-Z]?
DCookie
+1  A: 

Total wild card but will suggest it anyway.

Does your 9i Database installation have the PL/SQL Web Toolkit installed? The reason for asking is that one of our customers pointed out that there is limited regular expression support using one of the supplied packages that comes with it.

The package is called owa_pattern and the only 9i link I could find is this one

I have never used it and am still trying to get to grips with Regular Expressions so can't tell you if it would fit your purpose but thought that it may be of use.

carpenteri
+3  A: 

Try this:

SELECT c1
FROM   t1
WHERE  substr(c1,1,1) IN ('A','B','C','D',
                          'E','F','G','H',
                          'I','J','K','L',
                          'M','N','O','P',
                          'Q','R','S','T',
                          'U','V','W','X',
                          'Y','Z')
       AND substr(c1,2,1) IN ('A','B','C','D',
                              'E','F','G','H',
                              'I','J','K','L',
                              'M','N','O','P',
                              'Q','R','S','T',
                              'U','V','W','X',
                              'Y','Z')
       AND substr(c1,3,1) IN ('1','2','3','4')
/

If you want also to match lower letters, then apply the upper() function to the 2 first substr(): eg. where upper(substr(c1,1,1)) in ...

Performance

I tested the Performance of the query with regexp_like and without. As you can see the query without the regexp_like function was 100% faster. (Note. Both querys did a soft-parse)

SQL> select count(*) from t1;                                                                                                                     

  COUNT(*)
----------
    458752

Elapsed: 00:00:00.02
SQL> set timing off;                                                                                                                                  
SQL> select count(*) from t1;                                                                                                                     

  COUNT(*)
----------
    458752

SQL> set timing on;                                                                                                                                   
SQL> select count(*) from t1 where regexp_like(c1, '[A-Z][A-Z][1-4].*');                                                                       

  COUNT(*)
----------
     65536

Elapsed: 00:00:02.66
SELECT count(*)
FROM   t1
WHERE  substr(c1,1,1) IN ('A','B','C','D',
                          'E','F','G','H',
                          'I','J','K','L',
                          'M','N','O','P',
                          'Q','R','S','T',
                          'U','V','W','X',
                          'Y','Z')
       AND substr(c1,2,1) IN ('A','B','C','D',
                              'E','F','G','H',
                              'I','J','K','L',
                              'M','N','O','P',
                              'Q','R','S','T',
                              'U','V','W','X',
                              'Y','Z')
       AND substr(c1,3,1) IN ('1','2','3','4')
 18  /                                                                                                                                                

  COUNT(*)
----------
     65536

Elapsed: 00:00:01.15
SQL>

Second Method

Get the ascii values of A,Z,1 and 4

SQL> select ascii('A') from dual;                                                                                                                     

ASCII('A')
----------
        65


SQL> select ascii('Z') from dual;                                                                                                                     

ASCII('Z')
----------
        90


SQL> select ascii('1') from dual;                                                                                                                     

ASCII('1')
----------
        49

SQL> select ascii('4') from dual;                                                                                                                     

ASCII('4')
----------
        52

Now you can write your statement a lot shorter

SELECT count(* )
FROM   t1
WHERE  ascii(substr(c1,1,1)) BETWEEN 65 AND 90
       AND ascii(substr(c1,2,1)) BETWEEN 65 AND 90
       AND ascii(substr(c1,3,1)) BETWEEN 49 AND 52
/
zürigschnäzlets
If I wanted to match a third character before the number that might or might not be there, what could I do to the second method (ASCII) to allow for this?
statikfx
Just do WHERE substr(c1,1,1) between 'A' and 'Z' ... instead of where ascii(substr(c1,1,1)) BETWEEN 65 AND 90 .
tuinstoel
I hope that wasn't the extent of your comparison testing. Some of the savings could come from the fact your rows were in memory from the first query. Also, does the ASCII trick work with multibyte character sets? I don't know, just asking.
DCookie
+1  A: 

If you really want to use reg exps you can develop a java stored proc and a accompanying pl/sql wrapper. (I assume that the Java release supported in Oracle 9 supports reg exps, I am not 100% sure). You can call that java stored proc via the pl/sql wrapper in your select statement.

But more easy and faster:

SELECT c1
FROM   t1
WHERE  substr(c1,1,1) between  'A' and 'Z'
AND    substr(c1,2,1) between  'A' and 'Z'
AND    substr(c1,3,1) IN ('1','2','3','4')

A variant of zürigschnäzlets 's solution without use of the ascci function.

tuinstoel
+1 for the java suggestion and the cleaner example.
DCookie