views:

304

answers:

1

I ran into a weird behavior from Oracle this morning... And I can't understand why it acts this way from the docs. I'm sorry for the long post, but I want to make sure I'm understood. Oh, and make sure to read the note at the end before answering :)

The goal of the request is to return rows with 1 or more lowercase characters. For the sake of the example, my table will be:

CREATE TABLE "TEMP_TABLE" 
   ( "VAL" VARCHAR2(4000 BYTE) );
Insert into TEMP_TABLE (VAL) values ('00A00');
Insert into TEMP_TABLE (VAL) values ('00000');
Insert into TEMP_TABLE (VAL) values ('BC000');
Insert into TEMP_TABLE (VAL) values ('ABC00');
Insert into TEMP_TABLE (VAL) values ('AAAAA');
Insert into TEMP_TABLE (VAL) values ('abc00');

Using this SQL Request:

select val, 
case when regexp_like (val, '[a-b]') then 'MATCH' else 'NO' end from temp_table;

If the NLS_SORT value of the session is set to BINARY, oracle returns:

00A00   NO
00000   NO
BC000   NO
ABC00   NO
AAAAA   NO
abc00   MATCH

=>All good here: the only word containing a lowercase letter matches; the others don't.

but if NLS_SORT is set to FRENCH, the results are less understandable:

00A00   NO
00000   NO
BC000   MATCH
ABC00   MATCH
AAAAA   NO
abc00   MATCH

From what I can deduce, the regexp matches when there are characters other than A.

So my question is: Why would Oracle understand [a-z] as 'rows with letters that are not A' ?

Note 1: Specs: The database is under Oracle 10G(r2), and the Session's NLS Parameters are as follow:

NLS_CALENDAR    GREGORIAN
NLS_COMP        BINARY
NLS_CURRENCY    ¿
NLS_DATE_FORMAT DD/MM/RR HH24:MI
NLS_DATE_LANGUAGE   FRENCH
NLS_DUAL_CURRENCY   ¿
NLS_ISO_CURRENCY    FRANCE
NLS_LANGUAGE    FRENCH
NLS_LENGTH_SEMANTICS    BYTE
NLS_NCHAR_CONV_EXCP FALSE
NLS_NUMERIC_CHARACTERS  , 
NLS_SORT    FRENCH_M
NLS_TERRITORY   FRANCE
NLS_TIME_FORMAT HH24:MI:SSXFF
NLS_TIMESTAMP_FORMAT    DD/MM/RR HH24:MI:SSXFF
NLS_TIMESTAMP_TZ_FORMAT DD/MM/RR HH24:MI:SSXFF TZR
NLS_TIME_TZ_FORMAT  HH24:MI:SSXFF TZR

Note 2 : Yes, I could use regexp_like(val, '[[:lower:]]') . But I found out about this later on, and it doesn't explain the weird behaviour.

+2  A: 

For better or worse, the sort ordering defined by nls_sort is being used to evaluate the [a-z] regexp. If you insert a,b,c,A,B, and C into temp_table and sort it under each setting you'll get the following:

SQL> alter session set nls_sort=BINARY;

Session altered.

SQL> select val,
  2  case when regexp_like (val, '[a-z]') then 'MATCH' else 'NO' end m
  3  from temp_table order by val;

VAL     M
------------------------- -------------------------
A       NO
B       NO
C       NO
a       MATCH
b       MATCH
c       MATCH

6 rows selected.

SQL> alter session set nls_sort=FRENCH;

Session altered.

SQL> select val,
  2  case when regexp_like (val, '[a-z]') then 'MATCH' else 'NO' end m
  3  from temp_table order by val;

VAL     M
------------------------- -------------------------
A       NO
a       MATCH
B       MATCH
b       MATCH
C       MATCH
c       MATCH

6 rows selected.

Since the upper case letters are "interleaved" with the lower case letters in the French setting it evaluates to true in Oracle's implementation.

dpbradley
Makes total sense, thank you!
altermativ