views:

163

answers:

1

In an Oracle 10 database, I did a small test:

SELECT REGEXP_REPLACE('İF', '[Iİ]F', 'DONE', 1, 0, 'i') FROM dual;

This doe not seem to match the regex. However, when I remove the last parameter (case insensitive regex parameter), regex matches

SELECT REGEXP_REPLACE('İF', '[Iİ]F', 'DONE', 1, 0) FROM dual;

Below queries also returns "DONE":

SELECT REGEXP_REPLACE('IF', '[Iİ]F', 'DONE', 1, 0) FROM dual; SELECT REGEXP_REPLACE('iF', '[Iİ]F', 'DONE', 1, 0, 'i') FROM dual;

Character set of the database is UTF8.

What should I do to make this regex work?

PS: In Turhish language, uppercase version of the character "i" is "İ".

+1  A: 

Hi Yusuf,

You could use the Equivalence Class [[=i=]]:

SQL> select regexp_replace('İF', '[[=i=]]f', 'DONE', 1, 0, 'i') from dual;

REGEXP_REPLACE('İF','[[=I=]]F'
------------------------------
DONE

SQL> select regexp_replace('if', '[[=i=]]f', 'DONE', 1, 0, 'i') from dual;

REGEXP_REPLACE('IF','[[=I=]]F'
------------------------------
DONE

Edit: There seems to be a bug in some versions of Oracle with the REGEXP search functions regarding ı->I and i->İ

One possible workaround is to use the NLS_UPPER or NLS_LOWER functions that work reliably with the correct NLS_SORT setting:

SQL> alter session set nls_sort=XTURKISH;

Session altered

SQL> select regexp_replace(NLS_UPPER('İF'), 'İF', 'DONE', 1, 0) from dual;

REGEXP_REPLACE(NLS_UPPER('İF'
------------------------------
DONE

SQL> select regexp_replace(NLS_UPPER('if'), 'İF', 'DONE', 1, 0) from dual;

REGEXP_REPLACE(NLS_UPPER('IF')
------------------------------
DONE
Vincent Malgrat