I want to make an Oracle function to remove 'garbage' from user input values, but there's also a requirement that users may enter Unicode text which I'm supposed to leave as is.

REGEXP_REPLACE(search_text, '[^0-9A-Za-z]', '') takes care of non-Unicode input, but how can I check whether a VARCHAR2 value contains Unicode characters?

It looks like I could compare the results of LENGTH(search_text) and LENGTHB(search_text) to find out whether any characters take more than 1 byte. Is there a better way of doing it?
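
A rough sketch of that LENGTH/LENGTHB comparison, assuming the database character set is a multibyte one such as AL32UTF8 (in a single-byte character set an accented letter also occupies one byte, so this check would not notice it). The `samples` rows are made up for illustration and are not real data:

```sql
-- Assumption: database character set is AL32UTF8.
-- The two sample rows are illustrative only.
WITH samples AS (
  SELECT 'abc-123!' AS search_text FROM dual
  UNION ALL
  SELECT 'Ärger 42' FROM dual
)
SELECT search_text,
       CASE
         -- Same character count and byte count: every character is single-byte,
         -- so treat the value as plain ASCII input and clean it.
         WHEN LENGTH(search_text) = LENGTHB(search_text)
           THEN REGEXP_REPLACE(search_text, '[^0-9A-Za-z]', '')
         -- At least one multibyte character: leave the value as is.
         ELSE search_text
       END AS cleaned
FROM samples;
```

Note that this treats the whole string as "Unicode" as soon as a single multibyte character appears, which seems to match the requirement as stated.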

A: 

What do you mean by "Unicode characters"? The characters you list are Unicode too.

Do you mean "characters that are digits or letters in any language"? Surely you don't mean ASCII, because your example eliminates most of the ASCII characters too.

Jason Cohen
Well, it is about cleaning up ASCII input and leaving Unicode input as is.
A: 

Oracle has a function called asciistr, which takes any Unicode characters that can't be represented in ASCII and converts them to escaped hex code points of the form \XXXX. So for instance:

asciistr('A B C Ä Ê')   would return 'A B C \00C4 \00CA'

You should then be able to write a regexp_replace pattern to strip anything of the form '\XXXX'.

Nick
I don't need to strip Unicode. If the string contains Unicode, I'm supposed to leave it as is.
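
For what it's worth, asciistr can also be used to detect non-ASCII input rather than strip it. Below is a minimal sketch, not a definitive implementation: the function name `clean_search_text` and its parameter are hypothetical, and it assumes "Unicode" here means "any character outside ASCII". Backslashes are removed before the comparison because, as far as I know, asciistr also escapes a literal backslash, which would otherwise make a pure-ASCII string look changed.

```sql
-- Hypothetical helper; the name and parameter are not from the thread.
CREATE OR REPLACE FUNCTION clean_search_text (p_text IN VARCHAR2)
  RETURN VARCHAR2
IS
  -- Input with backslashes removed, so that ASCIISTR's own escaping of a
  -- literal backslash does not count as a difference.
  l_probe VARCHAR2(32767) := REPLACE(p_text, '\');
BEGIN
  -- ASCIISTR rewrites each non-ASCII character as \XXXX, so the result only
  -- differs from the probe when such a character is present.
  IF ASCIISTR(l_probe) != l_probe THEN
    RETURN p_text;  -- contains non-ASCII characters: leave as is
  END IF;

  -- Pure ASCII (or NULL): strip everything except digits and letters.
  RETURN REGEXP_REPLACE(p_text, '[^0-9A-Za-z]', '');
END clean_search_text;
/
```

It could then be called as `SELECT clean_search_text(search_text) FROM some_table;`, where `some_table` stands in for whatever table holds the input. Unlike the LENGTH/LENGTHB comparison, this check should not depend on how many bytes a character takes in the database character set.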