I'm trying to match the character ’ (the right single quotation mark), which I can type with Alt+0146. Word tells me it's Unicode 0x2019, but I can't seem to match it using regular expressions in ColdFusion. Here's the snippet I'm using to match between 2 and 10 letters, apostrophes, and this character:

[[:alpha:]'\x2019]{2,10}

but it's not working. Any ideas?

+1  A: 

Another thing you could try is directly including the character:

[[:alpha:]'#Chr(8217)#]{2,10}


However, I'm not sure if that will work with a CF regex. If not, you still have the option of using Java regex within CF. This is easy to do and gives you a far wider range of regex functionality, almost certainly including Unicode support.

If you're doing replacements, you can do a Java Regex directly on a CF string, for example:

<cfset NewString = OrigString.replaceAll( 'ajavaregex' , 'replacement' )/>
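To illustrate the Java side on its own, here is a minimal sketch of the asker's pattern written as a Java regex, where `\u2019` is understood directly by the regex engine. The class and method names (`CurlyApostropheDemo`, `isValid`, `straighten`) are made up for this example, and `\p{Alpha}` is assumed to be a close enough stand-in for `[[:alpha:]]` (it is ASCII-only by default).

```java
import java.util.regex.Pattern;

public class CurlyApostropheDemo {
    // Hypothetical validation pattern: 2 to 10 letters, straight apostrophes,
    // or right single quotation marks (U+2019).
    static final Pattern NAME = Pattern.compile("[\\p{Alpha}'\\u2019]{2,10}");

    // True only when the whole string fits the pattern.
    static boolean isValid(String s) {
        return NAME.matcher(s).matches();
    }

    // Replaces every U+2019 with a plain apostrophe, using String.replaceAll
    // the same way the CF snippet above does.
    static String straighten(String s) {
        return s.replaceAll("\\u2019", "'");
    }

    public static void main(String[] args) {
        System.out.println(isValid("O\u2019Neil"));
        System.out.println(straighten("O\u2019Neil"));
    }
}
```

From CF, the equivalent replacement would presumably be the same `replaceAll` call on a CF string with `\u2019` as the regex, though that is untested here.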


For other functionality (e.g. getting an array of matches, callback functions on replace), I have created Java RegEx Utilities, a single component that reduces each of these tasks to a single function call.

Peter Boughton
Thanks. I'm doing matching/data validation and not really in a position to change the validation code to use your handy utility :/
Trigger
+5  A: 

It looks like the \x shorthand in CF only supports character codes up to 255 (two hex digits), so it can't reach 0x2019. To go above that, you need to use the chr() function inline, with the decimal code point (0x2019 = 8217), like this:

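<cfscript>
   // The test string needs at least 2 characters to satisfy {2,10}
   yourString = "O’Neil";
   // chr(8217) inserts the ’ character (0x2019) into the character class
   result = refind("[[:alpha:]'" & chr(8217) & "]{2,10}", yourString);
   writeOutput(result);
</cfscript>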

That should give you a match.

anopres
Spot on! Thank you.
Trigger