Hi,

I need a regular expression that matches all Unicode control characters except carriage return (0x0D), line feed (0x0A), and tab (0x09). My current regular expression looks like this:

/\p{C}/u

I just need to define these three exceptions now.

+4  A: 

I think you can use a negative lookahead here, combined with character classes.

/(?![\x{000d}\x{000a}\x{0009}])\p{C}/u

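The negative lookahead asserts that the next character is not one of those listed in the character class. A lookahead is zero-width, so it consumes nothing, and \p{C} then matches that same character. Note that \p{C} covers the whole Unicode "Other" category (control, format, surrogate, private-use, and unassigned code points); if only control characters should match, \p{Cc} is the narrower property.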

I used the Perl-style \x{...} syntax to specify individual Unicode code points.
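
As a rough way to check the idea, here is a minimal sketch in TypeScript rather than Perl/PCRE. It assumes an engine with the `u` flag and Unicode property escapes (ES2018 or later), where code points are written \u{...} instead of \x{...}. The helper name `stripMatches` and the sample string are made up for illustration. The second pattern uses \p{Cc} to show the difference from \p{C}.

```typescript
// Sketch only: JavaScript/TypeScript regex dialect, not Perl/PCRE.
// Equivalent literal form: /(?![\u{000D}\u{000A}\u{0009}])\p{C}/gu
const otherExceptCrLfTab = new RegExp(
  "(?![\\u{000D}\\u{000A}\\u{0009}])\\p{C}",
  "gu"
);

// Narrower variant: only the Cc (control) subcategory.
const controlExceptCrLfTab = new RegExp(
  "(?![\\u{000D}\\u{000A}\\u{0009}])\\p{Cc}",
  "gu"
);

// stripMatches is a hypothetical helper name used only for this illustration.
function stripMatches(input: string, pattern: RegExp): string {
  // Remove every character the pattern matches.
  return input.replace(pattern, "");
}

// NUL (category Cc) and ZERO WIDTH SPACE (category Cf) mixed with tab, CR and LF.
const sample = "a\u0000b\tc\r\nd\u200Be";

const strippedOther = stripMatches(sample, otherExceptCrLfTab);
const strippedControl = stripMatches(sample, controlExceptCrLfTab);
```

With this sample, both patterns leave the tab, carriage return, and line feed in place. The \p{C} version also removes the zero-width space, while the \p{Cc} version keeps it.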

More discussion on lookarounds here

(Note that this has not been tested, but I think the concept is correct.)

Sean Nyman