Long story short, I'm working with a library that has a bug: it crashes if I use a regex with a caret immediately after an opening bracket (for example, the regex [^a]). The bug is being worked on and switching libraries isn't an easy option, so I'd like to be able to keep working between now and when the bug is fixed.

Thus, I need to express the following two regexes without using the caret:

[^'] and [^"]

Can this be done? If so, how? For now it might be acceptable to write a character class that lists every ASCII character except the quote, but I'm working with Unicode, so that isn't a watertight workaround.

+5  A: 

Yes, try:

(?!['"]).

I'm assuming your regex library supports lookaheads.

What it actually does is this:

(?!      # start negative look ahead
  ['"]   #   match a single- or double quote
)        # stop negative look ahead
.        # match any character other than line breaks

In plain English: "if a single or double quote cannot be 'seen' when looking ahead, match any character (other than line breaks)".
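A minimal sketch of the lookahead idea in Python, using the standard re module only as a stand-in, since the library in the question isn't named. Two details are worth noting. First, (?!['"]). rejects both quotes at once, so it corresponds to [^'"]; the two separate classes from the question can be written (?!'). and (?!")., which avoid square brackets entirely. Second, . does not match line breaks by default while a negated class does, so the sketch passes re.DOTALL to make the lookahead versions behave like [^'] and [^"]. Whether the target library has an equivalent dot-all flag is an assumption to check. The helper name keep is hypothetical.

```python
import re

# Caret-free stand-ins. DOTALL lets "." match newlines, as [^'] and [^"] do.
NOT_SINGLE = re.compile(r"(?!').", re.DOTALL)          # like [^']
NOT_DOUBLE = re.compile(r'(?!").', re.DOTALL)          # like [^"]
NOT_EITHER = re.compile(r"""(?!['"]).""", re.DOTALL)   # the answer's pattern, like [^'"]


def keep(pattern: re.Pattern, text: str) -> str:
    """Concatenate every character the pattern matches."""
    return "".join(pattern.findall(text))


sample = 'it\'s "quoted"\nline two'

# Compare against the caret versions (Python's re has no caret bug).
assert keep(NOT_SINGLE, sample) == "".join(re.findall(r"[^']", sample))
assert keep(NOT_DOUBLE, sample) == "".join(re.findall(r'[^"]', sample))
assert keep(NOT_EITHER, sample) == "".join(re.findall(r"""[^'"]""", sample))

print(repr(keep(NOT_EITHER, sample)))
```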

Bart Kiers
+1  A: 

What about substituting the ' character with something else (say, 0xdeadbeef or the like) and then substituting it back afterwards?

lorenzog
A: 

You'll need to tell us what kind of regexes the library supports. Depending on the library, you might get away with something like [\x00-!#-&(-\U0010ffff], a class of ranges that covers every code point except " (0x22) and ' (0x27). It also depends on whether the library uses UTF-16 and surrogate pairs when matching the regex, or whether it correctly matches Unicode characters outside the BMP.
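A quick check of that range trick in Python, again using the standard re module as a stand-in for the unnamed library. In ASCII, " is 0x22 and ' is 0x27, so [\x00-!#-&(-\U0010ffff] leaves out exactly those two code points. The single-quote and double-quote variants below are derived the same way for illustration; they are not part of the answer. Python strings are sequences of code points, so a character outside the BMP counts as one match here; an engine working on UTF-16 code units may behave differently, as the answer warns. The helper name excluded is hypothetical.

```python
import re

# Caret-free classes built from code-point ranges.
# '"' is U+0022 and "'" is U+0027; each class skips only the listed quote(s).
NOT_EITHER = re.compile(r"[\x00-!#-&(-\U0010ffff]")  # like [^'"]
NOT_SINGLE = re.compile(r"[\x00-&(-\U0010ffff]")     # like [^']
NOT_DOUBLE = re.compile(r"[\x00-!#-\U0010ffff]")     # like [^"]


def excluded(pattern: re.Pattern, limit: int = 0x80) -> list[str]:
    """Return the characters below `limit` that the class does NOT match."""
    return [chr(cp) for cp in range(limit) if not pattern.fullmatch(chr(cp))]


print(excluded(NOT_EITHER))  # prints: ['"', "'"]
print(excluded(NOT_SINGLE))  # prints: ["'"]
print(excluded(NOT_DOUBLE))  # prints: ['"']

# A non-BMP character (U+1F600) is a single code point, so this finds 3 matches.
print(len(NOT_EITHER.findall("a\U0001F600b")))  # prints: 3
```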

Nakedible
A: 

If you're just trying to match, remove, or replace characters that aren't single or double quotes, you might find it easier and faster to iterate through the characters in the string and perform the necessary operation as you go.

A regular expression engine ultimately has to step through the characters as well, so implementing simple replacements with a loop can sometimes be just as efficient, or more so.
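A minimal Python sketch of the loop approach, assuming the job is to drop or replace quote characters; strip_quotes and replace_quotes are hypothetical names. How the speed compares with a regex depends on the language and engine. In CPython specifically, the built-in str.translate (for example text.translate({ord("'"): None, ord('"'): None})) is likely to outperform a hand-written loop, since it runs in C.

```python
QUOTES = frozenset("'\"")  # the two characters to skip


def strip_quotes(text: str) -> str:
    """Drop every single or double quote, keeping all other characters."""
    out = []
    for ch in text:  # visit each character once
        if ch not in QUOTES:
            out.append(ch)
    return "".join(out)


def replace_quotes(text: str, replacement: str) -> str:
    """Replace every single or double quote with `replacement`."""
    return "".join(replacement if ch in QUOTES else ch for ch in text)


print(strip_quotes('say "hi" it\'s'))  # prints: say hi its
```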

Matt Ryall