Is it possible to write a regex that returns the converse of a desired result? Regexes are usually inclusive - finding matches. I want to be able to transform a regex into its opposite - asserting that there are no matches. Is this possible? If so, how?

http://zijab.blogspot.com/2008/09/finding-opposite-of-regular-expression.html states that you should bracket your regex with

/^((?!^ MYREGEX ).)*$/

but this doesn't seem to work. If I have the regex

/[a|b]./

the string "abc" returns false with both my regex and the converse suggested by zijab:

/^((?!^[a|b].).)*$/

Is it possible to write a regex's converse, or am I thinking incorrectly?

+3  A: 

Um, couldn't you just check whether there are no matches or not? I don't know what language you are doing this in, but how about this pseudocode?

if (!'Some String'.match(someRegularExpression))
    // do something...

Just let me know if I'm not understanding you correctly...

EDIT: Then the regexp you got from your link should work:

/^((?!REGULAR_EXPRESSION_HERE).)*$/

Works fine for me.

musicfreak
Well, that's the problem. I'm not writing the regexes or the code. I have an app that lets people enter their own regexes, and they need to be either inclusive or exclusive. I either need to have them enter in another piece of data - inclusive/exclusive, or force them to write them all inclusive or exclusive, using an 'opposite' pattern, if one exists. This will only be used by programmers, so complexity is not a concern - just possibility.
Greg
Hmm, I see. Then why doesn't this work? /^((?!REGULAR_EXPRESSION_HERE).)*$/ (Taken from your link.) It works for me.
musicfreak
+3  A: 

You can invert the character set by writing a ^ at the start ([^…]). So the opposite expression of [ab] (match either a or b) is [^ab] (match neither a nor b).
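A negated class inverts the set for a single character position only; it still has to consume one character. A minimal Python sketch of that point (Python is used here purely for illustration, the thread does not name a language):

```python
import re

# Classify single characters with the class and with its negation.
for ch in "abc":
    in_class = re.fullmatch(r"[ab]", ch) is not None
    in_negated = re.fullmatch(r"[^ab]", ch) is not None
    print(ch, in_class, in_negated)

# On a longer string both patterns can succeed at once: "abc" contains
# a character inside the class and a character outside it.
contains_ab = re.search(r"[ab]", "abc") is not None
contains_other = re.search(r"[^ab]", "abc") is not None
print(contains_ab, contains_other)
```

So [^ab] is the opposite of [ab] for one character, but "contains [^ab]" is not the opposite of "contains [ab]".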

But the more complex your expression gets, the more complex the complementary expression becomes. An example:

Suppose you want to match the literal foo. An expression that matches anything except a string containing foo would have to match either

  1. any string that’s shorter than foo (^.{0,2}$), or
  2. any three-character string that’s not foo (^([^f]..|f[^o].|fo[^o])$), or
  3. any longer string that does not contain foo.

All together, this should work:

^(?:[^f]|f(?:f|of)*o?[^fo])*(?:f(?:f|of)*o?)?$

But note: this applies only to foo.
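One way to gain confidence in a hand-built complement is to compare it with a plain substring test over every short string. The sketch below does that in Python for one lookahead-free pattern; the pattern as spelled out in the code, the helper name mismatches, the alphabet and the length bound are all choices made for this illustration.

```python
import re
from itertools import product

# Lookahead-free pattern meant to accept exactly the strings without "foo".
# fullmatch() is used below, so no ^ and $ anchors are needed here.
NO_FOO = re.compile(r"(?:[^f]|f(?:f|of)*o?[^fo])*(?:f(?:f|of)*o?)?")

def mismatches(max_len=6, alphabet="fox"):
    """Return every string where the pattern disagrees with a substring test."""
    bad = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            accepted = NO_FOO.fullmatch(s) is not None
            if accepted != ("foo" not in s):
                bad.append(s)
    return bad

print(mismatches())
```

An empty list means the pattern agrees with the substring test on every string up to the chosen length. That is evidence, not a proof, and it says nothing about other alphabets or about any literal other than foo.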

Gumbo
+1  A: 

Your inverted regex isn't working because of the '^' inside the negative lookahead:

/^((?!^[ab].).)*$/
      ^            # WRONG

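Maybe it's different in vim, but in every regex flavor I'm familiar with, the caret matches the beginning of the string (or the beginning of a line in multiline mode). Inside the lookahead, that means the check can only fail at the very start of the string, so a match that begins anywhere else goes undetected. But I think that was just a typo in the blog entry.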
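The effect of the stray caret can be seen with a string whose match does not start at position 0. This is a small sketch using Python's re module, chosen only for illustration; the variable names are made up here.

```python
import re

original = re.compile(r"[ab].")
with_caret = re.compile(r"^((?!^[ab].).)*$")     # pattern as quoted above
without_caret = re.compile(r"^((?![ab].).)*$")   # inner caret removed

# Columns: string, original matches, caret version matches, fixed version matches
for s in ["xab", "abc", "xyz"]:
    print(s,
          original.search(s) is not None,
          with_caret.search(s) is not None,
          without_caret.search(s) is not None)
```

For "xab" the original pattern finds "ab", so a correct inverse has to reject the string. The version with the inner caret accepts it, because the lookahead is only effective at position 0.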

You also need to take into account the semantics of the regex tool you're using. For example, in Perl, this is true:

"abc" =~ /[ab]./

But in Java, this isn't:

"abc".matches("[ab].")

That's because the regex passed to the matches() method is implicitly anchored at both ends (i.e., /^[ab].$/).
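The same distinction shows up in other libraries. As an illustration only (Python is not mentioned in the question), re.search behaves like the Perl match above, while re.fullmatch behaves like Java's matches():

```python
import re

pattern = re.compile(r"[ab].")

# search: succeeds if the pattern matches anywhere in the string
found_anywhere = pattern.search("abc") is not None

# fullmatch: succeeds only if the whole string matches the pattern
matches_entirely = pattern.fullmatch("abc") is not None

print(found_anywhere, matches_entirely)  # prints: True False
```

Which of the two semantics the host application uses decides what "the opposite" of a user's regex even means, so it is worth pinning down first.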

Taking the more common Perl semantics, /[ab]./ means the target string contains a sequence consisting of an 'a' or 'b' followed by one more character (anything except a line separator). In other words, at ANY point, the condition is TRUE. The inverse of that statement is: at EVERY point, the condition is FALSE. That means that before you consume each character, you perform a negative lookahead to confirm that the character isn't the beginning of a matching sequence:

(?![ab].).

And you have to examine every character, so the regex has to be anchored at both ends:

/^(?:(?![ab].).)*$/
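The wrapping step can be sketched as a small helper. This is a rough Python illustration, not a general solution: invert and contains are hypothetical names, and the sketch assumes the original pattern cannot match the empty string and that the input contains no newlines.

```python
import re

def invert(pattern: str) -> str:
    """Build a pattern that matches only strings containing no match.

    Hypothetical helper. Assumes `pattern` cannot match the empty string
    and that the input has no newlines ('.' skips them by default).
    """
    # The inner (?:...) keeps alternations such as "a|b" inside the lookahead.
    return rf"^(?:(?!(?:{pattern})).)*$"

def contains(pattern: str, text: str) -> bool:
    """True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

# Columns: string, original matches, inverted pattern matches
for text in ["", "abc", "xab", "xyz", "b", "ba", "cab!"]:
    print(repr(text), contains(r"[ab].", text), contains(invert(r"[ab]."), text))
```

On these samples the two columns are always opposite. A pattern that can match the empty string breaks the assumption: the lookahead is never tried at the end of the input, so the wrapped version can accept a string the original also matches.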

That's the general idea, but I don't think it's possible to invert every regex--not when the original regexes can include positive and negative lookarounds, reluctant and possessive quantifiers, and who-knows-what.

Alan Moore