Can a regular expression be used to match any string except one specific string constant, say "ABC"? Is it possible to exclude just that one string? Thanks in advance for your help.

A: 

This isn't easy unless your regexp engine has special support for it. The easiest way is to use a negated match instead (Perl's `!~` operator), for example:

$var !~ /^foo$/
    or die "too much foo";

If that isn't an option, you have to do something evil:

$var =~ /^(($)|([^f].*)|(f[^o].*)|(fo[^o].*)|(foo.+))$/
    or die "too much foo";

That one basically says: if it starts with a non-f, the rest can be anything; if it starts with f followed by a non-o, the rest can be anything; if it starts with fo, the next character had better not be another o; and if it starts with foo, at least one more character must follow.
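Tracing the pattern by hand shows a gap: no branch matches the one- and two-character strings "f" and "fo", so they are rejected even though they are not "foo". Below is a minimal sketch in Python (its `re` module standing in for Perl) that adds those two branches and cross-checks the result against the simpler negated match. The function names are hypothetical, and `re.fullmatch` plays the role of the outer `^(...)$` anchors.

```python
import re

# The answer's alternation plus the two branches it lacks: "f" and "fo",
# the proper prefixes of "foo" that no other branch covers.
NOT_FOO = re.compile(r"|[^f].*|f|f[^o].*|fo|fo[^o].*|foo.+", re.DOTALL)

def not_foo_by_pattern(s: str) -> bool:
    # fullmatch requires the whole string to match, like ^(...)$ in Perl.
    return NOT_FOO.fullmatch(s) is not None

def not_foo_by_negation(s: str) -> bool:
    # The "easy way": match "foo" normally and negate the result in host code.
    return re.fullmatch("foo", s) is None

for s in ["", "f", "fo", "foo", "food", "fox", "bar", "xfoo"]:
    # Both approaches should agree on every candidate.
    assert not_foo_by_pattern(s) == not_foo_by_negation(s), s
    print(repr(s), not_foo_by_pattern(s))
```

The negated match is shorter and harder to get wrong; the hand-built alternation is only worth it when the engine or API gives you no way to invert a match.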

derobert
That won’t allow the empty string, `f`, `fo` and `foo`.
Gumbo
@Gumbo: It allows the empty string just fine; notice that ($) is the first alternative, so ^$ (empty string) is accepted. I tested it, and at least in perl 5.0.10 the empty string is accepted.
derobert
... sorry, perl 5.10.0, of course!
derobert
+5  A: 

You have to use a negative lookahead assertion.

(?!^ABC$)

For example, you could use the following:

(?!^ABC$)(^.*$)
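A minimal Python sketch of the same pattern, assuming Python's `re` module as the engine (the helper name `accepts` is hypothetical). `match` starts at the beginning of the string, the lookahead rejects only the exact string ABC, and `(^.*$)` then consumes the whole input:

```python
import re

# Negative lookahead: fail immediately if the entire string is ABC,
# otherwise capture the whole string with ^.*$.
NOT_ABC = re.compile(r"(?!^ABC$)(^.*$)")

def accepts(s: str) -> bool:
    return NOT_ABC.match(s) is not None

for s in ["", "AB", "ABC", "ABCD", "xABC", "hi ABC wow"]:
    print(repr(s), accepts(s))
```

In Python, `.` does not match a newline and `$` also matches just before a final newline, so for arbitrary multi-line input `re.DOTALL` and `\Z` may be preferable.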
Daniel Brückner
This will work if you're looking for a string that does not include ABC. But is that the goal? Or is the goal to match every character except ABC?
Steve Wortham
Thanks for pointing that out, you are right: my suggestion only excluded strings starting with ABC, because I forgot to anchor the assertion. I'm going to correct that.
Daniel Brückner
That's still different from what I was thinking. Perhaps the questioner will clarify what they're looking for.
Steve Wortham
I find it quite clear: "any string except a specific string [constant]", hence any string (including strings containing ABC) except ABC itself.
Daniel Brückner
Yeah, you may be right. If so, your answer is perfect. See my answer for how I interpreted it.
Steve Wortham
Not working in JavaScript: `text = 'hi ABC wow'; regex = /(?!^ABC$)/; console.log(text.match(regex));`
Nadal
You only used the assertion and forgot the matching expression; use `(?!^ABC$)(^.*$)` and it works.
Daniel Brückner
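Why the bare assertion seems not to work: a lookahead consumes no characters, so it can succeed with an empty match. Python's `re` behaves the same way as the JavaScript snippet above (Python is used here only for illustration), and the bare assertion even "matches" inside the string ABC itself, at a position where `^` no longer holds:

```python
import re

text = "hi ABC wow"

# The assertion alone consumes nothing: search succeeds with an empty match.
bare = re.compile(r"(?!^ABC$)")
print(bare.search(text))   # empty match at position 0
print(bare.search("ABC"))  # empty match at position 1, where ^ fails

# With a matching expression attached, the whole string is captured,
# and "ABC" produces no match at all.
full = re.compile(r"(?!^ABC$)(^.*$)")
print(full.search(text).group(0))
print(full.search("ABC"))
```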
I was recently helping a friend do something very similar. However, he didn't want to match the string if it contained ABC anywhere inside it. So I wrote a slightly modified version of your expression, `(?!.*ABC)^.*$`, and it works like a charm.
Steve Wortham
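A minimal Python sketch of that "contains ABC anywhere" variant (`NO_ABC_ANYWHERE` is a hypothetical name). The lookahead runs at the start of the string and `.*ABC` can reach any later position on the line, so a single occurrence of ABC makes the whole match fail:

```python
import re

# The lookahead scans forward from the start; any ABC on the line vetoes the match.
NO_ABC_ANYWHERE = re.compile(r"(?!.*ABC)^.*$")

def has_no_abc(s: str) -> bool:
    return NO_ABC_ANYWHERE.match(s) is not None

for s in ["hello", "ABC", "hi ABC wow", "AB C"]:
    print(repr(s), has_no_abc(s))
```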
+1  A: 

You could use negative lookahead, or something like this:

^([^A]|A([^B]|B([^C]|$)|$)|$).*$

Maybe it could be simplified a bit.
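As written, every branch stops short of a complete ABC prefix, so any longer string that merely starts with ABC (such as ABCD) is rejected. A minimal Python sketch of one possible fix, assuming Python's `re`: add an `ABC.` branch for "ABC followed by at least one more character". The names `NOT_ABC` and `accepts` are hypothetical.

```python
import re

# The nested prefix alternation, plus an ABC. branch so that strings which
# start with ABC but continue past it are accepted too.
NOT_ABC = re.compile(r"^(?:[^A]|A(?:[^B]|B(?:[^C]|$)|$)|ABC.|$).*$")

def accepts(s: str) -> bool:
    return NOT_ABC.match(s) is not None

for s in ["", "A", "AB", "ABX", "ABC", "ABCD", "XYZ"]:
    print(repr(s), accepts(s))
```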

Adam Crume
That won’t allow any string that starts with `ABC`.
Gumbo
A: 

In .NET you can use grouping to your advantage like this:

http://regexhero.net/tester/?id=65b32601-2326-4ece-912b-6dcefd883f31

You'll notice that:

(ABC)|(.)

Will capture everything except ABC in the second group: each match is either ABC (group 1) or a single other character (group 2). Parentheses surround each group, so (ABC) is group 1 and (.) is group 2.

So you just substitute the second group in a replace, like this:

$2

Or, in .NET, look at the `Groups` collection of each `Match` object for a little more control.

You should be able to do something similar in most other regex implementations as well.
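A minimal sketch of the same replace-with-group-2 idea in Python (`drop_abc` is a hypothetical helper). Note that this approach removes occurrences of ABC from a larger text rather than rejecting the exact string ABC. It relies on Python 3.5+, where an unmatched group in the replacement template expands to an empty string:

```python
import re

# Group 1 swallows each occurrence of ABC; group 2 captures any other single
# character. Replacing every match with group 2 keeps ordinary characters
# and drops ABC.
ABC_OR_CHAR = re.compile(r"(ABC)|(.)", re.DOTALL)

def drop_abc(s: str) -> str:
    return ABC_OR_CHAR.sub(r"\2", s)

print(repr(drop_abc("hi ABC wow")))
print(repr(drop_abc("AABCC")))
```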

UPDATE: I found a much faster way to do this here: http://regexhero.net/tester/?id=997ce4a2-878c-41f2-9d28-34e0c5080e03

It still uses grouping (I can't find a way that doesn't), but this method is over 10x faster than the first.

Steve Wortham
A: 

Try this regular expression:

^(.{0,2}|([^A]..|A[^B].|AB[^C])|.{4,})$

It describes three cases:

  1. fewer than three arbitrary characters
  2. exactly three characters, while either
    • the first is not A, or
    • the first is A but the second is not B, or
    • the first is A, the second B but the third is not C
  3. more than three arbitrary characters
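A quick check of those three cases, as a minimal sketch assuming Python's `re` (`NOT_ABC` and `accepts` are hypothetical names; the pattern is used unchanged with its own `^...$` anchors):

```python
import re

# Fewer than three characters, exactly three characters that differ from ABC
# somewhere, or more than three characters.
NOT_ABC = re.compile(r"^(.{0,2}|([^A]..|A[^B].|AB[^C])|.{4,})$")

def accepts(s: str) -> bool:
    return NOT_ABC.match(s) is not None

for s in ["", "AB", "ABC", "ABD", "XBC", "ABCD"]:
    print(repr(s), accepts(s))
```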
Gumbo