tags:

views:

2967

answers:

4

I'm looking for a regular expression that will match all strings EXCEPT those that contain a certain string within. Can someone help me construct it?

For example, looking for all strings that do not have a, b, and c in them in that order.

So
abasfaf3 would match, whereas
asasdfbasc would not

+2  A: 

in perl:

if($str !~ /a.*?b.*?.*c/g)
{
    print "match";
}

should work.

Alan
I see, so actively looking for strings that match what I'm trying to avoid,, then taking the opposite. I can't think of why that wouldn't work...
Ray
well, you can theoretical do build a regex that matches the opposite. but that regex would be huge. The way you do that is to build a FSM, invert the end-conditions, then build a regex out of it
Johannes Schaub - litb
@Ray: Yep you got it. In this case it's easier to regex what you don't want then find the strings that dont match the regex.
Alan
Why the 's' and the 'g' ?
Federico Ramponi
the s is for substitution, which is incorrect in this case. g is for global match
Alan
A: 

in Java:

(?m)^a?(.(?!a[^b\r\n]*b[^\r\nc]*c))+$

does match

abasfaf3
xxxabasfaf3

does not match

asasdfbascf
xxxxasasdfbascf
VonC
+1  A: 

In Python:

>>> r = re.compile("(?!^.*a.*b.*c.*$)")
>>> r.match("abc")
>>> r.match("xxabcxx")
>>> r.match("ab ")
<_sre.SRE_Match object at 0xb7bee288>
>>> r.match("abasfaf3")
<_sre.SRE_Match object at 0xb7bee288>
>>> r.match("asasdfbasc")
>>>
Federico Ramponi
Why the downvote?
Federico Ramponi
I didnt downvote you. but i asked the same today. why did they downvote my neat puzzle??
Johannes Schaub - litb
Actually, in Perl it doesn't work. Sigh..
Federico Ramponi
+1  A: 

Well, you can theoretical build a regex that matches the opposite. But for longer strings, that regex would become big. The way you would do that systematically is (greatly simplified):

  • Convert the regular expression into a deterministic finite automaton
  • Convert the end conditions of the automaton, so that it accepts the inverted regular language
  • Convert the automaton back to a regular expression by successively removing nodes from the automaton, yet keeping the behavior of it the same. Removing one node will require putting two or more regular expressions together, so that they will account for the removed node.
  • If you happen to have one start node, and one end node, you are finished: The regular expression labeling the edge between them is your searched regular expression.

Practically, you can just match for the string you want not have in it, and invert the result. Here is what it would look like in awk:

echo azyxbc | awk '{ exit ($0 !~ /a.*b.*c/); }' && echo matched

If you are interested into this, i recommend the book "Introduction to the Theory of Computation" by Michael Sipser.

Johannes Schaub - litb