ansaurus

Question

Regex for matching "everything but" a string

Answer 1

+2 A:

in perl:

if($str !~ /a.*?b.*?.*c/g)
{
    print "match";
}

should work.

Alan 2008-11-26 22:38:00

I see, so actively looking for strings that match what I'm trying to avoid,, then taking the opposite. I can't think of why that wouldn't work...

Ray 2008-11-26 22:41:39

well, you can theoretical do build a regex that matches the opposite. but that regex would be huge. The way you do that is to build a FSM, invert the end-conditions, then build a regex out of it

Johannes Schaub - litb 2008-11-26 22:43:43

@Ray: Yep you got it. In this case it's easier to regex what you don't want then find the strings that dont match the regex.

Alan 2008-11-26 22:51:22

Why the 's' and the 'g' ?

Federico Ramponi 2008-11-26 23:33:11

the s is for substitution, which is incorrect in this case. g is for global match

Alan 2008-11-26 23:37:53

Answer 2

A:

in Java:

(?m)^a?(.(?!a[^b\r\n]*b[^\r\nc]*c))+$

does match

abasfaf3
xxxabasfaf3

does not match

asasdfbascf
xxxxasasdfbascf

VonC 2008-11-26 22:40:57

Answer 3

+1 A:

In Python:

>>> r = re.compile("(?!^.*a.*b.*c.*$)")
>>> r.match("abc")
>>> r.match("xxabcxx")
>>> r.match("ab ")
<_sre.SRE_Match object at 0xb7bee288>
>>> r.match("abasfaf3")
<_sre.SRE_Match object at 0xb7bee288>
>>> r.match("asasdfbasc")
>>>

Federico Ramponi 2008-11-26 22:45:06

Why the downvote?

Federico Ramponi 2008-11-26 23:03:20

I didnt downvote you. but i asked the same today. why did they downvote my neat puzzle??

Johannes Schaub - litb 2008-11-26 23:22:06

Actually, in Perl it doesn't work. Sigh..

Federico Ramponi 2008-11-26 23:30:13

Answer 4

+1 A:

Well, you can theoretical build a regex that matches the opposite. But for longer strings, that regex would become big. The way you would do that systematically is (greatly simplified):

Convert the regular expression into a deterministic finite automaton
Convert the end conditions of the automaton, so that it accepts the inverted regular language
Convert the automaton back to a regular expression by successively removing nodes from the automaton, yet keeping the behavior of it the same. Removing one node will require putting two or more regular expressions together, so that they will account for the removed node.
If you happen to have one start node, and one end node, you are finished: The regular expression labeling the edge between them is your searched regular expression.

Practically, you can just match for the string you want not have in it, and invert the result. Here is what it would look like in awk:

echo azyxbc | awk '{ exit ($0 !~ /a.*b.*c/); }' && echo matched

If you are interested into this, i recommend the book "Introduction to the Theory of Computation" by Michael Sipser.

Johannes Schaub - litb 2008-11-26 22:47:45

ansaurus

tags:

views:

answers:

Regex for matching "everything but" a string

related questions