Hi!

I need a regular expression that matches A.*C only if there's no "B" in between. Any string is allowed before and after the "A.*C without B in between" part (including A, B and C).
"A", "B" and "C" are placeholders for longer strings.
So the regex should match e.g. "AfooC", "AfooCbarB", "A C B A", but not "AfooBbarC" or "A B C B".

I think I need a .* somewhere between A and C, so I tried (among others) these two:
A.*(?!B).*C doesn't work, as the .* after A "eats" the B.
A(?!.*B).*C doesn't work, as it doesn't match ACB. (This time, the first .* "eats" the "C").
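
To make the two failures concrete, here is a quick check in Python (the language is an assumption; the question doesn't name a regex engine), taking A, B and C as the literal single letters:

```python
import re

# The two attempts from the question, with A, B and C taken literally.
attempt_1 = re.compile(r"A.*(?!B).*C")
attempt_2 = re.compile(r"A(?!.*B).*C")

# Attempt 1 wrongly accepts a string with B between A and C:
# the first .* can swallow the B before the lookahead is ever checked.
wrong_match = attempt_1.search("AfooBbarC")

# Attempt 2 wrongly rejects "ACB": the lookahead scans the whole rest
# of the string and finds the B that comes after the C.
missed_match = attempt_2.search("ACB")

print(wrong_match.group(0) if wrong_match else None)
print(missed_match)
```

The first attempt matches where it shouldn't, and the second fails where it should match.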

Possibly I'm missing something obvious - I can't figure out how to do it.

Thanks for the help, Julian

(Edit: having some formatting troubles...)

+4  A: 

The easiest way to achieve this is using lookarounds:

A((?!B).)*C

This pattern will match A, then any number of characters, and then C. However, because of the negative lookahead in front of the ., the dot will only match at a position where B does not begin, so the repeated group can never run across a B.
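
Since A, B and C are placeholders for longer strings, here is a minimal sketch in Python (an assumed language) of how the pattern could be built from arbitrary literals. The helper name `between_without` is made up for illustration, and the non-capturing group `(?:...)` is a small variation on the pattern above:

```python
import re

def between_without(a: str, b: str, c: str) -> "re.Pattern[str]":
    """Build a regex for: a, then characters at which b never starts, then c."""
    # re.escape keeps any regex metacharacters in the placeholders literal.
    tempered_dot = r"(?:(?!" + re.escape(b) + r").)*"
    return re.compile(re.escape(a) + tempered_dot + re.escape(c))

pattern = between_without("A", "B", "C")

# Examples taken from the question and the comments.
should_match = ["AfooC", "AfooCbarB", "A C B A", "A C B C"]
should_fail = ["AfooBbarC", "A B C B"]
results = {s: bool(pattern.search(s)) for s in should_match + should_fail}

# The same helper with longer, hypothetical placeholder strings.
long_pattern = between_without("BEGIN", "STOP", "END")
long_hit = long_pattern.search("BEGIN x END STOP")
long_miss = long_pattern.search("BEGIN STOP END")
```

Note that `.` does not match newlines by default in Python; if the text between A and C may span lines, passing `re.DOTALL` to `re.compile` would be needed.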

Daniel Vandersluis
Genius. Thanks a lot.
jasamer
+1  A: 

How about A[^B]*C? [^B] is a character class matching "anything but the letter 'B'".

tadzik
That was my first instinct too, but `B` is a placeholder for something longer than one character, so a character class won't work here.
Daniel Vandersluis
@Daniel: Bah. I miss Perl 6's named regexes.
tadzik
A: 

"A[^B]*C". Matches an A, then any number of characters that isn't B, then C.

KeithS
"B" is a placeholder for a longer string. If "B" were, say, "XYZ", the regex would be "A[^(XYZ)]*C" - but that wouldn't match "AXC", even though the regex I need should.
jasamer
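
The difference between the character class and the lookahead approach can be checked with a short Python sketch (language assumed), using "XYZ" as the stand-in for B from the comment above:

```python
import re

# Hypothetical stand-in: B is the three-character string "XYZ".
char_class = re.compile(r"A[^XYZ]*C")       # excludes X, Y and Z individually
lookahead = re.compile(r"A(?:(?!XYZ).)*C")  # excludes only the sequence XYZ

# "AXC" contains an X but not the full string XYZ, so it should match.
class_on_axc = char_class.search("AXC")
lookahead_on_axc = lookahead.search("AXC")

# "AXYZC" really does contain XYZ between A and C, so it should not match.
lookahead_on_axyzc = lookahead.search("AXYZC")
```

A character class always describes a single character, so it cannot express "not this sequence".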
A: 

Why not /A(?!.*B.*(C)).*?C/ ?

Colin Hebert
Why are you capturing the C in the lookahead?
Daniel Vandersluis
Also, the pattern fails to match on `A C B C`.
Daniel Vandersluis
Close... but it doesn't match "A C B C"... perhaps I still didn't describe precisely enough, that there can be any string after the "A C" without "B" in between...
jasamer
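
The failure mentioned in the comments can be reproduced with a small Python check (language assumed):

```python
import re

# The pattern from the answer above, without the surrounding slashes.
candidate = re.compile(r"A(?!.*B.*(C)).*?C")

# "A C B C" should match on its leading "A C", but the lookahead sees
# a B followed by a C later in the string and rejects the only A.
result_acbc = candidate.search("A C B C")

# Without a C after the B, the lookahead does not fire and "A C" is found.
result_acb = candidate.search("A C B")
```

The lookahead looks at the whole rest of the string, so a later "B ... C" blocks a valid earlier match.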
A: 

How about this:

\AA.+(?<!(.*cheesey.*))C\Z

The (?<!(.*cheesey.*)) does a negative lookbehind for the pattern .*cheesey.* (with 'cheesey' standing in for B) and stops matching if it finds a match. The anchors are there to stop it from chopping off the end and matching in the middle even though 'cheesey' might be at the end.
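
One caveat: a lookbehind containing .* has variable length, and not every engine accepts that. As an illustration, this sketch tries to compile the pattern with Python's built-in `re` module (Python is an assumption here; the answer doesn't say which engine it targets):

```python
import re

lookbehind_pattern = r"\AA.+(?<!(.*cheesey.*))C\Z"

# Python's re module only accepts fixed-width lookbehind, so compiling
# this pattern is expected to raise re.error rather than return a pattern.
try:
    re.compile(lookbehind_pattern)
    compile_error = None
except re.error as exc:
    compile_error = str(exc)

print(compile_error)
```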

Coding Gorilla