How do I write a regex to match any string that doesn't meet a particular pattern? I'm faced with a situation where I have to match an (A and ~B) pattern.

+7  A: 

You could use a look-ahead assertion:

(?!999)\d{3}

This example matches three digits other than 999.
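As a rough illustration of the look-ahead above, here is a minimal Python sketch. The helper name `is_three_digits_not_999` is made up for this example, and it uses `fullmatch` on the assumption that the whole string should be exactly three digits.

```python
import re

# Negative look-ahead: the next three characters must not be "999",
# then three digits are consumed.
PATTERN = re.compile(r"(?!999)\d{3}")


def is_three_digits_not_999(text: str) -> bool:
    """Return True if the whole string is three digits other than 999."""
    return PATTERN.fullmatch(text) is not None


print(is_three_digits_not_999("123"))  # True
print(is_three_digits_not_999("999"))  # False
print(is_three_digits_not_999("99"))   # False (only two digits)
```

Without anchoring, the same pattern can still match three digits inside a longer run (for example the "199" in "1999"), so whether to anchor depends on what you are searching.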


If your regular expression implementation lacks this feature (see Comparison of Regular Expression Flavors), you will probably have to build an equivalent expression from basic features yourself.

A compatible regular expression that uses only basic syntax would be:

[0-8]\d\d|\d[0-8]\d|\d\d[0-8]

This also matches any three-digit sequence other than 999: each alternative requires at least one of the three positions to hold a digit from 0 to 8.
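One way to convince yourself that the basic-syntax expression agrees with the look-ahead version is to check every three-digit string. This is only a sketch in Python (which does support look-ahead); the function name is invented for the example.

```python
import re

LOOKAHEAD = re.compile(r"(?!999)\d{3}")
BASIC = re.compile(r"[0-8]\d\d|\d[0-8]\d|\d\d[0-8]")


def agree_on_all_three_digit_strings() -> bool:
    """Compare both patterns on every string from "000" to "999"."""
    for n in range(1000):
        s = f"{n:03d}"
        lookahead_ok = LOOKAHEAD.fullmatch(s) is not None
        basic_ok = BASIC.fullmatch(s) is not None
        if lookahead_ok != basic_ok:
            return False
    return True


print(agree_on_all_three_digit_strings())  # True
```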

Gumbo
Look-ahead is not standard regular expression syntax; it is a Perl extension and will only work in Perl, PCRE (Perl-Compatible RegEx), or other non-standard implementations.
Juliano
It may not be standard, but don't most modern languages support it? What language *doesn't* support look-aheads these days?
Bryan Oakley
That’s true. But most regex flavors support this feature (see <http://www.regular-expressions.info/refflavors.html>).
Gumbo
Turns out that the Windows findstr command only supports pure DFA-style regex anyway, so I need to do it all differently. You still get the answer, though.
notnot
A: 

[^XXX] where XXX is your pattern

see here

Andrew Bullock
Won't work, [] defines a character class, and matches a single character, not a subpattern.
Richard
You might try using ( ) then
Nerdling
Wouldn't work at all. [^XYZ] simply matches any single character other than X, Y, or Z. That means it rejects not only "XYZ" but also "ZXY", which is not a string matched by the regex XYZ. So this "solution" fails the basic requirements.
Devin Jeanpierre
He's right, my bad. I think that's just highlighted a bug in some regex I have!
Andrew Bullock
always providing
notnot
A: 
A[^B]

literally matches A and ~B

so the following match

AC
AD
AF

and these don't

AB
Q

I think it works by making a character class of everything that's not B ([AC-Za-z0-9]), which I believe includes all of the asciibet.

Ape-inago
[^B] merely says "any *character* other than B". The original question was related to an expression. You can't just put an arbitrary expression inside [].
Bryan Oakley
"I'm faced with a situation where I have to match an (A and ~B) pattern." Technically, A[^B] is that pattern...
Ape-inago
+4  A: 

Match against the pattern and use the host language to invert the boolean result of the match. This will be much more legible and maintainable.
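A minimal sketch of that idea in Python, assuming A and B can be tested independently against the same string. The function name and the sample patterns `foo` and `bar` are placeholders, not anything from the question.

```python
import re


def matches_a_and_not_b(text: str, pattern_a: str, pattern_b: str) -> bool:
    """True if pattern_a is found in text and pattern_b is not."""
    found_a = re.search(pattern_a, text) is not None
    # The negation lives in the host language, not in the regex.
    found_b = re.search(pattern_b, text) is not None
    return found_a and not found_b


print(matches_a_and_not_b("foo baz", r"foo", r"bar"))  # True
print(matches_a_and_not_b("foo bar", r"foo", r"bar"))  # False
print(matches_a_and_not_b("baz", r"foo", r"bar"))      # False
```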

Ben S
Then I just end up with (~A or B) instead of (A and ~B). It doesn't solve my problem.
notnot
Pseudo-code: String toTest; if (toTest.matches(A) AND !toTest.matches(B)) { ... }
Ben S
I should have been clearer: the pieces are not fully independent. If A matches part of the string, then we care whether ~B matches the rest of it (but not necessarily the whole thing). This was for the Windows command-line findstr command, which I found is restricted to true regexes, so it's a moot point.
notnot
+2  A: 

This seems like a fairly basic question from Formal Languages or Theoretical Computer Science classes. Based on your reputation and previous answers, I'm assuming that it is not homework, so I'm answering it.

The complement of a regular language is also a regular language, but to construct it you have to build the complete DFA for the regular language and swap its accepting and non-accepting states, so that every string the original accepts becomes an error and vice versa. See this for an example. What the page doesn't say is that it converted /(ac|bd)/ into /(a[^c]?|b[^d]?|[^ab])/. The conversion from a DFA back to a regular expression is not trivial. It is easier if you can use the regular expression unchanged and change the semantics in code, as suggested before.
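To make the DFA idea concrete, here is a small hand-built sketch in Python for the language {"ac", "bd"}. It uses the textbook construction (complete the DFA with a dead state, then swap accepting and non-accepting states). The state names and helper functions are invented for this illustration, and it matches whole strings only.

```python
DEAD = "dead"

# Partial transition table of a DFA that accepts exactly "ac" or "bd".
TRANSITIONS = {
    ("start", "a"): "seen_a",
    ("start", "b"): "seen_b",
    ("seen_a", "c"): "done",
    ("seen_b", "d"): "done",
}
STATES = {"start", "seen_a", "seen_b", "done", DEAD}
ACCEPTING = {"done"}
# Complement: every state that was not accepting becomes accepting.
COMPLEMENT_ACCEPTING = STATES - ACCEPTING


def step(state: str, symbol: str) -> str:
    # Any missing transition goes to the dead state, which completes the DFA.
    return TRANSITIONS.get((state, symbol), DEAD)


def run(text: str, accepting: set) -> bool:
    state = "start"
    for symbol in text:
        state = step(state, symbol)
    return state in accepting


def in_language(text: str) -> bool:
    return run(text, ACCEPTING)


def in_complement(text: str) -> bool:
    return run(text, COMPLEMENT_ACCEPTING)


print(in_language("ac"), in_complement("ac"))  # True False
print(in_language("ab"), in_complement("ab"))  # False True
```

Turning the complemented DFA back into a regular expression is the hard part, which is why negating in code is usually simpler.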

Juliano
If I were dealing with actual regexes then this would all be moot. Regex now seems to refer to the nebulous CSG-ish (?) space of pattern matching that most languages support. Since I need to match (A and ~B), there's no way to remove the negation and still do it all in one step.
notnot
Lookahead, as described above, would have done it if findstr did anything beyond true DFA regexes. The whole thing is sort of odd and I don't know why I have to do this command-line (batch now) style. It's just another example of my hands being tied.
notnot
@notnot: You are using findstr from Windows? Then you just need /v. Like:

findstr A inputfile | findstr /v B > outputfile.txt

The first matches all lines with A, the second keeps only the lines that don't have B.
Juliano
Thanks! That's actually exactly what I needed. I didn't ask the question that way, though, so I'm still giving the answer to Gumbo for the more generalized answer.
notnot
A: 

pattern - re

str.split(/re/g)

will return everything except the pattern.

Test here
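The snippet above is JavaScript. A rough Python equivalent, shown only as a sketch, uses `re.split`. The helper name and the sample pattern `\d+` are placeholders.

```python
import re


def pieces_outside_pattern(text: str, pattern: str) -> list:
    """Split text on every match of pattern and return what is left."""
    return re.split(pattern, text)


print(pieces_outside_pattern("ab12cd345ef", r"\d+"))  # ['ab', 'cd', 'ef']
```

Note that in Python, if the pattern contains capturing groups, `re.split` also returns the captured text, so use non-capturing groups `(?:...)` when only the leftovers are wanted.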

unigogo
A: 

(B)|(A)

then use what group 2 captures...
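A sketch of how this alternation trick might look in Python. Putting B first lets it consume the unwanted text, so group 2 is only filled when A matches somewhere B did not. The function name and sample patterns are hypothetical, and the sketch assumes neither sub-pattern contains capturing groups of its own (that would shift the group numbers).

```python
import re


def find_a_not_b(text: str, pattern_a: str, pattern_b: str) -> list:
    """Return matches of pattern_a that were not swallowed by pattern_b."""
    combined = re.compile(f"({pattern_b})|({pattern_a})")
    results = []
    for match in combined.finditer(text):
        # Group 1 is B (discarded); group 2 is A (kept).
        if match.group(2) is not None:
            results.append(match.group(2))
    return results


# Example: numbers (A) that are not written with a leading '#' (B).
print(find_a_not_b("a 12 #34 56", r"\d+", r"#\d+"))  # ['12', '56']
```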

DW