views:

1687

answers:

3

In some regex flavors, [negative] zero-width assertions (look-ahead/look-behind) are not supported.

This makes it extremely difficult (impossible?) to state an exclusion. For example "every line that does not have "foo" on it", like this:

^((?!foo).)*$

Can the same thing be achieved without using look-around at all (complexity and performance concerns set aside for the moment)?

A: 

You can usually look for foo and invert the result of the regex match from the client code.

For a simple example, let's say you want to validate that a string contains only certain characters.

You could write that like this:

^[A-Za-z0-9.$-]*$

and accept a true result as valid, or like this:

[^A-Za-z0-9.$-]

and accept a false result as valid.

Of course, this isn't always an option: sometimes you just have to put the expression in a config file or pass it to another program, for example. But it's worth remembering. Your specific problem, for example, the expression is much simpler if you can use negation like this.

Joel Coehoorn
I know that post-processing would solve the problem... That's what I was trying to avoid, I was looking for a vanilla regex that does the right thing. Additionally, I am looking for something that disallows a specific sequence of characters, not an unordered set.
Tomalak
+11  A: 
^(f(o[^o]|[^o])|[^f])*$

NOTE: It is much much easier just to negate a match on the client side instead of using the above regex.

The regex assumes that each line ends with a newline char if it is not then see C++'s and grep's regexs.

Sample programs in Perl, Python, C++, and grep all give the same output.

  • perl

    #!/usr/bin/perl -wn
    print if /^(f(o[^o]|[^o])|[^f])*$/;
    
  • python

    #!/usr/bin/env python
    import fileinput, re, sys
    from itertools import ifilter
    
    
    re_not_foo = re.compile(r"^(f(o[^o]|[^o])|[^f])*$")
    for line in ifilter(re_not_foo.match, fileinput.input()):
        sys.stdout.write(line)
    
  • c++

    #include <iostream>
    #include <string>
    #include <boost/regex.hpp>
    
    
    int main()
    {
      boost::regex re("^(f(o([^o]|$)|([^o]|$))|[^f])*$");
      //NOTE: "|$"s are there due to `getline()` strips newline char
    
    
      std::string line;
      while (std::getline(std::cin, line)) 
        if (boost::regex_match(line, re))
          std::cout << line << std::endl;
    }
    
  • grep

    $ grep "^\(f\(o\([^o]\|$\)\|\([^o]\|$\)\)\|[^f]\)*$" in.txt
    

Sample file:

    foo
    'foo'
    abdfoode
    abdfode
    abdfde
    abcde
    f

    fo
    foo
    fooo
    ofooa
    ofo
    ofoo

Output:

    abdfode
    abdfde
    abcde
    f

    fo
    ofo
J.F. Sebastian
That's it. This is so simple that I'm ashamed not having thought of it myself. Thank you very much.
Tomalak
Nice. Really, really nice.
Gumbo
It's clear that post processing in the program, negating the match is the preferred method. Sometimes you don't have a choice, and even if you have, it is good to know your alternatives.
Tomalak
This regular expression is not correct. It does not match `f`, `fo` or `barf`. But this one does: `^(f(o([^o]|$)|[^o]|$)|[^f])*$`
Gumbo
@Gumbo: See regexps used for C++ and grep (hint: they are the same).
J.F. Sebastian
@J.F. Sebastian: Ah, you’re right. I wonder why he didn’t change the other ones.
Gumbo
@Gumbo: Each program works as is there is no need to change them.
J.F. Sebastian
A: 

I stumbled across this question looking for my own regex exclusion solution, where I am trying to exclude a sequence within my regex.

My initial reaction to this situation: For example "every line that does not have "foo" on it" was simply to use the -v invert sense of matching option in grep.

grep -v foo

this returns all lines in a file that don't match 'foo'

It's so simple I have the strong feeling I've just misread your question....

`grep -v foo` searches for "foo" and negates the result, and the OP said he wanted the regex itself to do the work. But suppose the requirement were "contains 'foo' and *doesn't* contain 'bar'", and you could only perform one regex match? Simply negating the result wouldn't be an option then.
Alan Moore