Which regular expression can I use to find all occurrences of the string bar that are not preceded by the string foo? Occurrences where only whitespace separates foo and bar should be rejected as well.

So the regex should match the following strings

foo is bar
hello bar

But not these

foobar
foo     bar

I've tried using the following

(?!<foo)bar

and it gets the work done for eliminating foobar, but I need to take care of the whitespace, and of course

(?!<foo)\s*bar

matches all the strings.

Thanks!

A: 
  (?!<foo)\s*bar

This will match the whitespace

Hogan
Uh no. First, it's `(?<!..)` and second, the `\s*` needs to be inside the lookbehind or it will always match unless there is no whitespace between `foo` and `bar`. Mark Byers got it right.
Tim Pietzcker
sure sure all I knows is JA edited my answer, I feel blessed.
Hogan
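A small sketch for anyone who wants to check the points made in the comments above. It is written in Python (my choice, not the posters'), and the variable names are invented for illustration. It shows that with `\s*` outside the lookbehind only `foobar` is rejected, that Python's `re` module refuses a variable-width lookbehind such as `(?<!foo\s*)` (some engines, .NET for example, do accept it), and one possible workaround using a negative lookahead anchored at the start of the string. The workaround judges the whole string, so it assumes that a `foo\s*bar` anywhere should disqualify it.

```python
import re

samples = ["foo is bar", "hello bar", "foobar", "foo     bar"]

# \s* outside the lookbehind: the assertion is checked before the
# whitespace, so the engine can simply start the match one space later.
outside = re.compile(r"(?<!foo)\s*bar")
outside_results = [bool(outside.search(s)) for s in samples]
print(outside_results)  # only "foobar" is rejected

# \s* inside the lookbehind: Python's re only allows fixed-width lookbehind.
try:
    re.compile(r"(?<!foo\s*)bar")
    variable_lookbehind_ok = True
except re.error:
    variable_lookbehind_ok = False
print(variable_lookbehind_ok)

# Assumed workaround: reject the whole string if "foo", optional
# whitespace, "bar" occurs anywhere, then require a "bar".
whole = re.compile(r"^(?!.*foo\s*bar).*bar")
whole_results = [bool(whole.search(s)) for s in samples]
print(whole_results)
```
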
A: 

PHP:

!preg_match('/foo\s*bar/', $string) && preg_match('/bar/', $string)

Perl:

$string !~ /foo\s*bar/ && $string =~ /bar/
Jeff B
As mentioned in the original question, this does not work.
bosh
Ah, yes, because all of the strings technically can be found to have non-foo strings before bar...
Jeff B
What you really need is to just do a negative regex. $string !~ /foo\s*bar/. Updated with php, and perl versions.
Jeff B
Now it reports success even if the string doesn't contain bar.
Mark Byers
...in addition to the search for bar. Added in answer.
Jeff B
+1. The perl version did the job fine :)
Mike
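To make the back-and-forth in these comments concrete, here is a minimal Python sketch of the same two-test idea. The function names `bar_without_foo` and `negative_only` are hypothetical, chosen only for this illustration. It compares the combined test with the negative-only test, which is the version that "reports success even if the string doesn't contain bar".

```python
import re

def bar_without_foo(s):
    """True if s contains 'bar' and nowhere contains 'foo', optional whitespace, 'bar'."""
    has_bar = re.search(r"bar", s) is not None
    has_foo_bar = re.search(r"foo\s*bar", s) is not None
    return has_bar and not has_foo_bar

def negative_only(s):
    """The intermediate version from the comments: only the negative test."""
    return re.search(r"foo\s*bar", s) is None

for s in ["foo is bar", "hello bar", "foobar", "foo     bar", "foo foo"]:
    print(f"{s!r:16} both={bar_without_foo(s)} negative_only={negative_only(s)}")
```

The string `foo foo` is the case that separates the two: the negative-only test accepts it although it contains no `bar` at all.
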
+1  A: 

Given a few test cases

my @match = (
  "foo is bar",
  "hello bar",
);

my @reject = (
  "foobar",
  "foo     bar",
);

you could of course do it by feeding the results of one pattern to another:

my @control = grep !/foo\s*bar/, grep /bar/ => @match, @reject;

We can also do it with one:

my $nofoo = qr/
  (      [^f] |
    f  (?! o) |
    fo (?! o  \s* bar)
  )*
/x;

my $pattern = qr/^ $nofoo bar /x;

But don't take my word for it.

for (@match) {
  print +(/$pattern/ ? "PASS" : "FAIL"), ": $_\n";
}

for (@reject) {
  print +(/$pattern/ ? "FAIL" : "PASS"), ": $_\n";
}
Greg Bacon
Impressive that you got this to work. Most likely "foo" and "bar" are just placeholders for much longer strings. It looks like your regular expressions are going to get extremely long for any real world examples. +1 for the different approach though.
Mark Byers
Thanks, and the sad news is that a literal pattern is the best case. I wonder what the limit of this approach is. It'd be nice for such tasks to have a regular-expression switch that complements the accept status of each NFA state.
Greg Bacon
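For readers more at home in Python, the single-pattern idea above could be ported roughly as follows. This is only a sketch: `NOFOO` and `PATTERN` are names made up here, and a non-capturing group is used where the Perl version uses a capturing one. The prefix before `bar` is built from pieces that can never contain `foo`, optional whitespace, `bar`.

```python
import re

# Each alternative consumes text that cannot start a forbidden "foo\s*bar".
NOFOO = r"""
(?:   [^f]                 # any character other than f
  |   f  (?! o )           # an f that is not followed by o
  |   fo (?! o \s* bar )   # fo, unless it begins foo + whitespace + bar
)*
"""

PATTERN = re.compile(r"^" + NOFOO + r"bar", re.VERBOSE)

match = ["foo is bar", "hello bar"]
reject = ["foobar", "foo     bar"]

for s in match:
    print("PASS" if PATTERN.search(s) else "FAIL", s, sep=": ")
for s in reject:
    print("FAIL" if PATTERN.search(s) else "PASS", s, sep=": ")
```
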
+3  A: 

Better to use other facilities of the programming language than to look too hard for a regex pattern.

You are looking for strings for which $s =~ /bar/ and not $s =~ /foo\s*bar/ is true.

The rest of the script below is just for testing.

#!/usr/bin/perl

use strict; use warnings;

my %strings = (
    'foo is bar'  => 1,
    'hello bar'   => 1,
    'foobar'      => 0,
    'foo     bar' => 0,
    'barbar'      => 1,
    'bar foo'     => 1,
    'foo foo'     => 0,
);

my @accept = grep { $strings{$_} } keys %strings;
my @reject = grep { not $strings{$_} } keys %strings;

for my $s ( @accept ) {
    if ( $s =~ /bar/ and not $s =~ /foo\s*bar/ ) {
        print "Good: $s\n";
    }
    else {
        print "Bad : $s\n";
    }
}

for my $s ( @reject ) {
    if ( $s =~ /bar/ and not $s =~ /foo\s*bar/ ) {
        print "Bad : $s\n";
    }
    else {
        print "Good: $s\n";
    }
}

Output:

E:\srv\unur> j
Good: bar foo
Good: hello bar
Good: foo is bar
Good: barbar
Good: foo foo
Good: foo     bar
Good: foobar
Sinan Ünür
Won't that match even if the string does not contain 'bar'?
Mark Byers
@Mark Byers: Thank you for pointing out my oversight. Fixed.
Sinan Ünür
'bar foobar' also makes an interesting test case. I'm not sure what the expected output is here though.
Mark Byers
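One way to see why `bar foobar` is ambiguous: the whole-string test `/bar/ and not /foo\s*bar/` rejects it, while a per-occurrence reading would accept the first `bar`. Below is a minimal Python sketch of the per-occurrence reading. The helper name `bars_not_after_foo` is hypothetical, and it emulates a variable-width lookbehind by inspecting the text before each hit rather than using one regex.

```python
import re

def bars_not_after_foo(s):
    """Start offsets of every 'bar' not preceded by 'foo' plus optional whitespace."""
    return [
        m.start()
        for m in re.finditer(r"bar", s)
        # Look at everything before this 'bar': does it end in foo + whitespace?
        if not re.search(r"foo\s*\Z", s[:m.start()])
    ]

for s in ["bar foobar", "barbar", "foo     bar", "foo is bar"]:
    print(f"{s!r:16} {bars_not_after_foo(s)}")
```

Under this reading `bar foobar` yields one acceptable occurrence, at offset 0. Which behaviour is wanted depends on whether the question is about strings or about individual occurrences.
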
Personally, when I find a pattern hard to match using regex, I take it as a sign that I need to go learn more regex or get a refresher. I think that making an inflexible lookup table when it is not needed is no way to grow as a programmer.
J.J.
Personally, I think you should read the code before downvoting. The lookup table is there to list the test cases and make it easy to add more. The table has nothing to do with the logic. The logic consists entirely of `$s =~ /bar/ and not $s =~ /foo\s*bar/`.
Sinan Ünür