I am looking for a regex that will match a string that starts with one substring and does not end with a certain substring.

Example:

// Updated to be correct, thanks @Apocalisp
^foo.*(?<!bar)$

Should match anything that starts with "foo" and doesn't end with "bar". I know about the [^...] syntax, but I can't find anything that will do that for a string instead of single characters.

I am specifically trying to do this for Java's regex, but I've run into this before so answers for other regex engines would be great too.

Thanks to @Kibbee for verifying that this works in C# as well.

A: 

I had a similar problem today, asked in this question. The word you're looking for is negative look-ahead.

Erik van Brakel
A: 

You'll want to use a negative lookahead.

foo(?!bar)

should do the trick.

Kibbee
+1  A: 

I'm not familiar with Java regex, but the documentation for the Pattern class suggests you could use (?!X) for a non-capturing, zero-width negative lookahead (it checks that what follows at that position is not X, without capturing it as a backreference). So you could do:

foo.*(?!bar) // not correct

Update: Apocalisp's right, you want negative lookbehind. (you're checking that what the .* matches doesn't end with bar)
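
A minimal sketch of the difference, assuming full-string matching with Java's Matcher.matches(). The class and method names here are made up for illustration. With the lookahead version, the check runs at the end of the input, where nothing follows, so it cannot see the trailing "bar"; the lookbehind version inspects the characters just before that position.

```java
import java.util.regex.Pattern;

public class LookaroundContrast {
    // Lookahead after .* : for a full match it is evaluated at the end of
    // the input, where no text follows, so it always succeeds.
    static final Pattern LOOKAHEAD = Pattern.compile("foo.*(?!bar)");

    // Lookbehind after .* : inspects the three characters immediately
    // before the current position.
    static final Pattern LOOKBEHIND = Pattern.compile("foo.*(?<!bar)");

    static boolean withLookahead(String s) {
        return LOOKAHEAD.matcher(s).matches();
    }

    static boolean withLookbehind(String s) {
        return LOOKBEHIND.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(withLookahead("foobar"));   // true  (not what we want)
        System.out.println(withLookbehind("foobar"));  // false
        System.out.println(withLookbehind("foobaz"));  // true
    }
}
```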

Sam Hasler
A: 

As other commenters said, you need a negative lookahead. In Java you can use this pattern:

"^first_string(?!.?second_string)\\z"
  • ^ - ensures that the string starts with first_string
  • \z - anchors the match at the end of the input
  • (?!.?second_string) - means that first_string can't be followed by second_string (optionally after one other character)
aku
+4  A: 

I think in this case you want negative lookbehind, like so:

foo.*(?<!bar)
Apocalisp
+1  A: 

Verified @Apocalisp's answer using:

import java.util.regex.Pattern;
public class Test {
  public static void main(String[] args) {
    Pattern p = Pattern.compile("^foo.*(?<!bar)$");
    System.out.println(p.matcher("foobar").matches());      // ends with "bar"
    System.out.println(p.matcher("fooBLAHbar").matches());  // ends with "bar"
    System.out.println(p.matcher("1foo").matches());        // does not start with "foo"
    System.out.println(p.matcher("fooBLAH-ar").matches());  // ends with "-ar", not "bar"
    System.out.println(p.matcher("foo").matches());         // nothing after "foo"
    System.out.println(p.matcher("foobaz").matches());      // ends with "baz"
  }
}

This outputs the right answers:

false
false
false
true
true
true
John Meagher
A: 

@John Meagher,

try these test strings:

  • 1foobar
  • foobar2
  • foo bar

Notice that you're using lookahead, not lookbehind

aku
A: 

@aku Ok, so it probably needs a ^ and $ in there to really be right, but the negative lookbehind part is what I was having trouble with. I'll update it.

The lookahead was just a typo; my tester was right. Remember to copy and paste rather than retype.

John Meagher
A: 

@John Meagher, also, the .* part is not correct; try to match "foo bar" (whitespace between the two parts)

aku
A: 

@aku Actually the .* is intentional. I want it to match on starts with and doesn't end with, but anything can be in the middle.

John Meagher
A: 

@aku Verified that "foo bar" does not match when using

^foo.*(?<!bar)$

as the regex under Java 1.6.
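
For reference, a small sketch that runs the anchored pattern against the three test strings suggested earlier in the thread. The class and method names are hypothetical, and it assumes Matcher.matches() as in the earlier test program.

```java
import java.util.regex.Pattern;

public class AnchoredCheck {
    static final Pattern P = Pattern.compile("^foo.*(?<!bar)$");

    // True when the whole string starts with "foo" and does not end with "bar".
    static boolean accepts(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1foobar", "foobar2", "foo bar"}) {
            System.out.println(s + " -> " + accepts(s));
        }
        // 1foobar -> false  (does not start with "foo")
        // foobar2 -> true   (ends with "ar2", not "bar")
        // foo bar -> false  (ends with "bar")
    }
}
```

Note that "foobar2" matches: the requirement is only about how the string ends, so "bar" may appear in the middle.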

John Meagher
A: 

I think the problem with

foo.*(?<!bar)

is that it would actually match "foobar". Testing it out, it matches "fooba": after "foo", the .* first consumes "bar" and the lookbehind fails, so the engine backtracks until .* consumes only "ba". At that point the three preceding characters are "oba", not "bar", so the negative lookbehind passes without problems. I tried out a couple of variations and couldn't come up with a good solution.

Kibbee
A: 

@Kibbee Are you using Java to test? I don't get a match on "foobar".

John Meagher
A: 

I'm using .Net

System.Text.RegularExpressions.Regex.Match("foobar","foo.*(?<!bar)").Value

gives a result of "fooba"

Kibbee
A: 

@Kibbee Try adding the ^ and $ to the beginning and end. That may do the trick. If not we may have stumbled on one of those odd differences between regular expression engines.

John Meagher
A: 

Oh, yeah, adding ^ and $ fixes it. Seems the question has been changed, or I may have misread it. But I thought you were looking for references of foo that weren't followed by bar anywhere else in the string. What you were actually looking for is a string that starts with foo, and ends with anything except bar.
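
The difference seen above is probably not an engine quirk: .NET's Regex.Match searches for a matching substring, which corresponds to Matcher.find() in Java, while Matcher.matches() requires the whole input to match. A rough sketch in Java (the helper name firstMatch is made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindVersusMatches {
    // Returns the first substring of input matched by regex, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Unanchored search: .* backtracks one character and "fooba" is found.
        System.out.println(firstMatch("foo.*(?<!bar)", "foobar"));      // fooba

        // Anchored search: the match must reach the end, so nothing is found.
        System.out.println(firstMatch("^foo.*(?<!bar)$", "foobar"));    // null

        // Full-string matching behaves like the anchored version.
        System.out.println(Pattern.matches("foo.*(?<!bar)", "foobar")); // false
    }
}
```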

Kibbee