I need to match the first part of a sentence, up to a given word. However, that word is optional; if it is absent, I want to match the whole sentence. For example:

I have a sentence with a clause I don't want.

I have a sentence and I like it.

In the first case, I want "I have a sentence". In the second case, I want "I have a sentence and I like it."

Lookarounds will give me the first case, but as soon as I try to make it optional, to cover the second case, I get the whole first sentence. I've tried making the expression lazy... no dice.

The code that works for the first case:

var regEx = new Regex(@".*(?=with)");
string matchstr = @"I have a sentence with a clause I don't want";

if (regEx.IsMatch(matchstr)) {
    Console.WriteLine(regEx.Match(matchstr).Captures[0].Value);
    Console.WriteLine("Matched!");
}
else {
    Console.WriteLine("Not Matched : (");
}

The expression that I wish worked:

var regEx = new Regex(@".*(?=with)?");


Any suggestions?

Thanks in advance!
James

+1  A: 

If I understand your need correctly, you want to match either the sentence up to the word 'with', or, if it's not there, match the entire thing? Why not write the regexp to explicitly look for the two cases?

/(.*) with |(.*)/

Wouldn't this get both cases?
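A minimal C# sketch of the two-alternative idea, using the lookahead variant `.*(?=with)|.*` so that the word itself is left out of the match. The class and method names (`AlternationDemo`, `Head`) are made up for illustration. Two caveats under this approach: the greedy `.*` stops at the last 'with' on the line, not the first, and 'with' also matches inside longer words such as 'without'.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // First alternative: everything before the last "with" on the line (greedy .*).
    // Second alternative: the whole line, reached only when the first one fails.
    private static readonly Regex Pattern = new Regex(@".*(?=with)|.*");

    // Hypothetical helper: returns the text of the first match.
    public static string Head(string input) => Pattern.Match(input).Value;

    public static void Main()
    {
        // Brackets make the trailing space in the first result visible.
        Console.WriteLine("[" + Head("I have a sentence with a clause I don't want.") + "]");
        Console.WriteLine("[" + Head("I have a sentence and I like it.") + "]");
    }
}
```

The first call keeps the space before 'with'; call `Trim()` on the result if that matters.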

zigdon
This also totally works... I changed it to `(.*)(?=with)|.*` to exclude the word with. Definitely upvoted!
James B
+1  A: 

There are several ways to do this. You could do something like this:

^(.*?)(with|$)

The first group is matched reluctantly, i.e. it takes as few characters as possible. We have an overall match if this group is followed by either the literal 'with' or the end-of-line anchor $.

Given this input:

I have a sentence with a clause I don't want.
I have a sentence and I like it.

Then there are two matches (as seen on rubular.com):

  • Match 1:
    • Group 1: "I have a sentence "
    • Group 2: "with"
  • Match 2:
    • Group 1: "I have a sentence and I like it."
    • Group 2: "" (empty string)

You can make the grouped alternation non-capturing with (?:with|$) if you don't need to distinguish the two cases.
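For .NET, here is a rough sketch of the same pattern run over both lines at once. It assumes the input lines are separated by "\n" and uses `RegexOptions.Multiline` so that ^ and $ match at line boundaries (rubular uses Ruby, where they always do). The names `LazyPrefixDemo` and `Heads` are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyPrefixDemo
{
    // Multiline: ^ matches at the start of each line and $ at the end of each line.
    private static readonly Regex Pattern =
        new Regex(@"^(.*?)(?:with|$)", RegexOptions.Multiline);

    // Hypothetical helper: returns group 1 of every match found in the text.
    public static string[] Heads(string text)
    {
        MatchCollection matches = Pattern.Matches(text);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Groups[1].Value;
        }
        return result;
    }

    public static void Main()
    {
        string text = "I have a sentence with a clause I don't want.\n"
                    + "I have a sentence and I like it.";

        // Brackets make any trailing whitespace visible.
        foreach (string head in Heads(text))
        {
            Console.WriteLine("[" + head + "]");
        }
    }
}
```

With "\r\n" line endings the "\r" would end up inside group 1 of the second case, since $ only looks for "\n"; trimming the result is one way around that.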

polygenelubricants
You can of course use no capturing group, and use lookahead for the alternation part, i.e. `^.*?(?=with|$)` http://www.rubular.com/r/1JVjxdk30T ; these are minor variations of the same basic idea.
polygenelubricants
Beautiful. Used this with the non-capturing group, but (?:) still captured the group for some reason... `(?=with|$)`, however, did exactly what I needed it to do. Thanks!
James B
@James: there's a difference between non-capturing and assertion. Assertion doesn't consume as part of the match. Non-capturing doesn't mean non-matching. It's still matched, but it's not captured into a group.
polygenelubricants
Hmm, not sure I understand... I put `(.*?)(?:with|$)` into my code and got back one captured group: `I have a sentence with` Why is the word 'with' included in this capture?
James B
@James: I'm guessing you used `Captures[0]` when I meant `Groups[1]`. See http://stackoverflow.com/questions/3320823/whats-the-difference-between-groups-and-captures-in-net-regular-expressions
polygenelubricants
Yep, I did! And I knew better, too :P Though it still isn't clear to me why `Groups[0]` returns `I have a sentence with`, and why `Groups[1]` returns `I have a sentence` when I use `^(.*?)(?:with|$)`
James B
@James: because `Groups[0]` is the "default" group that returns the whole matched string. No explicit brackets are needed to capture group 0; it contains whatever the pattern matched. Using `?:` creates no new group, but it's still a regular match, so its text is included in group 0.
polygenelubricants
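To make the distinction in these comments concrete, here is a small sketch that prints `Groups[0]` and `Groups[1]` for the non-capturing version and for the lookahead version. The `GroupsDemo` class and `Inspect` method are illustrative names only, and the tuple return type assumes C# 7 or later.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupsDemo
{
    // Hypothetical helper: returns (Groups[0], Groups[1]) for the first match.
    public static (string Whole, string First) Inspect(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return (m.Groups[0].Value, m.Groups[1].Value);
    }

    public static void Main()
    {
        const string input = "I have a sentence with a clause I don't want.";

        // Non-capturing group: "with" is consumed, so it is part of Groups[0].
        var consumed = Inspect(@"^(.*?)(?:with|$)", input);
        Console.WriteLine($"[{consumed.Whole}] [{consumed.First}]");

        // Lookahead: "with" is only asserted, so Groups[0] stops before it.
        var asserted = Inspect(@"^(.*?)(?=with|$)", input);
        Console.WriteLine($"[{asserted.Whole}] [{asserted.First}]");
    }
}
```

In both cases `Groups[1]` is the text before 'with'; only `Groups[0]` differs, because only the non-capturing group consumes the word.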
+1  A: 
string optional = "with a clause I don't want";
string rx = "^(.*?)" + Regex.Escape(optional) + ".*$";

// displays "I have a sentence"
string foo = "I have a sentence with a clause I don't want.";
Console.WriteLine(Regex.Replace(foo, rx, "$1"));

// displays "I have a sentence and I like it."
string bar = "I have a sentence and I like it.";
Console.WriteLine(Regex.Replace(bar, rx, "$1"));

If you don't need the complex matching provided by a regex then you could use a combination of IndexOf and Remove. (And obviously you could abstract the logic away into a helper and/or extension method or similar):

string optional = "with a clause I don't want";

// displays "I have a sentence"
string foo = "I have a sentence with a clause I don't want.";
int idxFoo = foo.IndexOf(optional);
Console.WriteLine(idxFoo < 0 ? foo : foo.Remove(idxFoo));

// displays "I have a sentence and I like it."
string bar = "I have a sentence and I like it.";
int idxBar = bar.IndexOf(optional);
Console.WriteLine(idxBar < 0 ? bar : bar.Remove(idxBar));
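One possible shape for the helper mentioned above, written as an extension method. `StringTruncation` and `TruncateAt` are hypothetical names, not part of the .NET class library, and the ordinal comparison is an assumption on my part (the snippets above use the culture-sensitive `IndexOf(string)` overload).

```csharp
using System;

public static class StringTruncation
{
    // Hypothetical extension method: returns the part of source that precedes
    // the first occurrence of marker, or source unchanged if marker is absent.
    public static string TruncateAt(this string source, string marker)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (string.IsNullOrEmpty(marker)) return source; // nothing to search for

        // Assumption: ordinal (byte-for-byte) comparison is what is wanted here.
        int index = source.IndexOf(marker, StringComparison.Ordinal);
        return index < 0 ? source : source.Remove(index);
    }

    public static void Main()
    {
        string optional = "with a clause I don't want";

        // Brackets make the trailing space in the first result visible.
        Console.WriteLine("[" + "I have a sentence with a clause I don't want.".TruncateAt(optional) + "]");
        Console.WriteLine("[" + "I have a sentence and I like it.".TruncateAt(optional) + "]");
    }
}
```

The empty-marker guard is there because `IndexOf` reports an empty string as found at index 0, which would otherwise truncate everything.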
LukeH
This looks like it could work... giving it an upvote for a different way of doing it. Thanks!
James B