Currently this regex returns one match:

the best language in the world and the fastest language

How can I get it to return two matches?

the best language

the fastest language

using System;
using System.Text.RegularExpressions;

string text = "C# is the best language in the world and the fastest language in the world";
string search = @"\bthe\b.*\blanguage\b";
MatchCollection matches = Regex.Matches(text, search);
Console.WriteLine("{0} matches", matches.Count);
foreach (Match match in matches)
{
    Console.WriteLine("match '{0}' was found at index {1}", match.Value, match.Index);
}
Console.WriteLine("---");
+4  A: 

Add `?` after the `*`, so the pattern becomes `\bthe\b.*?\blanguage\b`.

Jay
Thanks, that almost works: it gets "the best language" and "the world and the fastest language".
Edward Tanguay
Could you also explain why adding `?` does what it does? I understand `?` to mean "repeat zero or one time", so I would not have come upon that in solving this.
Edward Tanguay
@Edward `?` after `+`, `*` and `{m,n}` makes them non-greedy: they match the smallest string possible. The second match `the world and the fastest language` is inevitable with your `.*?`; be more specific, e.g. `[^ ]*?` to match any non-space character, or something like that.
Amarghosh
@Edward To clarify Amarghosh's comment, which is spot on: `?` by itself does exactly what you thought, but here we're using it in concert with `*`. Together, `*?`, `+?` etc. are non-greedy operators in their own right. Also, the longer-than-expected second match happens because matching starts when it finds `the`, rather than finding `language` and working backward.
Jay
Using `[^ ]*?` returns 0 matches, but that doesn't seem like the right approach anyway: what I want to specify is not the kind of characters (space/non-space) but the *closest* first word "the" to the second word "language". What would be a pattern for that?
Edward Tanguay
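To make the comments above concrete, here is a minimal sketch that runs the three patterns discussed so far (greedy `.*`, non-greedy `.*?`, and `[^ ]*?`) against the sentence from the question. The class name `QuantifierDemo` and the helper `Find` are hypothetical, introduced only for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original question.
public static class QuantifierDemo
{
    // Sample sentence from the question.
    public const string Text =
        "C# is the best language in the world and the fastest language in the world";

    // Returns the text of every match of `pattern` in Text, in order.
    public static string[] Find(string pattern) =>
        Regex.Matches(Text, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        string[] patterns =
        {
            @"\bthe\b.*\blanguage\b",     // greedy: runs to the last "language"
            @"\bthe\b.*?\blanguage\b",    // non-greedy: stops at the first "language" after each "the"
            @"\bthe\b[^ ]*?\blanguage\b", // cannot cross the space after "the", so nothing matches
        };

        foreach (string pattern in patterns)
        {
            string[] found = Find(pattern);
            Console.WriteLine("{0} -> {1} match(es)", pattern, found.Length);
            foreach (string value in found)
            {
                Console.WriteLine("  '{0}'", value);
            }
        }
    }
}
```

The non-greedy pattern still returns "the world and the fastest language" as its second match. The engine scans left to right, so it commits to the first `the` it finds after the previous match and then extends forward to the nearest `language`.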
+1  A: 

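This will match the two phrases in your example. In C#, drop the `/.../g` delimiters, since `Regex.Matches` already returns every match: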

/the (?:best|fastest) language/g
M42
+3  A: 

Try this:

\bthe\b(?:(?!\bthe\b).)*\blanguage\b

It uses a negative lookahead assertion to require that "the" is not seen again between the matching "the" and "language".
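As a quick check, the pattern can be dropped into the code from the question. This is a minimal sketch; the class name `ClosestPairDemo` and the helper `Find` are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original question.
public static class ClosestPairDemo
{
    // Each character between "the" and "language" is consumed only if
    // the word "the" does not start at that position.
    public const string Pattern = @"\bthe\b(?:(?!\bthe\b).)*\blanguage\b";

    public static MatchCollection Find(string text) => Regex.Matches(text, Pattern);

    public static void Main()
    {
        string text = "C# is the best language in the world and the fastest language in the world";
        MatchCollection matches = Find(text);
        Console.WriteLine("{0} matches", matches.Count);
        foreach (Match match in matches)
        {
            Console.WriteLine("match '{0}' was found at index {1}", match.Value, match.Index);
        }
        // prints:
        // 2 matches
        // match 'the best language' was found at index 6
        // match 'the fastest language' was found at index 41
    }
}
```

The attempt that starts at "the world" fails because the lookahead stops the scan at the next "the" before any "language" is reached, so the engine moves on and the second match begins at "the fastest".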

Justin