I'm using a simple regular expression on some text data, and the expression itself works fine. The problem I need to solve now is returning each distinct value only once, as shown in the desired matches below. The real data would be much larger than this.

Example Data:

stuffbefore233/stuffafter
stuffbefore233/stuffafter
stuffbefore111/stuffafter
stuffbefore111/stuffafter

Regular Expression:

(?<=stuffbefore)[\d]+(?=/stuffafter)

Current matches: 233, 233, 111, 111

Desired matches: 233, 111

I hope that makes sense. Please let me know if you need any more information.

+1  A: 

First, it should be very simple to filter the results to remove duplicates. For example:

MatchCollection matches = Regex.Matches(s, @"(?<=stuffbefore)\d+(?=/stuffafter)");
IEnumerable<string> a = matches.Cast<Match>().Select(m => m.Value).Distinct();
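For reference, here is a minimal runnable sketch of that filtering approach against the sample data from the question. It assumes C# 9 or later (top-level statements); the variable names `sample` and `unique` are just illustrative, and joining the four records with `\n` is my assumption about how the data is laid out.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input copied from the question, one record per line (assumed layout).
string sample = string.Join("\n",
    "stuffbefore233/stuffafter",
    "stuffbefore233/stuffafter",
    "stuffbefore111/stuffafter",
    "stuffbefore111/stuffafter");

// Find every number between the two markers, then drop the repeats.
string[] unique = Regex.Matches(sample, @"(?<=stuffbefore)\d+(?=/stuffafter)")
    .Cast<Match>()
    .Select(m => m.Value)
    .Distinct()
    .ToArray();

// Prints the two distinct values, 233 and 111.
Console.WriteLine(string.Join(", ", unique));
```

In practice `Distinct()` yields values in first-seen order, but the documentation describes the result as unordered, so sort explicitly if the order matters to you.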

If you insist on solving it with a regex alone (which I suspect is less performant and less maintainable), you can add a negative lookahead that checks whether the same number appears again later in the text. Here, I've added a capturing group around the number so I can use a backreference (\1). This finds the last occurrence of every number (simply because a lookahead is easier to combine with a backreference):

(?<=stuffbefore)(\d+)(?=/stuffafter)(?!.*stuffbefore\1/stuffafter)
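As a rough sketch of how this pattern behaves on the question's sample data, assuming the records are separated by `\n` and C# 9 or later (top-level statements); the names `text`, `pattern`, `lastOnly` and `withoutOption` are illustrative only. Because `.` does not match a newline by default, the example passes `RegexOptions.Singleline` so the negative lookahead can see the later lines.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input from the question, one record per line (assumed layout).
string text =
    "stuffbefore233/stuffafter\n" +
    "stuffbefore233/stuffafter\n" +
    "stuffbefore111/stuffafter\n" +
    "stuffbefore111/stuffafter";

string pattern = @"(?<=stuffbefore)(\d+)(?=/stuffafter)(?!.*stuffbefore\1/stuffafter)";

// With Singleline, '.' also matches '\n', so the lookahead scans the rest of
// the text and only the last occurrence of each number survives.
string[] lastOnly = Regex.Matches(text, pattern, RegexOptions.Singleline)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Without the option, the lookahead stops at the end of each line,
// never sees the repeat, and every occurrence is returned.
int withoutOption = Regex.Matches(text, pattern).Count;

Console.WriteLine(string.Join(", ", lastOnly)); // prints "233, 111"
Console.WriteLine(withoutOption);               // prints 4
```

Note that for each candidate match the lookahead may scan the rest of the input, which is one reason the filtering approach is likely to scale better on large data.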
Kobi
One extra note, after the edit to the question: you'll need to use `RegexOptions.Singleline` if your text contains multiple lines, so that the `.` in the lookahead can also match newlines.
Kobi