I'm performing regex matching in .NET against strings that look like this:

1;#Lists/General Discussion/Waffles Win
2;#Lists/General Discussion/Waffles Win/2_.000
3;#Lists/General Discussion/Waffles Win/3_.000

I need to match the URL portion without the numbers at the end, so that I get this:

Lists/General Discussion/Waffles Win

This is the regex I'm trying:

(?:\d+;#)(?<url>.+)(?:/\d+_.\d+)*

The problem is that the last group is being included as part of the middle group's match. I've also tried without the * at the end but then only the first string above matches and not the rest.

I have the multi-line option enabled. Any ideas?

+2  A: 

A few different alternatives:

@"^\d+;#([^/]+(?:/[^/]+)*?)(?:/\d+_\.\d+)?$"

This matches as few path segments as possible, followed by an optional last part, and the end of the line.

@"^\d+;#([^/]+(?:/(?!\d+_\.\d+$)[^/]+)*)"

This matches as many path segments as possible, stopping before a segment that is the digit part at the end of the line.

@"^\d+;#(.*?)(?:/\d+_\.\d+)?$"

This matches as few characters as possible, followed by an optional last part, and the end of the line.
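
To check the three alternatives against the sample strings from the question, a minimal C# sketch could look like the following. The class name `UrlPatternDemo` and the helper `ExtractUrl` are made up for illustration. Each sample is tested as a separate string, because `[^/]` can also match a line break and the first two patterns may behave differently on one multi-line block.

```csharp
using System;
using System.Text.RegularExpressions;

static class UrlPatternDemo
{
    // Sample inputs copied from the question.
    public static readonly string[] Samples =
    {
        "1;#Lists/General Discussion/Waffles Win",
        "2;#Lists/General Discussion/Waffles Win/2_.000",
        "3;#Lists/General Discussion/Waffles Win/3_.000",
    };

    // The three alternatives above, unchanged.
    public static readonly string[] Patterns =
    {
        @"^\d+;#([^/]+(?:/[^/]+)*?)(?:/\d+_\.\d+)?$",
        @"^\d+;#([^/]+(?:/(?!\d+_\.\d+$)[^/]+)*)",
        @"^\d+;#(.*?)(?:/\d+_\.\d+)?$",
    };

    // Hypothetical helper: returns capture group 1, or null when the line does not match.
    public static string ExtractUrl(string pattern, string line)
    {
        Match m = Regex.Match(line, pattern, RegexOptions.Multiline);
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        foreach (string pattern in Patterns)
            foreach (string line in Samples)
                Console.WriteLine(ExtractUrl(pattern, line));
    }
}
```

For these three samples, every pattern should print `Lists/General Discussion/Waffles Win`.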

MizardX
Thank you, last one works. Never thought of matching the end line itself - I think that's the major difference.
Alex Angas
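The difference is that it uses a lazy quantifier (`.*?`) instead of a greedy one (`.+`). The `$` then forces the lazy match to extend to the end of the line, minus the optional trailing part.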
MizardX
I need to read up on these. Thank you!
Alex Angas
A: 

You could try

^(\d+;#)([^/]+(/[^\d][^/]*)*)

and take the second group. The first group matches the 1;# prefix. The second group matches the first part of the URL (assumed to contain any character other than /), followed by any number of groups consisting of a /, then a non-digit, then any characters other than /.

Tested on this site; it appears to do what you want. Give it a try with some more samples.
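
As a rough way to try more samples, here is a small C# sketch that reads group 2 from the pattern above. The class name `SecondGroupDemo`, the helper `ExtractUrl`, and the third input string are invented for illustration. Each string is tested on its own, since `[^/]*` can also match across a line break.

```csharp
using System;
using System.Text.RegularExpressions;

static class SecondGroupDemo
{
    // Pattern from this answer, unchanged.
    public const string Pattern = @"^(\d+;#)([^/]+(/[^\d][^/]*)*)";

    // Hypothetical helper: returns group 2 (the URL part), or null when there is no match.
    public static string ExtractUrl(string line)
    {
        Match m = Regex.Match(line, Pattern);
        return m.Success ? m.Groups[2].Value : null;
    }

    static void Main()
    {
        Console.WriteLine(ExtractUrl("1;#Lists/General Discussion/Waffles Win"));
        Console.WriteLine(ExtractUrl("3;#Lists/General Discussion/Waffles Win/3_.000"));
        // Made-up input: a path segment that starts with a digit ends the match early.
        Console.WriteLine(ExtractUrl("4;#Lists/2010 Archive/Waffles Win"));
    }
}
```

The first two calls should print `Lists/General Discussion/Waffles Win`. The third prints only `Lists`, which shows the trade-off of this approach: any path segment beginning with a digit is treated as the part to drop.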

Vinay Sajip