ansaurus

Question

Answer 1

+5 A:

First of all, get a good regular expression development tool. My favorite is Expresso.

Here is a cleaned up version:

^[\w# ]+ - [a-zA-Z ]*(?:[\w_ ]+ - ){9}[a-zA-Z]+ ~[\w_ ]+\.rpt$

Changes include:

Removed the capture groupings "()". I'll assume that you're only validating the text, since you didn't mention any capturing. If you need them, they're easy enough to add back.
Used the word character class "\w", which covers "[a-zA-Z0-9_]" (the underscore is included, so the extra "_" in "[\w_ ]" is redundant but harmless).
Replaced the repeated portion in the middle with "(?:[\w_ ]+ - ){9}". This matches ([word character or space]+ - ) nine times. It doesn't capture because of the "?:" I put after the opening parenthesis.
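
As a rough check of the validation-only pattern, here is a minimal sketch. It uses Python's re module rather than the .NET engine the question is about, so treat it as an illustration only; the sample file names are made up.

```python
import re

# Validation-only pattern from the answer above, copied verbatim.
PATTERN = re.compile(
    r"^[\w# ]+ - [a-zA-Z ]*(?:[\w_ ]+ - ){9}[a-zA-Z]+ ~[\w_ ]+\.rpt$"
)

# Hypothetical file names: the first has nine middle segments, the second eight.
good = "ID#1 - a - b - c - d - e - f - g - h - i - Final ~ name.rpt"
short = "ID#1 - a - b - c - d - e - f - g - h - Final ~ name.rpt"

is_good_valid = PATTERN.match(good) is not None
is_short_valid = PATTERN.match(short) is not None

# The non-capturing group adds nothing to the numbered groups.
group_count = PATTERN.groups

print(is_good_valid, is_short_valid, group_count)
```

The second name is rejected because the pattern requires ten " - " separators in total (one after the first field plus nine from the repeated group) and it only has nine.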

EDIT:

Here it is with the capture groups back:

^([\w# ]+) - ([a-zA-Z ]*)(?:([\w_ ]+) - ){9}([a-zA-Z]+) ~ ([\w_ ]+)\.rpt$

Note that when you go through the numbered capture groups, the third one will have 9 captures in it.
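
The nine captures are a .NET feature (each group exposes a Captures collection). For comparison, here is a sketch in Python, whose re module keeps only the last repetition of a repeated group. The BLOCK variant, which captures the whole repeated section and splits it afterwards, is an assumption of this sketch and not part of the answer; the sample file name is made up.

```python
import re

# Capturing pattern from the answer above, copied verbatim.
CAPTURING = re.compile(
    r"^([\w# ]+) - ([a-zA-Z ]*)(?:([\w_ ]+) - ){9}([a-zA-Z]+) ~ ([\w_ ]+)\.rpt$"
)

# Hypothetical variant: capture the whole repeated block as one group
# so the nine segments can be split out afterwards.
BLOCK = re.compile(
    r"^([\w# ]+) - ([a-zA-Z ]*)((?:[\w_ ]+ - ){9})([a-zA-Z]+) ~ ([\w_ ]+)\.rpt$"
)

sample = "ID#1 - a - b - c - d - e - f - g - h - i - Final ~ name.rpt"

m = CAPTURING.match(sample)
last_only = m.group(3)  # Python keeps only the final repetition here

# Split the captured block on the separator and drop the trailing empty piece.
segments = [s for s in BLOCK.match(sample).group(3).split(" - ") if s]

print(last_only, segments)
```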

James Kolpack 2009-07-31 20:47:53

I need the capture groups.

Michael G 2009-07-31 20:50:18

Note that _ is also included in \w

rob 2009-07-31 21:21:32

Answer 2

+3 A:

You can replace every instance of a-zA-Z0-9_ with \w. Also, 0-9 can be slightly shortened to \d.

Here are the character classes supported by C#: http://msdn.microsoft.com/en-us/library/20bw873z%28VS.71%29.aspx

You can make a group non-capturing by including ?: at the beginning after the opening parenthesis for the group. If you have an expression that repeats a known number of times, you can follow it with {n}:

^([a-zA-Z\d# ]+)-([a-zA-Z ]*)(?:([\w ]+)-){9}([a-zA-Z ~]+)([\w ]+)\.rpt$
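
To illustrate the {n} quantifier on its own, the sketch below compares the quantified group with the same group written out nine times. It uses Python's re module as a stand-in for the .NET engine, and the sample strings are made up.

```python
import re

SEGMENT = r"(?:[\w ]+-)"  # one "text-" segment, non-capturing

quantified = re.compile(SEGMENT + r"{9}")  # (?:[\w ]+-){9}
expanded = re.compile(SEGMENT * 9)         # the segment written out nine times

# Hypothetical inputs: nine segments, two segments, nine segments with a space.
samples = ["a-b-c-d-e-f-g-h-i-", "a-b-", "one two-3-4-5-6-7-8-9-10-"]

# Both forms should accept and reject exactly the same inputs.
agreement = [
    bool(quantified.fullmatch(s)) == bool(expanded.fullmatch(s)) for s in samples
]
nine_ok = bool(quantified.fullmatch(samples[0]))
two_ok = bool(quantified.fullmatch(samples[1]))

print(agreement, nine_ok, two_ok)
```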

rob 2009-07-31 20:57:31

Answer 3

+4 A:

When you are talking about "simplifying" Regular Expressions, you really need to also know what you don't want to match, as that can really help simplify your tests with special characters, sequence repetition, etc.

That said, here is a cleaned-up version that produces exactly the same result as your original expression:

^([a-zA-Z0-9# ]+)-([a-zA-Z ]*)(?:([\w ]+)-){9}([a-zA-Z ~]+)([\w ]+)\.rpt$

Some notes on why this differs from the other posted answer:

According to my reference for Perl-compatible regular expressions, \w actually also includes underscore. (Edit: this is apparently different from C# which is explained in the link to MSDN. This difference may be useful to note.)
My expression assumes you had the spaces in the character classes on purpose. If, in fact, you can have multiple spaces between dashes, leave it this way; otherwise, go with the other answer.
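
Whether \w covers the underscore is easy to check empirically. The sketch below does so with Python's re module, which is an assumption of this example; other engines should be checked the same way in their own environment.

```python
import re

ascii_chars = [chr(c) for c in range(128)]

# With re.ASCII, \w is limited to ASCII word characters.
word_ascii = {c for c in ascii_chars if re.fullmatch(r"\w", c, re.ASCII)}
explicit = {c for c in ascii_chars if re.fullmatch(r"[a-zA-Z0-9_]", c)}

same_class = word_ascii == explicit
underscore_is_word = re.fullmatch(r"\w", "_") is not None

# Without re.ASCII, \w also accepts non-ASCII letters such as "é".
unicode_letter_is_word = re.fullmatch(r"\w", "é") is not None

print(same_class, underscore_is_word, unicode_letter_is_word)
```

In other words, "[\w ]" and "[a-zA-Z0-9_ ]" agree on ASCII input, but \w can be broader when the engine is Unicode-aware.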

Renesis 2009-07-31 20:59:44

`\w` includes the underscore in C#, too (it's covered by the Unicode property `\p{Pc}`).

Alan Moore 2009-08-01 01:30:53

Long Regex - How would you do it?
