I've got a RegEx that works great on *NIX systems and in languages that support Extended Regular Expressions (ERE). I haven't found a freely available library for .NET that supports EREs, nor have I had any luck translating this into a non-ERE pattern that gives the same result. Here is the RegEx:

^\+(<{7} \.|={7}$|>{7} \.)

Background: the point of the RegEx is to identify whether a given string appears to contain the markers left by an unresolved Subversion merge.

A: 

Are you sure you don't have a typo in that? RegexBuddy (when set to either POSIX ERE or GNU ERE) says that the "+" quantifier must be preceded by a token that can be repeated. Other than that, this appears to be a valid .NET Regex. You might want to check out one of the great O'Reilly books on regular expressions as well. If this doesn't help, please post some examples of text you're trying to match/not match.

TrueWill
It's not a typo, the OP just didn't use code formatting, so the SO software ate some of the characters.
Alan Moore
A: 

Actually, according to what I've read here:

Extended Regular Expressions

It would appear that C# basically uses ERE, just with a slightly different syntax.

However, if that were true, then looking at your expression, it seems you've made a group named "{7} .|={7}$|" that looks for anything that starts with a 7 followed by any character, and you also have an invalid + sign at the beginning of your statement. So I'm guessing the stuff I'm finding via Google searches is not the same ERE you are talking about :(

However! I have a site for you that should have just about everything you need to rewrite your expression as a .NET-compatible one:

Regular Expressions in .net

Hope that link helps!

DataDink
Check the question again; I added code formatting, so the regex makes much more sense now.
Alan Moore
This regex is straight from a Subversion 1.6 installation. It works fine on *NIX systems, including my Mac OS machine, but it does not work when used in a C# application. Its purpose is to identify whether a diff between 2 revisions contains any of the characters used in a merge conflict file (<, =, >) 7 times in a row, with any 2 or more sets of them in a file. (In POSIX ERE, as best I can tell, the preceding '\+' causes the group to match only if more than 1 of the OR conditions in the group matches. Testing on my Mac seems to confirm this.)
A: 

It looks to me like ERE syntax is mostly upward-compatible with .NET's regex flavor, as it is with most other "Perl-compatible" flavors (Perl, PHP, Python, JavaScript, Ruby, Java...). In other words, anything you can do in an ERE regex, you should be able to do in an identical .NET regex. Certainly your example:

^\+(<{7} \.|={7}$|>{7} \.)

means the same thing in .NET as it does in ERE. The only major exception I can see is in the area of POSIX bracket expressions; .NET follows the Unicode standard instead.
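To make the bracket-expression caveat concrete, here is a minimal sketch (the class `PosixClassDemo` and its members are illustrative names, not part of any library). .NET has no `[[:digit:]]` syntax, so an ERE class like that has to be rewritten by hand. `[0-9]` matches exactly what POSIX `digit` covers, whereas .NET's `\d` matches any Unicode decimal digit unless `RegexOptions.ECMAScript` is specified.

```csharp
using System.Text.RegularExpressions;

// Illustrative only: ways an ERE [[:digit:]] class might be rewritten for .NET.
public static class PosixClassDemo
{
    // U+0663 ARABIC-INDIC DIGIT THREE, a Unicode decimal digit (category Nd).
    public const string ArabicIndicThree = "\u0663";

    // Literal rewrite of [[:digit:]]: POSIX defines that class as 0-9 only.
    public static bool MatchesAsciiDigit(string s) =>
        Regex.IsMatch(s, "^[0-9]$");

    // .NET's \d is Unicode-aware by default (it behaves like \p{Nd}).
    public static bool MatchesUnicodeDigit(string s) =>
        Regex.IsMatch(s, @"^\d$");

    // With RegexOptions.ECMAScript, \d is restricted to [0-9].
    public static bool MatchesEcmaDigit(string s) =>
        Regex.IsMatch(s, @"^\d$", RegexOptions.ECMAScript);
}
```

So for a pattern ported from ERE, `[0-9]` (or `\d` plus `RegexOptions.ECMAScript`) is the safer translation if only ASCII digits should match.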

It's when you go to apply the regex that things really get different. In C# you might apply that regex like this:

string result = Regex.Match(targetString, @"^\+(<{7} \.|={7}$|>{7} \.)").Value;
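One difference that can show up when applying the pattern: grep and similar ERE tools test a pattern against each line separately, while .NET's `Regex` sees the whole input string, so `^` and `$` anchor only at the start and end of that string unless `RegexOptions.Multiline` is set. Even in multiline mode, .NET's `$` matches before `\n` but not before `\r\n`, so Windows line endings can stop `={7}$` from matching. The sketch below assumes the input is the full text of a unified diff, as the comments describe; the class `ConflictMarkerCheck` and method `HasConflictMarkers` are hypothetical names. In both flavors `\+` is simply a literal plus sign, which is the prefix unified diffs put on added lines.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: reports whether diff text contains added lines
// that look like unresolved Subversion conflict markers.
public static class ConflictMarkerCheck
{
    // Same pattern as the ERE version; only the options differ.
    // Multiline lets ^ and $ match at every line boundary, approximating
    // the way grep applies a pattern to one line at a time.
    private static readonly Regex Marker = new Regex(
        @"^\+(<{7} \.|={7}$|>{7} \.)",
        RegexOptions.Multiline);

    public static bool HasConflictMarkers(string diffText)
    {
        // .NET's $ matches only before '\n', so normalize CRLF first;
        // otherwise "+=======\r\n" would not satisfy ={7}$.
        string normalized = diffText.Replace("\r\n", "\n");
        return Marker.IsMatch(normalized);
    }
}
```

Splitting the input on newlines and testing each line with the original, option-free pattern would be an equally reasonable way to mirror grep; `Multiline` just avoids the split.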

C#'s verbatim strings save you from having to escape backslashes, as you must in some other languages' string literals; you only have to escape quotation marks, which you do by doubling them:

@"He said, ""Look out!""";
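For comparison, here is the same pattern written both ways (the class and field names are made up for the example). In a regular C# string literal every regex backslash has to be doubled, while the verbatim form can be pasted from the ERE source unchanged.

```csharp
// Illustrative only: both constants hold exactly the same string.
public static class PatternLiterals
{
    // Regular string literal: each regex backslash is written as \\.
    public const string Regular = "^\\+(<{7} \\.|={7}$|>{7} \\.)";

    // Verbatim string literal: backslashes are taken literally;
    // only a double quote would need escaping (written as "").
    public const string Verbatim = @"^\+(<{7} \.|={7}$|>{7} \.)";
}
```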

Does that answer your question?

Alan Moore
That didn't answer it, but I did learn something new: I hadn't realized C# allowed the double-quote method of escaping quotes. I was using the "@" to declare the string literal.