I am trying to convert a Perl script to a C# 3.5 routine.

The Perl code I have is:

if($work =~ /\<[0-9][0-9][0-9]\>/){
    $left       = $`;
    $match      = $&;
    $work       = $';
}

In C# I wrote the following code:

string[] sSplit = Regex.Split(work, @"\<[0-9][0-9][0-9]\>");
if (sSplit.Length > 2)
{
    left = sSplit[0];
    match = sSplit[1];
    work = sSplit[2];
}

However, the above does not give me the matched pattern in sSplit[1]; it gives me the content to the right of the matched string instead.

+4  A: 

Regex.Split is not what you need. The equivalent to =~ /.../ is Regex.Match.

However, Regex.Match has no equivalent to Perl’s $` or $', so you need to use a workaround, but I think it’s a fair one:

var m = Regex.Match(work, @"^(.*?)(\<[0-9][0-9][0-9]\>)(.*)$", RegexOptions.Singleline);
if (m.Success)
{
    left = m.Groups[1].Value;   // Groups[0] is the whole match, so the captures start at 1
    match = m.Groups[2].Value;  // perhaps with Convert.ToInt32()?
    work = m.Groups[3].Value;
}

Alternatively, you can use the match index and length to extract the text before and after the match:

var m = Regex.Match(work, @"\<[0-9][0-9][0-9]\>");  // no ^ anchor, so the match can start anywhere
if (m.Success)
{
    left = work.Substring(0, m.Index);
    match = m.Value;  // perhaps with Convert.ToInt32()?
    work = work.Substring(m.Index + m.Length);
}
Timwi
Would you not want either (.*?) on the end too or (.*) on both? I'm just wondering what happens if the matching string repeats itself and you want to get all matches and what's to their left and right... Though that is a bit beyond the scope of this question as I read it.
Chris
@Chris: Whether the second one is `(.*?)` or `(.*)` makes no difference because of the `$` after it. The first one is important: it is `(.*?)` so that you get the first match. With `(.*)` you would get the last.
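
To make the lazy-versus-greedy point concrete, here is a minimal sketch. The class name LazyVersusGreedyDemo, the helper SplitAround and the sample input are made up for illustration; only Regex.Match and the Groups collection come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class LazyVersusGreedyDemo
{
    // Returns { left, match, right } using a pattern that has exactly
    // three capturing groups, or null if the pattern does not match.
    public static string[] SplitAround(string input, string pattern)
    {
        // Singleline lets '.' match newlines too, as in the answer above.
        Match m = Regex.Match(input, pattern, RegexOptions.Singleline);
        if (!m.Success)
        {
            return null;
        }
        // Groups[0] is the whole match; the captures are Groups[1]..Groups[3].
        return new string[]
        {
            m.Groups[1].Value,
            m.Groups[2].Value,
            m.Groups[3].Value
        };
    }
}
```

With the input "a<111>b<222>c", the pattern @"^(.*?)(<[0-9][0-9][0-9]>)(.*)$" should give "a", "<111>", "b<222>c" (first tag), while @"^(.*)(<[0-9][0-9][0-9]>)(.*)$" should give "a<111>b", "<222>", "c" (last tag).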
Timwi
everyone, thanks a lot for all your help!
Desai
+1  A: 

When trying out regular expressions, I always recommend RegexHero, an online tool that visualizes your .NET regular expressions. In this case, use Regex.Match and use Groups. That'll give you what you want.

Note that the backslashes in \< and \> are not needed in C# (nor in Perl, btw).

Also note that $`, $& and $' have equivalents in C# when used in a replacement expression. If that's what you need in the end, you can use these "magic variables", but only in Regex.Replace.
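
As a rough sketch of what that looks like: in a .NET replacement string, $` stands for the text before the match, $& for the match itself and $' for the text after it. The class name ReplacementVariablesDemo, the method Annotate and the bracketed output format are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class ReplacementVariablesDemo
{
    // Replaces the first three-digit tag with "[before|match|after]".
    public static string Annotate(string input)
    {
        Regex tag = new Regex("<[0-9][0-9][0-9]>");
        // The final argument limits the replacement to the first match.
        return tag.Replace(input, "[$`|$&|$']", 1);
    }
}
```

For example, Annotate("ab<123>cd") should return "ab[ab|<123>|cd]cd". The substitutions are only expanded inside the replacement string, so this does not hand the three pieces back as separate variables.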

Abel
+1  A: 

A split is usually asking to throw away the delimiters. Perl acts just the same way (without the verboten $& type variables.)

You capture delimiters in Perl by putting parens around them:

my @parts = split /(<[0-9][0-9][0-9]>)/; # includes the delimiter
my @parts = split /<[0-9][0-9][0-9]>/;   # doesn't include the delimiter
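
The same trick appears to carry over to C#: Regex.Split also includes captured text in its result when the pattern has capturing parentheses. A minimal sketch, assuming you only want to split around the first tag; the class name SplitKeepingDelimiterDemo and the method SplitOnce are made up here.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class SplitKeepingDelimiterDemo
{
    // Splits around the first three-digit tag and keeps the tag itself.
    // With a match, the result is { left, tag, right }; without one,
    // it is a single element holding the whole input.
    public static string[] SplitOnce(string input)
    {
        // The parentheses make Split return the delimiter as well.
        Regex tag = new Regex("(<[0-9][0-9][0-9]>)");
        // A count of 2 means at most two substrings; captured text
        // is added on top of that and does not count toward the limit.
        return tag.Split(input, 2);
    }
}
```

SplitOnce("a<111>b<222>c") should give "a", "<111>", "b<222>c", which lines up with left, match and work in the question. Checking that the array has exactly three elements tells you whether a tag was found.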
Axeman
s/verboten/non-recommended/
ysth