Hey, first time poster on this awesome community.

I have a regular expression in my C# application to parse an assignment of a variable:

NewVar = 40

which is entered in a Textbox. I want my regular expression to return (using Regex.Split) the name of the variable and the value, pretty straightforward. This is the Regex I have so far:

var r = new Regex(@"^(\w+)=(\d+)$", RegexOptions.IgnorePatternWhitespace);
var mc = r.Split(command);

My goal was to do the trimming of whitespace in the Regex rather than call Trim() on the returned values. Currently it works, but it returns an empty string at the beginning of the returned array and another empty string at the end.

Using the above input example, this is what's returned from Regex.Split:

mc[0] = ""
mc[1] = "NewVar"
mc[2] = "40"
mc[3] = ""

So my question is: why does it return an empty string at the beginning and the end?

Thanks.

+1  A: 

From the docs, Regex.Split() uses the regular expression as the delimiter to split on; it is not meant for pulling the captured groups out of the input string. Also, IgnorePatternWhitespace ignores unescaped whitespace in your pattern, not in the input.

Instead, try the following:

var r = new Regex(@"\s*=\s*");
var mc = r.Split(command);

Note that the whitespace is actually consumed as a part of the delimiter.
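To illustrate the delimiter behaviour, here is a minimal sketch. The class and method names (AssignmentSplitter, SplitAssignment) are made up for this example and are not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AssignmentSplitter
{
    // Hypothetical helper: splits on '=' and lets the delimiter swallow
    // any whitespace directly around it.
    public static string[] SplitAssignment(string command)
    {
        return Regex.Split(command, @"\s*=\s*");
    }
}
```

With "NewVar = 40" this should give "NewVar" and "40", and with "NewVar = OldVar = 5" it should give three elements. Whitespace at the very start or end of the input is not next to an = sign, so it is not removed by this pattern.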

jheddings
I don't really see an advantage of using Regex.Split over String.Split for this code. Is this what the OP intended?
Juliet
@Juliet the OP was hoping to avoid using `Trim()` to remove extra whitespace around the `=` sign. It's not as efficient to execute, but it does save the extra lines of code, I suppose.
jheddings
Your edited Regex works. Yes, I was hoping to save a few lines of code. Also, your Regex will work if I have multiple assignments: NewVar = OldVar = 5. Thanks.
AlexDemers
+3  A: 

The reason Regex.Split is returning four values is that you have exactly one match, so Regex.Split is returning:

  • All the text before your match, which is ""
  • All () groups within your match, which are "NewVar" and "40"
  • All the text after your match, which is ""

Regex.Split's primary purpose is to extract the text between matches of the regex. For example, you could use Regex.Split with a pattern of "[,;]" to split text on either commas or semicolons. In .NET Framework 1.0 and 1.1, Regex.Split only returned the split values, in this case "" and "", but in .NET Framework 2.0 it was modified to also include the values matched by () within the Regex, which is why you are seeing "NewVar" and "40" at all.
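The two behaviours described above can be compared side by side. This is only a sketch: the class and method names are invented for the example, and the second method uses the input "NewVar=40" without spaces because the pattern itself does not allow any.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // No capturing groups: only the text between delimiters comes back.
    public static string[] SplitOnDelimiters(string text)
    {
        return Regex.Split(text, "[,;]");
    }

    // The pattern matches the entire input, so the result is:
    // text before the match, each captured group, text after the match.
    public static string[] SplitWholeLine(string text)
    {
        return Regex.Split(text, @"^(\w+)=(\d+)$");
    }
}
```

SplitOnDelimiters("a,b;c") should yield "a", "b", "c", while SplitWholeLine("NewVar=40") should yield "", "NewVar", "40", "".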

What you were looking for is Regex.Match, not Regex.Split. It will do exactly what you want:

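// \s* absorbs optional whitespace around the name, the '=' and the value
var r = new Regex(@"^\s*(\w+)\s*=\s*(\d+)\s*$");
var match = r.Match(command);
// Groups[0] is the entire match; the captured groups start at index 1
var varName = match.Groups[1].Value;
var valueText = match.Groups[2].Value;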
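If the input might not be a valid assignment, it is probably worth checking match.Success before reading the groups. Below is a rough sketch of that idea. AssignmentParser and TryParse are hypothetical names, and the \s* parts assume that optional whitespace around the name, the = and the value is what should be tolerated. Note that Groups[0] is the whole match, so the two captures are read from Groups[1] and Groups[2].

```csharp
using System;
using System.Text.RegularExpressions;

public static class AssignmentParser
{
    // Assumption: optional whitespace is allowed anywhere around the tokens.
    private static readonly Regex Pattern =
        new Regex(@"^\s*(\w+)\s*=\s*(\d+)\s*$");

    // Returns false when the input is not of the form "name = digits"
    // or when the digits do not fit in an int.
    public static bool TryParse(string command, out string name, out int value)
    {
        name = string.Empty;
        value = 0;

        Match match = Pattern.Match(command);
        if (!match.Success)
        {
            return false;
        }

        int parsed;
        if (!int.TryParse(match.Groups[2].Value, out parsed))
        {
            return false;
        }

        name = match.Groups[1].Value;  // first capture: variable name
        value = parsed;                // second capture: numeric value
        return true;
    }
}
```

Calling AssignmentParser.TryParse("  NewVar = 40 ", out name, out value) should return true with name "NewVar" and value 40, with no Trim() call needed.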

Note that RegexOptions.IgnorePatternWhitespace means you can include extra spaces in your pattern; it has nothing to do with the matched text. Since you have no extra whitespace in your pattern, it is unnecessary.
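A small sketch of what the option does and does not do. The class name is made up, and the spaces in the pattern are there purely to show that they are ignored.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternWhitespaceDemo
{
    // With IgnorePatternWhitespace the unescaped spaces below are dropped,
    // so this behaves like ^(\w+)=(\d+)$ and still allows no spaces in the input.
    private static readonly Regex Spaced =
        new Regex(@"^ (\w+) = (\d+) $", RegexOptions.IgnorePatternWhitespace);

    public static bool Matches(string input)
    {
        return Spaced.IsMatch(input);
    }
}
```

Here Matches("NewVar=40") should be true, while Matches("NewVar = 40") should be false, because the spaces in the input are not covered by anything in the pattern.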

Ray Burns