I've used regex in the past for input validation, but I am wondering whether it can also be used to parse a complex string.

I have a header like this:

-----------------------------7dac1d2214d4\r\nContent-Disposition: form-data; name=\"my_title\"\r\n\r\nMyData\r\n-----------------------------7dac1d2214d4\r\nContent-Disposition: form-data; name=\"myupload\"; filename=\"C:\\myfile.zip\"\r\nContent-Type: application/x-zip-compressed\r\n\r\n

I want to be able to parse out, say, the filename.

At the moment I am doing this (after parsing headers):

this.FileName = headers[1].Substring(headers[1].IndexOf("filename=\"") + "filename=\"".Length, headers[1].IndexOf("\"\r\n", headers[1].IndexOf("filename=\"")) - (headers[1].IndexOf("filename=\"") + "filename=\"".Length));

But it's hideous and ugly.

Can regex solve this problem more elegantly? I understand the basics of the syntax, so if it can solve it, could someone show me how to parse this with regex:

"+Name=Bob+Age=39+"

I can probably work out the rest myself then.

Thanks.

+4  A: 

Named matched subexpressions are what best suit your needs. (?<Name>Expression) lets you access the substring matched by Expression via the group name Name.

var input = "Foo=42;Bar='FooBar'";

var regex = new Regex(@"Foo=(?<Foo>[0-9]+);Bar='(?<Bar>[^']+)'");

var match = regex.Match(input);

Console.WriteLine(match.Groups["Foo"]); // Prints '42'.
Console.WriteLine(match.Groups["Bar"]); // Prints 'FooBar'.
Daniel Brückner
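Applied to the filename case from the question, the same named-group idea might look like the sketch below. The header text is a shortened copy of the one in the question; the group name FileName and the choice of [^"]* for the value are assumptions made for illustration, not something taken from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

// One part header from the question, shortened to the relevant lines.
var header = "Content-Disposition: form-data; name=\"myupload\"; filename=\"C:\\myfile.zip\"\r\n"
           + "Content-Type: application/x-zip-compressed\r\n\r\n";

// [^"]* stops at the closing quote, so the group cannot run past the value.
var regex = new Regex("filename=\"(?<FileName>[^\"]*)\"");

var match = regex.Match(header);

// Check Success first: a header without a filename parameter gives no match.
var fileName = match.Success ? match.Groups["FileName"].Value : null;

Console.WriteLine(fileName); // Prints 'C:\myfile.zip'.
```

This replaces the nested IndexOf/Substring arithmetic with a single pattern, and the Success check covers parts that carry no filename at all.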
@Daniel -- Curses, you beat me to it :) Nice to see a fellow Regex ninja
Josh
I am amazed how similar the two answers are... :)
Daniel Brückner
+2  A: 

Using Named Capturing Groups you should be able to parse just about anything and later refer to it by name.

var inputString = "+Name=Bob+Age=39+";
var regex = new Regex("Name=(?<Name>[A-Z][a-z]*)\\+Age=(?<Age>[0-9]*)");

var match = regex.Match(inputString);

System.Console.WriteLine("Name: {0}", match.Groups["Name"]);
System.Console.WriteLine("Age: {0}", match.Groups["Age"]);

System.Console.ReadKey();
Josh
Cool, can you explain how you divide the two searches? You are using \\+; does that represent something? Does this regex search care about order?
SLC
The "\\" is just an escape sequence. "+" is a special character in Regex: it is a quantifier meaning "one or more of the preceding element". In order to treat it literally I had to escape it. Since "\" indicates an escape sequence in C# strings, I had to escape that with another "\" :) Clear as mud?
Josh
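To make the double escaping concrete, here is a small sketch comparing three ways of writing the same pattern text in C#. The variable names are only illustrative; Regex.Escape is the standard .NET helper for escaping metacharacters.

```csharp
using System;
using System.Text.RegularExpressions;

var input = "+Name=Bob+Age=39+";

// Three spellings of the same pattern text: \+Age=
var regular  = "\\+Age=";                  // regular string: the backslash itself is escaped
var verbatim = @"\+Age=";                  // verbatim string: backslashes are taken literally
var escaped  = Regex.Escape("+") + "Age="; // let the library escape the metacharacter

Console.WriteLine(regular == verbatim);            // Prints 'True'.
Console.WriteLine(regular == escaped);             // Prints 'True'.
Console.WriteLine(Regex.IsMatch(input, verbatim)); // Prints 'True'.
```

Verbatim strings (the @ prefix) are usually the easier choice for patterns, since only one level of escaping is left to read.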
Yes, order is important in this particular Regex. Remember that a Regex engine is basically a scanner, much like the one a compiler uses. There are such things as backtracking and back references, but for the most part Regular Expression engines scan from left to right.
Josh
Aha I just realised now what's going on. Is there a way to change \\+ into something that says, there might be some random other data in here (such as other properties which we want to ignore)? Is it \S* instead?
SLC
`.*` (or maybe `.*?`) will match everything in between. You have to specify dot-all mode (RegexOptions.Singleline in .NET) if there may be newlines.
Daniel Brückner
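A minimal sketch of that suggestion, assuming an input where an unrelated property (the made-up City=Paris) and a newline sit between Name and Age. The lazy `.*?` skips the data in between, and RegexOptions.Singleline lets the dot cross the newline.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: extra data and a line break between the two properties.
var input = "+Name=Bob+City=Paris\n+Age=39+";
var pattern = @"Name=(?<Name>[^+]+)\+.*?Age=(?<Age>[0-9]+)";

// Singleline makes '.' match '\n' as well, so '.*?' can skip over the line break.
var match = Regex.Match(input, pattern, RegexOptions.Singleline);

Console.WriteLine(match.Groups["Name"].Value); // Prints 'Bob'.
Console.WriteLine(match.Groups["Age"].Value);  // Prints '39'.

// Without Singleline the dot stops at the newline and the pattern does not match.
Console.WriteLine(Regex.IsMatch(input, pattern)); // Prints 'False'.
```

The pattern still expects Name before Age; it only tolerates extra content between them.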
A: 

Give this a try: (?<=filename\=\").*(?=\")

Greets Flo

Florian Reischl
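For completeness, a sketch of how a lookaround pattern like this could be used from C#. It is a variant rather than the exact expression above: [^"]* is swapped in for .* so the match cannot extend past the first closing quote if the line ever contains more quoted values. The header text is a shortened copy of the one in the question.

```csharp
using System;
using System.Text.RegularExpressions;

var header = "Content-Disposition: form-data; name=\"myupload\"; filename=\"C:\\myfile.zip\"\r\n";

// Lookbehind and lookahead assert the surrounding text without consuming it,
// so match.Value is the bare file name and no capture group is needed.
var regex = new Regex(@"(?<=filename="")[^""]*(?="")");

var match = regex.Match(header);

Console.WriteLine(match.Value); // Prints 'C:\myfile.zip'.
```

Compared with a named group, this keeps the pattern short but gives only one value per match, so it fits best when a single field is needed.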
A: 

I think what you're looking for are Grouping Constructs, which allow you to extract parts of the text matched by the regex. So using your simplified example:

string input = @"+Name=Bob+Age=39+";
Regex regex = new Regex(@"Name=(?<Name>[^\+]+)\+Age=(?<Age>[^\+]+)");

foreach (Match match in regex.Matches(input))
{
    Console.WriteLine("Name = '{0}'", match.Groups["Name"]);
    Console.WriteLine("Age  = '{0}'", match.Groups["Age"]);
}
Sam