This is supposedly a very easy question, but I just can't seem to find the right solution. There is a string in the format:

A:B=C;D:E=F;G:E=H;...

where A, B and C are alphanumeric (upper or lower case). A and B have length 1 or more; C may be empty.

I figured I'd have to use something along the lines of

((?<A>.+):(?<B>.+)=(?<C>.*);)*

but I don't see how to make it match in a lazy way. I'm also not sure how to obtain the results of the respective matches so I'd appreciate a little code sample.

I'd be glad if you could give me a hint.

+4  A: 

You could use \w to match word characters rather than ., which matches any character (except a newline). Then capture one element at a time, letting the last element end at the end of the string instead of at a semicolon:

(?<A>\w+):(?<B>\w+)=(?<C>\w*)(?:;|$)

Here's a small example:

Regex regex = new Regex("(?<A>\\w+):(?<B>\\w+)=(?<C>\\w*)(?:;|$)");
string test = "A:B=C;D:E=F;G:E=H";

// get all matches
MatchCollection mc = regex.Matches(test);

foreach (Match m in mc) { 
    Console.WriteLine("A = {0}", m.Groups["A"].Value);
    Console.WriteLine("B = {0}", m.Groups["B"].Value);
    Console.WriteLine("C = {0}", m.Groups["C"].Value);
}
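
On the "lazy" part of the question: appending ? to a quantifier (+? or *?) makes it consume as few characters as possible, so the original . based pattern can be kept if you prefer it. The following is only a minimal sketch under that assumption; the class name LazyTripleParser and the method Parse are made up for illustration and are not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LazyTripleParser
{
    // Hypothetical helper for illustration.
    // .+? and .*? are the lazy forms of .+ and .*: each stops at the first
    // position where the rest of the pattern (':', '=', then ';' or end) can match.
    private static readonly Regex Pattern =
        new Regex(@"(?<A>.+?):(?<B>.+?)=(?<C>.*?)(?:;|$)");

    // Returns one string[] { A, B, C } per element found in the input.
    public static List<string[]> Parse(string input)
    {
        var result = new List<string[]>();
        foreach (Match m in Pattern.Matches(input))
        {
            result.Add(new[]
            {
                m.Groups["A"].Value,
                m.Groups["B"].Value,
                m.Groups["C"].Value
            });
        }
        return result;
    }

    public static void Main()
    {
        foreach (string[] t in Parse("A:B=C;D:EF=G;E:H=;I:JK=L"))
        {
            Console.WriteLine("A = {0}, B = {1}, C = {2}", t[0], t[1], t[2]);
        }
    }
}
```

Because . accepts any character, this version does not check that the parts are alphanumeric; the \w variant is stricter in that respect.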

Note: several tools let you experiment with regular expressions and offer built-in help. I personally like Expresso; try it out, it is very useful for learning.

Paolo Tedesco
Thanks for the link, I'll have a look!
mafutrct
+2  A: 
Regex r = new Regex("(?<A>\\w+):(?<B>\\w+)=(?<C>\\w*);");

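\w matches letters, digits and the underscore. In .NET it is Unicode-aware by default; it is equivalent to [a-zA-Z0-9_] only when RegexOptions.ECMAScript is set.

The backslash is escaped in the C# string literal, so \w is written as \\w.

The regex captures the named groups A, B and C; C may be empty because of the * quantifier. Each match covers one A:B=C; element, terminated by the semicolon.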

You will have multiple Match objects:

MatchCollection m = r.Matches(sampleInput);
// m[0] will contain A:B=C;
// m[1] will contain D:E=F;
// m[2] will contain G:E=H;
// ...
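
One detail worth checking when writing such a pattern is where the quantifier sits relative to the named group. In .NET, a group that is repeated from the outside, as in (?<A>\w)+, reports only its last repetition in Group.Value (the earlier ones are still available through Group.Captures), whereas (?<A>\w+) captures the whole run. A small sketch to illustrate; the class QuantifierPlacementDemo and the helper GroupA are hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierPlacementDemo
{
    // Hypothetical helper: value of group A for the first match of pattern in input.
    public static string GroupA(string pattern, string input)
    {
        return Regex.Match(input, pattern).Groups["A"].Value;
    }

    // Hypothetical helper: number of captures recorded for group A.
    public static int CaptureCountA(string pattern, string input)
    {
        return Regex.Match(input, pattern).Groups["A"].Captures.Count;
    }

    public static void Main()
    {
        string input = "AB:CD=EF;";

        // Quantifier outside the group: Value holds the last repetition only.
        Console.WriteLine(GroupA(@"(?<A>\w)+:", input));        // B

        // Quantifier inside the group: Value holds the whole run.
        Console.WriteLine(GroupA(@"(?<A>\w+):", input));        // AB

        // The individual repetitions are kept in the Captures collection.
        Console.WriteLine(CaptureCountA(@"(?<A>\w)+:", input)); // 2
    }
}
```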
Jeff Meatball Yang
+5  A: 

Is regex a requirement? Since the string has a very structured, well, structure, it is easy to parse it without regex:

string input = "A:B=C;D:EF=G;E:H=;I:JK=L";
string[] elements = input.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries); // tolerates a trailing ';'
List<string[]> parts = new List<string[]>();
foreach (string element in elements)
{
    parts.Add(element.Split(new[] { ':', '=' }));
}
// result output
foreach (string[] list in parts)
{
    Console.WriteLine("{0}:{1}={2}", list[0], list[1], list[2]);
}

The output will be:

A:B=C
D:EF=G
E:H=
I:JK=L
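
If the input might contain a trailing semicolon or a malformed element, the same idea can be made a little more defensive. This is only a sketch under assumptions of my own: the class SplitTripleParser and the method Parse are hypothetical names, and silently skipping bad elements is one possible policy (throwing would be another).

```csharp
using System;
using System.Collections.Generic;

public static class SplitTripleParser
{
    // Hypothetical helper: returns one string[] { A, B, C } per well-formed element.
    public static List<string[]> Parse(string input)
    {
        var result = new List<string[]>();
        string[] elements = input.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string element in elements)
        {
            int colon = element.IndexOf(':');
            int eq = colon < 0 ? -1 : element.IndexOf('=', colon + 1);

            // A and B must each have at least one character; C may be empty.
            if (colon < 1 || eq < colon + 2)
            {
                continue; // assumption: skip malformed elements instead of throwing
            }

            result.Add(new[]
            {
                element.Substring(0, colon),
                element.Substring(colon + 1, eq - colon - 1),
                element.Substring(eq + 1)
            });
        }
        return result;
    }

    public static void Main()
    {
        foreach (string[] t in Parse("A:B=C;D:EF=G;E:H=;I:JK=L;"))
        {
            Console.WriteLine("{0}:{1}={2}", t[0], t[1], t[2]);
        }
    }
}
```

Searching for ':' first and then for '=' after it also means a C value is never cut short by the split characters appearing in an unexpected order.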
Fredrik Mörk
Why do all that splitting and messing with Lists when a single line with Regex will give you a collection of Match objects, already labeled with groups?
Jeff Meatball Yang
It's not a requirement. I was actually using string.split but it seemed a bit bloated.
mafutrct
Just presenting an alternate way; I sometimes see regex used when it's not needed (guilty of that myself). Sometimes not bringing a new language to the table can be a good thing.
Fredrik Mörk
+1 I agree. String manipulation so simple, needs no regex.
Rashmi Pandit
I tend to agree as well. My own Regex-fu is weak enough to prefer using string manipulation techniques.
Erik Forbes