In a program, I'm reading in some data files, parts of which are formatted as a series of records, each in square brackets. Each record contains a section title and a series of key/value pairs.

I originally wrote code to loop through and extract the values, but decided it could be done more elegantly using regular expressions. Below is my resulting code (I just hacked it out for now in a console app, so I know the variable names aren't that great, etc.).

Can you suggest improvements? I feel it shouldn't be necessary to do two matches and a substring, but can't figure out how to do it all in one big step:

string input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";

MatchCollection matches=Regex.Matches(input, @"\[[^\]]*\]");
foreach (Match match in matches)
{
    string subinput = match.Value;

    int firstSpace = subinput.IndexOf(' ');
    string section = subinput.Substring(1, firstSpace-1);
    Console.WriteLine(section);

    MatchCollection newMatches = Regex.Matches(subinput.Substring(firstSpace + 1), @"\s*(\w+)\s*=\s*(\w+)\s*");
    foreach (Match newMatch in newMatches)
    {
        Console.WriteLine("{0}={1}", newMatch.Groups[1].Value, newMatch.Groups[2].Value);
    }
}
+2  A: 

You should be able to do something with nested groups like this:

string pattern = @"\[(\S+)(\s+([^\s=]+)=([^\s\]]+))*\]";

I haven't tested it in C# or looped through the matches, but the results look right on rubular.com.
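
For anyone who wants to try the looping part, here is a minimal sketch of how it might look in C#, assuming the pattern above. In .NET, a group inside a repeated group keeps every iteration in its Captures collection, so group 3 should hold all the keys and group 4 all the values. The names SectionParser, RecordPattern and Parse are hypothetical, and storing the result in dictionaries is my own assumption (it presumes section names and keys are unique; a later duplicate overwrites an earlier one).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SectionParser
{
    // Same pattern as above. Group 1 = section name, group 2 = one whole
    // " key=value" repetition, group 3 = key, group 4 = value.
    private static readonly Regex RecordPattern =
        new Regex(@"\[(\S+)(\s+([^\s=]+)=([^\s\]]+))*\]");

    // Hypothetical helper: maps each section name to its key/value pairs.
    public static Dictionary<string, Dictionary<string, string>> Parse(string input)
    {
        var sections = new Dictionary<string, Dictionary<string, string>>();

        foreach (Match m in RecordPattern.Matches(input))
        {
            var pairs = new Dictionary<string, string>();

            // Captures holds one entry per repetition of the group,
            // so keys[i] and values[i] belong to the same pair.
            CaptureCollection keys = m.Groups[3].Captures;
            CaptureCollection values = m.Groups[4].Captures;
            for (int i = 0; i < keys.Count; i++)
            {
                pairs[keys[i].Value] = values[i].Value;
            }

            sections[m.Groups[1].Value] = pairs;
        }

        return sections;
    }

    public static void Main()
    {
        string input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";

        foreach (var section in Parse(input))
        {
            Console.WriteLine(section.Key);
            foreach (var pair in section.Value)
            {
                Console.WriteLine("{0}={1}", pair.Key, pair.Value);
            }
        }
    }
}
```

If the order of sections or keys matters, a list of pairs would be a safer choice than a dictionary, since Dictionary does not promise any enumeration order.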

Don Kirkby
+1 for the link.
Jeff Meatball Yang
+5  A: 

You should take advantage of the capture collections to get each key/value pair. Something like this:

        string input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";

        Regex r = new Regex(@"(\[(\S+) (\s*\w+\s*=\s*\w+\s*)*\])", RegexOptions.Compiled);

        foreach (Match m in r.Matches(input))
        {
            Console.WriteLine(m.Groups[2].Value);
            foreach (Capture c in m.Groups[3].Captures)
            {
                Console.WriteLine(c.Value);
            }
        }

Resulting output:

section1
key1=value1
key2=value2
section2
key1=value1
key2=value2
key3=value3
section3
key1=value1
patjbs
A: 

This will match all the key/value pairs ...

var input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";

// The key/value group is repeated, so a section can have any number of pairs.
var ms = Regex.Matches(input, @"section(\d+)(?:\s+(\w+=\w+))*");

foreach (Match m in ms)
{
    Console.WriteLine("Section " + m.Groups[1].Value);

    // Value on a repeated group only holds the last pair; Captures holds all of them.
    foreach (Capture c in m.Groups[2].Captures)
    {
        var kvp = c.Value.Split( '=' );
        Console.WriteLine( "{0}={1}", kvp[0], kvp[1] );
    }
}
JP Alioto
+6  A: 

I prefer named captures, nice formatting, and clarity:

string input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";
MatchCollection matches = Regex.Matches(input, @"\[
                                                    (?<sectionName>\S+)
                                                      (\s+                                                            
                                                         (?<key>[^=]+)
                                                          =
                                                         (?<value>[^ \] ]+)                                                    
                                                      )+
                                                  ]", RegexOptions.IgnorePatternWhitespace);

foreach(Match currentMatch in matches)
{
    Console.WriteLine("Section: {0}", currentMatch.Groups["sectionName"].Value);
    CaptureCollection keys = currentMatch.Groups["key"].Captures;
    CaptureCollection values = currentMatch.Groups["value"].Captures;

    for(int i = 0; i < keys.Count; i++)
    {
        Console.WriteLine("{0}={1}", keys[i].Value, values[i].Value);           
    }
}
Jeff Moser
Nice, I didn't know about using the IgnorePatternWhitespace option to let you format a regex like that. Thanks for the tip.
Don Kirkby
+1 again for RegexOptions.IgnorePatternWhitespace; it really helps readability.
Jeff Meatball Yang
+1 I too prefer named captures. They make the code readable and easy to understand.
Rashmi Pandit
I also prefer named captures in practice, but sometimes use just numbers in the interest of brevity when answering a question. :)
patjbs