Using C#, how would you go about converting a string that contains newline characters and tabs (4 spaces) from the following format

A {
   B {
      C = D
      E = F
   }
   G = H
}

into the following

A.B.C = D
A.B.E = F
A.G = H

Note that A to H are just placeholders for string values, which will not contain the characters '{', '}' or '='. The above is just an example: the actual string to convert can be nested arbitrarily deep and can contain any number of "? = ?" lines.

+6  A: 

You probably want to parse this and then generate the desired format. Trying to do regex transforms isn't going to get you anywhere.

Tokenize the string, then go through the tokens and build up a syntax tree. Then walk the tree generating the output.
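A minimal C# sketch of the tokenize / build a tree / walk the tree approach. Everything here is an assumption for illustration: the `Node` and `TreeFlattener` names are hypothetical, the tokenizer treats each name or value as a run of text ending at a line break, `{`, `}` or `=`, and there is no error recovery, so malformed input will throw.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
#nullable enable

// Hypothetical tree type: a namespace owns an ordered list of children,
// each either a nested namespace or a "key = value" leaf.
sealed class Node
{
    public string Name = "";
    public string? Value;                               // non-null only for "key = value" leaves
    public List<Node> Children = new List<Node>();
}

static class TreeFlattener
{
    // Tokens are "{", "}", "=", or a trimmed run of any other characters on one line.
    // Assumption: names and values never span a line break.
    static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        foreach (Match m in Regex.Matches(input, @"[{}=]|[^{}=\r\n]+"))
        {
            string t = m.Value.Trim();
            if (t.Length > 0) tokens.Add(t);            // drop whitespace-only runs
        }
        return tokens;
    }

    // Recursive descent, assuming well-formed input:
    //   item := NAME "{" item* "}" | NAME "=" VALUE
    static List<Node> ParseItems(List<string> tokens, ref int pos)
    {
        var items = new List<Node>();
        while (pos < tokens.Count && tokens[pos] != "}")
        {
            string name = tokens[pos++];
            string op = tokens[pos++];
            if (op == "{")
            {
                var node = new Node { Name = name };
                node.Children = ParseItems(tokens, ref pos);
                pos++;                                  // skip the matching "}"
                items.Add(node);
            }
            else if (op == "=")
            {
                items.Add(new Node { Name = name, Value = tokens[pos++] });
            }
            else
            {
                throw new FormatException($"Unexpected token '{op}'");
            }
        }
        return items;
    }

    // Depth-first walk; 'prefix' is the dotted path of the enclosing namespaces.
    static void Emit(List<Node> items, string prefix, StringBuilder output)
    {
        foreach (var node in items)
        {
            string path = prefix.Length == 0 ? node.Name : prefix + "." + node.Name;
            if (node.Value != null)
                output.Append(path).Append(" = ").Append(node.Value).Append('\n');
            else
                Emit(node.Children, path, output);
        }
    }

    public static string Flatten(string input)
    {
        int pos = 0;
        var tree = ParseItems(Tokenize(input), ref pos);
        var output = new StringBuilder();
        Emit(tree, "", output);
        return output.ToString();
    }
}
```

For the example in the question, `TreeFlattener.Flatten(input)` should return the lines `A.B.C = D`, `A.B.E = F` and `A.G = H`. Keeping an actual tree costs a second pass, but makes it easy to emit a different format later.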

Alternatively, push each "namespace" onto a stack as you encounter it, and pop it off when you encounter the closing brace.
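A rough sketch of the stack idea in C#, working on tokens rather than whole lines so that something like `A { B = C }` on a single line also works. `StackFlattener` is a hypothetical name; the sketch assumes well-formed input where each name or value ends at a line break, brace or `=`, and it uses a regex only to split the input into tokens.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

static class StackFlattener
{
    // Single pass: "name {" pushes, "}" pops, and "name = value" is emitted
    // with the current stack contents as its dotted prefix.
    public static string Flatten(string input)
    {
        // Assumption: names and values never span a line break.
        var tokens = Regex.Matches(input, @"[{}=]|[^{}=\r\n]+")
                          .Cast<Match>()
                          .Select(m => m.Value.Trim())
                          .Where(t => t.Length > 0)
                          .ToList();
        var stack = new Stack<string>();      // enclosing namespaces, innermost on top
        var output = new StringBuilder();
        for (int i = 0; i < tokens.Count; i++)
        {
            if (tokens[i] == "}")
            {
                stack.Pop();                  // leaving a namespace
            }
            else if (i + 1 < tokens.Count && tokens[i + 1] == "{")
            {
                stack.Push(tokens[i]);        // entering a namespace
                i++;                          // skip the "{"
            }
            else if (i + 2 < tokens.Count && tokens[i + 1] == "=")
            {
                // A Stack enumerates top-first, so reverse it to get outermost-first.
                var path = stack.Reverse().Append(tokens[i]);
                output.Append(string.Join(".", path))
                      .Append(" = ").Append(tokens[i + 2]).Append('\n');
                i += 2;                       // skip the "=" and the value
            }
            else
            {
                throw new FormatException($"Unexpected token '{tokens[i]}'");
            }
        }
        return output.ToString();
    }
}
```

Only the current chain of names is kept in memory, so there is no tree to build, but the output can only be produced in input order.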

Anon.
I'd go down the stack route too.
Tom Duckering
@Anon: I would appreciate it if you could give some sample code for the stack method.
Lopper
+1 for this. Even though I have posted a regex-based answer, I still think it's better to do it properly. This task pushes the limits of what is sensible to do with regex.
Mark Byers
You probably would end up using regex to do the tokenization anyway, though. So regex is not all that bad :-)
Moron
+1  A: 

Pseudocode for the stack method:

function do_processing(Stack stack, Namespace current)
    push the current namespace onto the stack;
    for each sub namespace of the current namespace
        do_processing(stack, sub namespace)
    end
    for each variable declaration in the current namespace
        make_variable_declaration(stack, variable declaration)
    end
    pop the current namespace off the stack;
end
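One way the pseudocode above might translate to C#, assuming the input has already been parsed into a tree. `NamespaceNode` and `PseudocodePort` are hypothetical names, and building the tree from the text is left out.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical in-memory tree: a namespace holds nested namespaces
// plus its own "name = value" declarations.
sealed class NamespaceNode
{
    public string Name;
    public List<NamespaceNode> SubNamespaces = new List<NamespaceNode>();
    public List<KeyValuePair<string, string>> Variables = new List<KeyValuePair<string, string>>();
    public NamespaceNode(string name) { Name = name; }
}

static class PseudocodePort
{
    static void DoProcessing(NamespaceNode current, Stack<string> stack, StringBuilder output)
    {
        stack.Push(current.Name);                       // add this namespace to the stack
        foreach (var sub in current.SubNamespaces)
            DoProcessing(sub, stack, output);
        foreach (var v in current.Variables)
        {
            // make_variable_declaration: outermost-first path, then the value.
            var path = stack.Reverse().Append(v.Key);
            output.Append(string.Join(".", path)).Append(" = ").Append(v.Value).Append('\n');
        }
        stack.Pop();                                    // leave this namespace
    }

    public static string Flatten(NamespaceNode root)
    {
        var output = new StringBuilder();
        DoProcessing(root, new Stack<string>(), output);
        return output.ToString();
    }
}
```

Because every sub namespace is visited before the current namespace's own declarations, a declaration that appears before a nested block in the input is printed after that block's lines. Popping on the way out is what stops sibling namespaces from inheriting each other's names.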
RCIX
+1  A: 

You can do this with regular expressions; it's just not the most efficient way, because the string has to be rescanned once for each level of nesting.

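using System.Text.RegularExpressions;

// Each pass collapses every innermost "name { ... }" block (one with no
// nested braces), prefixing each line inside it with "name.".
// Note: loops forever on malformed input, e.g. an unmatched "{".
while (s.Contains("{")) {
    s = Regex.Replace(s, @"([^\s{}]+)\s*\{([^{}]+)\}", match =>
        Regex.Replace(match.Groups[2].Value,
                      @"\s*(.*\n)",                     // one line, leading whitespace dropped
                      match.Groups[1].Value + ".$1"));  // prepend the block's name
}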

Result:

A.B.C = D
A.B.E = F
A.G = H

I still think a parser and/or stack-based approach is the best way to do this, but I thought I'd offer an alternative.

Mark Byers
+2  A: 

Not very pretty, but here's an implementation that uses a stack:

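using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

static string Rewrite(string input)
{
    var builder = new StringBuilder();
    // Chain of enclosing names, innermost on top.
    var stack = new Stack<string>();
    string[] lines = input.Split('\n');
    foreach (var s in lines)
    {
        // Opening a block or declaring a value: push the name (or the "key = value" text).
        if (s.Contains("{") || s.Contains("="))
        {
            stack.Push(s.Replace("{", String.Empty).Trim());
        }
        // A declaration: emit the full path, outermost name first.
        if (s.Contains("="))
        {
            builder.Append(string.Join(".", stack.Reverse().ToArray()));
            builder.Append(Environment.NewLine);
        }
        // Closing a block, or finished with a declaration: pop it off again.
        if (s.Contains("}") || s.Contains("="))
        {
            stack.Pop();
        }
    }
    return builder.ToString();
}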
Isaac Cambron