What is the most efficient way to parse a C# string in the form of

"(params (abc 1.3)(sdc 2.0)(www 3.05)....)"

into a struct in the form

struct Params
{
  double abc,sdc,www....;
}

Thanks

EDIT: The structure always has the same parameters (same names, only doubles, known at compile time), but the order is not guaranteed. Only one struct at a time.

+2  A: 

Do you need to support multiple structs? In other words, does this need to be dynamic, or do you know the struct definition at compile time?

Parsing the string with a regex would be the obvious choice.

Here is a regex that will parse your string format:

private static readonly Regex regParser = new Regex(@"^\(params\s(\((?<name>[a-zA-Z]+)\s(?<value>[\d\.]+)\))+\)$", RegexOptions.Compiled);

Running that regex on a string will give you two groups named "name" and "value". The Captures property of each group will contain the names and values.
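For the fixed struct described in the question's edit, reading those captures might look roughly like the sketch below. The class name RegexParamsParser, the public fields on Params, and the choice to throw on unknown names are my own assumptions, not something the question specifies.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Assumed shape of the target struct; the fields are public so they can be assigned directly.
public struct Params
{
    public double abc, sdc, www;
}

public static class RegexParamsParser
{
    // Same pattern as in the answer above.
    private static readonly Regex regParser = new Regex(
        @"^\(params\s(\((?<name>[a-zA-Z]+)\s(?<value>[\d\.]+)\))+\)$",
        RegexOptions.Compiled);

    public static Params Parse(string input)
    {
        Match m = regParser.Match(input);
        if (!m.Success)
            throw new FormatException("Input does not match the expected format.");

        // Each repetition of the inner group adds one capture to both named groups.
        CaptureCollection names = m.Groups["name"].Captures;
        CaptureCollection values = m.Groups["value"].Captures;

        var p = new Params();
        for (int i = 0; i < names.Count; i++)
        {
            // Invariant culture so "1.3" parses the same on every machine.
            double v = double.Parse(values[i].Value, CultureInfo.InvariantCulture);
            switch (names[i].Value)
            {
                case "abc": p.abc = v; break;
                case "sdc": p.sdc = v; break;
                case "www": p.www = v; break;
                default: throw new FormatException("Unknown parameter: " + names[i].Value);
            }
        }
        return p;
    }
}
```

Because the names are matched in a switch, the order of the pairs in the input does not matter.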

If the struct type is unknown at compile time, then you will need to use reflection to fill in the fields.

If you mean to generate the struct definition at runtime, you will need to use Reflection to emit the type; or you will need to generate the source code.

Which part are you having trouble with?

driis
If performance is critical, then regex should not be the first choice. Regexes don't perform nearly as well as simple string operations such as Split and Trim.
Rune FS
A: 

Do you want to build a data representation of your defined syntax?

If you are looking for easy maintainability without having to write long regex statements, you could build your own lexer/parser. Here is a prior discussion on SO, with good links in the answers as well, to help you:

http://stackoverflow.com/questions/673113/poor-mans-lexer-for-c
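To give an idea of the scale involved, a hand-written lexer for this particular format could be as small as the sketch below. It is not taken from the linked discussion; the names TokenKind, Token and SimpleLexer are hypothetical, and it assumes the only token types are parentheses and whitespace-separated atoms.

```csharp
using System;
using System.Collections.Generic;

public enum TokenKind { LParen, RParen, Atom }

public sealed class Token
{
    public TokenKind Kind { get; private set; }
    public string Text { get; private set; }

    public Token(TokenKind kind, string text)
    {
        Kind = kind;
        Text = text;
    }
}

public static class SimpleLexer
{
    // Splits the input into parentheses and atoms (names or numbers), skipping whitespace.
    public static IEnumerable<Token> Tokenize(string input)
    {
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (char.IsWhiteSpace(c))
            {
                i++;
            }
            else if (c == '(')
            {
                yield return new Token(TokenKind.LParen, "(");
                i++;
            }
            else if (c == ')')
            {
                yield return new Token(TokenKind.RParen, ")");
                i++;
            }
            else
            {
                // An atom runs until the next whitespace or parenthesis.
                int start = i;
                while (i < input.Length && !char.IsWhiteSpace(input[i])
                       && input[i] != '(' && input[i] != ')')
                {
                    i++;
                }
                yield return new Token(TokenKind.Atom, input.Substring(start, i - start));
            }
        }
    }
}
```

A parser would then walk this token stream and decide which atoms are names and which are values.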

Glennular
+1  A: 

Depending on your complete grammar you have a few options. If it's a very simple grammar and you don't have to test for errors in it, you could simply do the following (which will be fast):

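// needs System.ComponentModel (TypeDescriptor) and System.Globalization (CultureInfo)
var input = "(params (abc 1.3)(sdc 2.0)(www 3.05))";
// splitting on '(' yields: "", "params ", "abc 1.3)", "sdc 2.0)", ...
var tokens = input.Split('(');
var typeName = tokens[1].Trim();
//you'll need more than the type name (assembly/namespace) so I'll leave that to you
Type t = getStructFromType(typeName);
// for a struct this returns a boxed instance, which is what SetValue needs
var obj = TypeDescriptor.CreateInstance(null, t, null, null);
for(var i = 2;i<tokens.Length;i++)
{
    var innerTokens = tokens[i].Trim(' ', ')').Split(' ');
    var fieldName = innerTokens[0];
    // invariant culture so "1.3" parses the same regardless of locale
    var value = Convert.ToDouble(innerTokens[1], CultureInfo.InvariantCulture);
    // GetField(string) only finds public fields
    var field = t.GetField(fieldName);
    field.SetValue(obj, value);
}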

That simple approach, however, requires a well-formed string or it will misbehave.

If the grammar is a bit more complicated, e.g. nested ( ), then that simple approach won't work.

You could try to use a regex, but that still requires a rather simple grammar and doesn't perform that well. If you end up with a complex grammar, your best choice is a real parser. Irony is easy to use since you can write it all in plain C# (some knowledge of BNF helps, though).

Rune FS
+3  A: 
using System;

namespace ConsoleApplication1
{
    class Program
    {
        struct Params
        {
            public double abc, sdc;
        };

        static void Main(string[] args)
        {
            string s = "(params (abc 1.3)(sdc 2.0))";
            Params p = new Params();
            object pbox = (object)p; // structs must be boxed for SetValue() to work

            string[] arr = s.Substring(8).Replace(")", "").Split(new char[] { ' ', '(', }, StringSplitOptions.RemoveEmptyEntries);
            for (int i = 0; i < arr.Length; i+=2)
                typeof(Params).GetField(arr[i]).SetValue(pbox, double.Parse(arr[i + 1], System.Globalization.CultureInfo.InvariantCulture));
            p = (Params)pbox;
            Console.WriteLine("p.abc={0} p.sdc={1}", p.abc, p.sdc);
        }
    }
}

Note: if you used a class instead of a struct the boxing/unboxing would not be necessary.

Simon Chadwick
+1 for using `String.Split`.
Brian
I think he wanted a dynamically built struct, possibly via a Dictionary-type object. (The example now includes 'www'.)
Glennular
@Glennular: His edit says the struct is fixed. But I agree with you anyway; I'd rather use a Dictionary<string, double> than reflection for something like this.
Simon Chadwick
+2  A: 

A regex can do the job for you:

public Dictionary<string, double> ParseString(string input){
    var dict = new Dictionary<string, double>();
    try
    {
        var re = new Regex(@"(?:\(params\s)?(?:\((?<n>[^\s]+)\s(?<v>[^\)]+)\))");
        foreach (Match m in re.Matches(input))
            dict.Add(m.Groups["n"].Value, double.Parse(m.Groups["v"].Value, System.Globalization.CultureInfo.InvariantCulture));
    }
    catch
    {
        throw new Exception("Invalid format!");
    }
    return dict;
}

Use it like this:

string str = "(params (abc 1.3)(sdc 2.0)(www 3.05))";
var parsed = ParseString(str);

// parsed["abc"] would now return 1.3

That might fit better than creating a lot of different structs for every possible input string and using reflection to fill them. I don't think that is worth the effort.

Furthermore, I assumed the input string is always in exactly the format you posted.
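If you still want the fixed struct from the question at the end, a dictionary like the one ParseString returns can be copied into it without reflection. This is only a sketch: the ParamsMapper class and its Require helper are hypothetical names, and it assumes Params has public fields and that a missing parameter should be treated as an error.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the target struct from the question.
public struct Params
{
    public double abc, sdc, www;
}

public static class ParamsMapper
{
    // Copies the known parameters out of the dictionary; the input order is irrelevant.
    public static Params ToParams(IDictionary<string, double> parsed)
    {
        var p = new Params();
        p.abc = Require(parsed, "abc");
        p.sdc = Require(parsed, "sdc");
        p.www = Require(parsed, "www");
        return p;
    }

    // Looks up one parameter and fails loudly if it was not in the input.
    private static double Require(IDictionary<string, double> parsed, string name)
    {
        double value;
        if (!parsed.TryGetValue(name, out value))
            throw new FormatException("Missing parameter: " + name);
        return value;
    }
}
```

Since the set of names is known at compile time, plain assignments like these avoid both boxing and reflection lookups.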

Philip Daubmeier
+1  A: 

You might consider performing just enough string manipulation to make the input look like standard command line arguments then use an off-the-shelf command line argument parser like NDesk.Options to populate the Params object. You give up some efficiency but you make it up in maintainability.

public Params Parse(string input)
{
    var @params = new Params();
    var argv = ConvertToArgv(input);
    new NDesk.Options.OptionSet
        {
            {"abc=", v => Double.TryParse(v, out @params.abc)},
            {"sdc=", v => Double.TryParse(v, out @params.sdc)},
            {"www=", v => Double.TryParse(v, out @params.www)}
        }
        .Parse(argv);

    return @params;
}

private string[] ConvertToArgv(string input)
{
    return input
        .Replace('(', '-')
        .Split(new[] {')', ' '});
}
Handcraftsman
A: 

I would just do a basic recursive-descent parser. It may be more general than you want, but nothing else will be much faster.
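Something along these lines, as a rough sketch rather than a tuned implementation. The grammar is my reading of the example string (list := '(' "params" pair* ')', pair := '(' name number ')'), and the ParamsParser class and its method names are made up for illustration. The format has no nesting, so nothing actually recurses here; with nested lists, ParsePair would call back into a list rule.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public sealed class ParamsParser
{
    private readonly string _s;
    private int _pos;

    private ParamsParser(string s) { _s = s; }

    public static Dictionary<string, double> Parse(string input)
    {
        var parser = new ParamsParser(input);
        Dictionary<string, double> result = parser.ParseList();
        parser.SkipWhitespace();
        if (parser._pos != input.Length)
            throw new FormatException("Unexpected trailing input at position " + parser._pos);
        return result;
    }

    // list := '(' "params" pair* ')'
    private Dictionary<string, double> ParseList()
    {
        var result = new Dictionary<string, double>();
        Expect('(');
        if (ReadAtom() != "params")
            throw new FormatException("Expected 'params'");
        SkipWhitespace();
        while (_pos < _s.Length && _s[_pos] == '(')
        {
            ParsePair(result);
            SkipWhitespace();
        }
        Expect(')');
        return result;
    }

    // pair := '(' name number ')'
    private void ParsePair(Dictionary<string, double> result)
    {
        Expect('(');
        string name = ReadAtom();
        string number = ReadAtom();
        result[name] = double.Parse(number, CultureInfo.InvariantCulture);
        Expect(')');
    }

    // Reads characters up to the next whitespace or parenthesis.
    private string ReadAtom()
    {
        SkipWhitespace();
        int start = _pos;
        while (_pos < _s.Length && !char.IsWhiteSpace(_s[_pos])
               && _s[_pos] != '(' && _s[_pos] != ')')
        {
            _pos++;
        }
        if (_pos == start)
            throw new FormatException("Expected a name or number at position " + start);
        return _s.Substring(start, _pos - start);
    }

    private void Expect(char c)
    {
        SkipWhitespace();
        if (_pos >= _s.Length || _s[_pos] != c)
            throw new FormatException("Expected '" + c + "' at position " + _pos);
        _pos++;
    }

    private void SkipWhitespace()
    {
        while (_pos < _s.Length && char.IsWhiteSpace(_s[_pos])) _pos++;
    }
}
```

It makes a single pass over the string and reports malformed input with a position, which the Split-based approaches do not.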

Mike Dunlavey