I have an interesting problem. Let's say that I have a file with lines like this:

name1[xp,y,z321](a,b,c){text};//comment
#comment
name2(aaaa);

I also have a (simplified) class:

class something {
public:
 something(const std::string& name);
 void addOptionalParam(const std::string& value);
 void addMandatoryParam(const std::string& value);
 void setData(const std::string& value);
};

name corresponds to the name parameter of the class constructor. Items listed in [] brackets are optional, items in () are mandatory, and everything between {} should be passed as a string.

For the first line, one should call the constructor with "name1" as the name, call addOptionalParam 3 times (once for each item separated with colon), call addMandatoryParam 3 times, and call setData with "text".
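To make the expected result concrete, here is a minimal sketch of those calls for the first sample line. The class body is a hypothetical stand-in that only records what it is given (the real class presumably does more), and build_first_line is an illustrative name, not part of the original code.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the simplified class above: it only records
// the values it receives so the sequence of calls can be inspected.
class something {
public:
    explicit something(const std::string& n) : name(n) {}
    void addOptionalParam(const std::string& value) { optional.push_back(value); }
    void addMandatoryParam(const std::string& value) { mandatory.push_back(value); }
    void setData(const std::string& value) { data = value; }

    std::string name;
    std::vector<std::string> optional;
    std::vector<std::string> mandatory;
    std::string data;
};

// The calls expected for: name1[xp,y,z321](a,b,c){text};//comment
something build_first_line() {
    something s("name1");
    s.addOptionalParam("xp");
    s.addOptionalParam("y");
    s.addOptionalParam("z321");
    s.addMandatoryParam("a");
    s.addMandatoryParam("b");
    s.addMandatoryParam("c");
    s.setData("text");
    return s;
}
```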

I can work out how to do the comments, but everything else is mangled for me...

Now I need some good advice on how (or whether) this is possible. If I can work out how to do this for simple objects, I can work out how to handle all the extra gory details like semantic correctness and all that.

+4  A: 

Have you considered a parser such as Boost Spirit?

Fred Larson
+4  A: 

Your description is a bit confusing (e.g. you mention "separated with colon", but I don't see any colons in the input). I'm assuming what you intend is that the items in square brackets are optional parameters, in parentheses are mandatory parameters, and in curly braces is the 'data'.

In that case, it appears that your grammar is something like this:

func: name optionalParams '(' paramList ')' '{' data '}'

paramList: param |
          paramList ',' param

optionalParams:  // empty
              | '[' paramList ']'

name: WORD
param: WORD
data: WORD

This is a simple enough grammar that Spirit will probably work quite nicely with it. Spirit tends to lead to really long compilation times for larger grammars, but this grammar is small enough that the compilation time should be quite reasonable.

The obvious alternative would be to write a descent parser instead (like a recursive descent parser, but in this case recursion won't be needed). In this case, you'd basically write a function for each level of the grammar, have it read appropriate input, and return a structure (e.g. a vector) holding the data it read. For example, the optionalParams is probably the most difficult to parse (simply because it is optional):

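#include <istream>
#include <string>
#include <vector>

typedef std::string param;

// Reads an optional "[a,b,c]" list. Returns an empty vector if the next
// character is not '['.
std::vector<param> read_optional_params(std::istream &in) {
    std::vector<param> ret;

    if (in.peek() != '[')
        return ret;
    in.get();                        // consume '['

    param temp;
    char ch;
    while (in.get(ch)) {
        if (ch == ',' || ch == ']') {
            ret.push_back(temp);     // end of the current item
            temp.clear();
            if (ch == ']')
                return ret;
        } else {
            temp += ch;
        }
    }
    in.setstate(std::ios::failbit);  // ran out of input before ']'
    return ret;
}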

At the top level, you'd have something like:

// `function` stands for the class being built (e.g. the one from the question).
function read_func(std::istream &in) { 
    std::string name = read_name(in);
    std::vector<param> optional_params = read_optional_params(in);
    std::vector<param> mandatory_params = read_mandatory_params(in);
    std::string data = read_data(in);

    if (in.fail()) {
        // malformed input
    }

    function func(name);
    for (std::size_t i=0; i<optional_params.size(); i++)
        func.addOptionalParam(optional_params[i]);
    for (std::size_t i=0; i<mandatory_params.size(); i++)
        func.addMandatoryParam(mandatory_params[i]);
    func.setData(data);
    return func;
}
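As a self-contained sketch of the same hand-written approach, the following parses one statement into a plain record. The names statement, read_until, expect, read_list and parse_statement are hypothetical helpers, not part of any library. It assumes there is no whitespace inside a statement, that the trailing ';' is required, and that the {data} block may be absent (as in the second sample line); adjust those assumptions to the real format.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical plain record for one parsed statement.
struct statement {
    std::string name;
    std::vector<std::string> optional_params;
    std::vector<std::string> mandatory_params;
    std::string data;
};

// Reads characters up to (not including) the first one found in `stops`.
static std::string read_until(std::istream& in, const std::string& stops) {
    std::string out;
    while (in.peek() != std::istream::traits_type::eof() &&
           stops.find(static_cast<char>(in.peek())) == std::string::npos)
        out += static_cast<char>(in.get());
    return out;
}

// Consumes `expected` if it is the next character; otherwise sets failbit.
static void expect(std::istream& in, char expected) {
    if (in.peek() == expected) in.get();
    else in.setstate(std::ios::failbit);
}

// Reads "<open>item,item,...<close>" into a vector.
// Note: an empty list such as "()" yields one empty item; the grammar
// does not allow empty lists, so reject that case separately if needed.
static std::vector<std::string> read_list(std::istream& in, char open, char close) {
    std::vector<std::string> items;
    expect(in, open);
    while (in) {
        items.push_back(read_until(in, std::string(",") + close));
        if (in.peek() == ',') in.get();
        else break;
    }
    expect(in, close);
    return items;
}

// Parses one statement; returns false on malformed input.
// Anything after the terminating ';' (e.g. a comment) is ignored.
bool parse_statement(const std::string& line, statement& out) {
    std::istringstream in(line);
    out = statement();
    out.name = read_until(in, "[({;");
    if (in.peek() == '[')
        out.optional_params = read_list(in, '[', ']');
    out.mandatory_params = read_list(in, '(', ')');
    if (in.peek() == '{') {
        in.get();
        out.data = read_until(in, "}");
        expect(in, '}');
    }
    expect(in, ';');
    return !in.fail() && !out.name.empty();
}
```

Each helper maps to one rule of the grammar, so tightening a rule (for example, restricting which characters a param may contain) only touches one function.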
Jerry Coffin
+1 For considering a grammar and parsing it.
Thomas Matthews
A: 

Wow, simply wow...

I never knew that this could be done in terms other than reading from a file, and coming from a marginal knowledge of Boost, I didn't know there was something like boost::spirit. However, now that I'm going through the documentation, I find it very hard to follow. Most of the time it says something like "use real_p" with no information about namespaces, so it's quite confusing where to find a given item... Still, seeing what it can accomplish, I am truly amazed.

I'd like to extend the information about the grammar (though I'm no closer to deciphering how exactly to do it in code):

name: can be any string that does not start with a digit and has no characters other than lower-case letters, digits and '_'
param: can be an integer, a double or a string containing only 'a-z' characters
data: whatever is between the curly braces (including line breaks, if possible), so {{{{{} should yield '{{{{' and {}{}{}{} should fail (grammatically speaking)
';': should act as a separator and a definite end, so it's a must-have
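These rules can be sketched as plain checks, independent of any parser library. The function names below are hypothetical. Two assumptions that the rules do not spell out: numbers use plain decimal notation (optional sign, at most one '.', no exponent), and the data block ends at the first '}', which is the reading under which {{{{{} yields '{{{{' and {}{}{}{} fails.

```cpp
#include <string>

// ASCII-only checks (avoids locale-dependent <cctype> behaviour).
static bool is_lower(char c) { return c >= 'a' && c <= 'z'; }
static bool is_digit(char c) { return c >= '0' && c <= '9'; }

// name: non-empty, no leading digit, only [a-z0-9_].
bool is_valid_name(const std::string& s) {
    if (s.empty() || is_digit(s[0])) return false;
    for (std::size_t i = 0; i < s.size(); ++i)
        if (!is_lower(s[i]) && !is_digit(s[i]) && s[i] != '_') return false;
    return true;
}

// param: an integer, a double, or a word made only of 'a'-'z'.
// Assumption: numbers look like "42" or "-3.5"; no exponents.
bool is_valid_param(const std::string& s) {
    if (s.empty()) return false;
    bool all_letters = true;
    for (std::size_t i = 0; i < s.size(); ++i)
        if (!is_lower(s[i])) all_letters = false;
    if (all_letters) return true;

    std::size_t i = (s[0] == '+' || s[0] == '-') ? 1 : 0;
    std::size_t digits = 0, dots = 0;
    for (; i < s.size(); ++i) {
        if (is_digit(s[i])) ++digits;
        else if (s[i] == '.') ++dots;
        else return false;
    }
    return digits > 0 && dots <= 1;
}

// data: `s` must start at '{'; the block ends at the first '}', which
// must be followed immediately by ';'. Returns false otherwise.
bool extract_data(const std::string& s, std::string& data) {
    if (s.empty() || s[0] != '{') return false;
    std::size_t close = s.find('}');
    if (close == std::string::npos) return false;
    if (close + 1 >= s.size() || s[close + 1] != ';') return false;
    data = s.substr(1, close - 1);
    return true;
}
```

Note that under the param rule as stated, a mixed value such as z321 from the first sample line would be rejected, so either the rule or the sample may need adjusting.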

The other thing is comments, as I can't see in the examples how to implement them.
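One simple way to deal with comments outside of the grammar is to strip them from each line before parsing. This is only a sketch with a hypothetical strip_comment helper, and it assumes that '#' and "//" never appear inside a {data} block; if they can, the comment handling has to live inside the parser instead.

```cpp
#include <algorithm>
#include <string>

// Returns `line` with any trailing comment removed.
// Assumption: '#' and "//" never occur inside a {data} block.
std::string strip_comment(const std::string& line) {
    std::size_t hash = line.find('#');
    std::size_t slashes = line.find("//");
    // npos is the largest size_t value, so min() picks whichever marker
    // comes first, or npos (keep the whole line) if neither is present.
    std::size_t cut = std::min(hash, slashes);
    return line.substr(0, cut);
}
```

A line that is empty after stripping (such as the #comment sample) can then simply be skipped.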

Thanks for the help, I like to see where it's going.

Johnny_Bit