My situation: I'm new to Spirit, I have to use VC6 and am thus using Spirit 1.6.4.

I have a line that looks like this:

//The Description;DESCRIPTION;;

I want to put the text DESCRIPTION in a string if the line starts with //The Description;.

I have something that works, but it doesn't look very elegant to me:

vector<char> vDescription; // std::string doesn't work due to missing ::clear() in VC6's STL implementation
if(parse(chars,
    // Begin grammar
    (
       as_lower_d["//the description;"]
    >> (+~ch_p(';'))[assign(vDescription)]
    ),
    // End grammar
    space_p).hit)
{
    const string desc(vDescription.begin(), vDescription.end());
}

I would much rather assign all printable characters up to the next ';', but the following doesn't work: parse(...).hit is false.

if(parse(chars,
    // Begin grammar
    (
       as_lower_d["//the description;"]
    >> (+print_p)[assign(vDescription)]
    >> ';'
    ),
    // End grammar
    space_p).hit)

How do I make it hit?

+3  A: 

You're not getting a hit because ';' is matched by print_p. Try this:

if(parse(chars,
    // Begin grammar
    (
       as_lower_d["//the description;"]
    >> (+(print_p-';'))[assign(vDescription)]
    >> ';'
    ),
    // End grammar
    space_p).hit)
Fred Larson
Thanks, I will try this tomorrow. It looks like there is a fundamental misunderstanding on my side then. I assumed the parser would try to match things if possible and not be so lazy... Do you know what the term for this behaviour is?
mxp
I think the term is "greedy". See http://www.boost.org/doc/libs/1_35_0/libs/spirit/doc/faq.html#greedy_rd
Fred Larson
+3  A: 

You might try using confix_p:

confix_p(as_lower_d["//the description;"],
         (+print_p)[assign(vDescription)],
         ch_p(';')
        )

It should be equivalent to Fred's response.

Your code fails because the + repetition is greedy. The +print_p parser consumes characters until it reaches the end of the input or a non-printable character. A semicolon is printable, so print_p claims it. The input is exhausted, the variable is assigned, and the match fails because nothing is left for the final ';' in your parser to match.

Fred's answer constructs a new parser, (print_p - ';'), which matches everything print_p does, except for semicolons. "Match everything except X, and then match X" is a common pattern, so confix_p is provided as a shortcut for constructing that kind of parser. The documentation suggests using it for parsing C- or Pascal-style comments, but that's not required.

For your code to work, Spirit would need to recognize that the greedy print_p matched too much and then backtrack to allow matching less. But although Spirit will backtrack, it won't backtrack to the "middle" of what a sub-parser would otherwise greedily match. It will backtrack to the next "choice point," but your grammar doesn't have any. See Exhaustive backtracking and greedy RD in the Spirit documentation.
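
To make the difference concrete, here is a minimal hand-rolled sketch in plain C++ (no Spirit involved). The function names plus_printable and parse_field are hypothetical, and the code only imitates the behaviour described above; it is not how Spirit is implemented. The stop argument plays the role of the "- ';'" in (print_p - ';'), and passing '\0' means "no exclusion", like a bare +print_p.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Greedy one-or-more over printable characters, starting at pos.
// If stop is non-zero, that character is excluded, imitating (print_p - stop).
// Returns the number of characters consumed; 0 means the repetition failed.
std::size_t plus_printable(const std::string& in, std::size_t pos, char stop)
{
    const std::size_t start = pos;
    while (pos < in.size()
           && std::isprint(static_cast<unsigned char>(in[pos]))
           && (stop == '\0' || in[pos] != stop))
    {
        ++pos;  // keep consuming for as long as the character qualifies
    }
    return pos - start;
}

// Sequence: one-or-more printable characters, followed by a literal ';'.
// Once the repetition has finished, nothing is ever handed back to it.
bool parse_field(const std::string& in, char stop, std::string& out)
{
    const std::size_t n = plus_printable(in, 0, stop);
    if (n == 0)
        return false;
    out = in.substr(0, n);  // the "action" runs as soon as the repetition matches
    return n < in.size() && in[n] == ';';
}
```

Calling parse_field("DESCRIPTION;;", '\0', s) fails, yet s has already been set to the whole remaining input, semicolons included. Calling parse_field("DESCRIPTION;;", ';', s) succeeds and leaves only DESCRIPTION in s, because the repetition stops in front of the first ';'.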

Rob Kennedy
Thanks, this works too, and looks even better. I have a small correction though: To get only the text _in_between_ confix_p's opening and closing, the [assign()] action has to be put behind the print_p instead of behind confix_p().
mxp
Ah, you're right. I skimmed over the documentation too fast. It looks wrong at first, but the parser fixes it to do the right thing.
Rob Kennedy