I am trying to write a shell language parser in Boost.Spirit. However, I am unclear about some basic issues regarding semantics of rules.

Looking at the documentation, there are members r.alias() and r.copy() of rule. IIUC, these members should return a reference to the rule and a copy of the rule's contents, respectively. However, it is not clearly specified what happens when I just use the rule in a definition of another rule. From my experiments, I found mutually recursive rules can be defined by:

rule<Iter> r1, r2;
r1 = ... >> r2 >> ...;
r2 = ... >> r1 >> ...;

which suggests that rules are held by reference inside parser expressions. The problem is: what happens when the variable goes out of scope, e.g.:

rule<Iter> r1;
{ 
  rule<Iter> r2;
  r1 = ... >> r2 >> ...;
  r2 = ... >> r1 >> ...;
}
... // use r1

On the same note, would assigning to a rule from a parser expression containing an rvalue of type rule work (r.copy() would also be an rvalue of type rule, wouldn't it)? E.g.:

rule<Iter> f() { return char_('a') >> char_('b'); }
rule<Iter> r1 = ... >> f();

Can anybody enlighten me on the detailed semantics of rule's copies and references, and possibly correct any misconceptions in this post?

+2  A: 

The answer depends on what version of Spirit you're referring to.


Spirit.Classic (the former Spirit V1.x) implements special copy semantics for rules. The documentation says:

When a rule is referenced anywhere in the right hand side of an EBNF expression, the rule is held by the expression by reference. It is the responsibility of the client to ensure that the referenced rule stays in scope and does not get destructed while it is being referenced.

The assignment operator likewise just references the rhs rule without creating a deep copy. This was done to allow:

rule<> r1, r2;
r1 = ...;
r2 = r1;

But this turned out to be highly confusing, as it prevented handling rules the same way as 'normal' objects.

For that reason there was the member function rule::copy(), which makes an explicit deep copy of a rule (for instance to store it in an STL container).

At the same time this:

r2 = r1.copy();

is plain wrong. r2 would refer to the (destructed) temporary copy of r1 returned from the function copy().


In Spirit.Qi (i.e. Spirit V2.x) the behaviour has partially changed. Rules now behave as expected when handled outside of parser expressions: you can store them normally in containers, and the assignment operator has the usual copy semantics. But beware that inside a parser expression rules are still held by reference, which still allows rules to refer to each other the same way as before:

rule<> r1, r2;
r1 = ... >> r2 >> ...;
r2 = ... >> r1 >> ...;

Sometimes it's necessary to make a deep copy of a rule, so there is still the member function copy.

The changed copy semantics have another side effect. Constructs like:

r1 = r2;

now create a (deep) copy of r2, which might not be what you expect, especially if r2 gets its rhs assigned only after being 'assigned' to r1. For that reason there is the new member function alias, which enables reference semantics for this corner case:

r1 = r2.alias();

In any case, in both versions of Spirit you will end up with dangling references if any of the rules referenced from a parser expression goes out of scope.

BTW, neither Spirit version implements a function rule::ref().

hkaiser
Thanks for this answer. I just have a follow-up question: is it somehow possible to use rvalues (temporaries) of some parser type inside a parser expression, to allow statements like 'r1 = r1 | string("abc")' or generating rules in a function?
jpalecek
While the expression 'r1 = r1 | string("abc")' is theoretically possible, it is a left recursion, which will result in infinite recursion because Spirit generates recursive descent parsers. But the expression 'r1 = string("abc") | r1' will work as expected. You can generate a rule in a function as long as it does not refer to any other rule that has gone out of scope. In addition, in Spirit.Classic you need to return r.copy() from the function.
hkaiser
@hkaiser: 'r1 = string("abc") | r1' is left-recursion too :) But what I wanted to do is make r1 match what r1 matched earlier, plus "abc". BTW how can I generate a rule in a function? This doesn't work for me: http://pastebin.org/482764
jpalecek
Well, I still believe 'r1 = string("abc") | r1' is right recursion. Try it, it will work with Spirit, while the other version doesn't. As for your other question, this: http://pastebin.org/483007 does what you want. Your version of the code doesn't work because the rule returned from the function is a temporary, which is then held by reference in the parser expression and left dangling. My version first stores a copy of the rule returned from the function, and that copy is fine to hold by reference.
hkaiser