Good day.

I've used Boost Spirit Classic in the past and am now trying to move to the newer Boost Spirit 2.x. Could someone point me to how to deal with keywords? Say I want to distinguish between "foo" and "int", where "foo" is an identifier and "int" is a keyword. I want to protect my grammar from incorrectly accepting input such as "intfoo".

Okay, I have:

struct my_keywords : boost::spirit::qi::symbols<char, std::string> {
    my_keywords() {
        add
            ("void")
            ("string")
            ("float")
            ("int")
            ("bool")
            //TODO: add others
            ;
    }
} keywords_table_;

and the ident rule declared as:

boost::spirit::qi::rule<Iterator, std::string(),  ascii::space_type> ident;
ident = raw[lexeme[((alpha | char_('_')) >> *(alnum | char_('_'))) - keywords_table_]];

and, say, some rule:

boost::spirit::qi::rule<Iterator, ident_decl_node(),  ascii::space_type> ident_decl;
ident_decl = ("void" | "float" | "string" | "bool") >> ident;

How do I write this correctly, stating that "void", "float", etc. are keywords? Thanks in advance.

+2  A: 

Hmm, just declare your rule as:

//The > operator says that your keyword MUST be followed by an ident,
//instead of just "may be". If I understand Spirit correctly, with the >>
//operator a failure makes the parser consider other rules, which may or
//may not be what you want.
ident_decl = keywords_table_ > ident;

Expanding on your example, you should end up with something like this:

struct my_keywords : boost::spirit::qi::symbols<char, int> {
    my_keywords() {
        add
            ("void", TYPE_VOID)
            ("string", TYPE_STRING)
            ("float", TYPE_FLOAT)
            ("int", TYPE_INT)
            ("bool", TYPE_BOOL)
            //TODO: add others
            ;
    }
} keywords_table_;

//...

class ident_decl_node
{
   //this will  enable fusion_adapt_struct to access your private members
   template < typename, int>
   friend struct boost::fusion::extension::struct_member;

   int type;
   std::string ident;
};

BOOST_FUSION_ADAPT_STRUCT(
   ident_decl_node,
   (int, type)
   (std::string, ident)
)

//...

struct MyErrorHandler
{
    template <typename, typename, typename, typename>
    struct result { typedef void type; };

    template <typename Iterator>
    void operator()(Iterator first, Iterator last, Iterator error_pos, std::string const& what) const
    {
        using boost::phoenix::construct;

        std::string error_msg = "Error! Expecting ";
        error_msg += what; // what failed?
        error_msg += " here: \"";
        error_msg += std::string(error_pos, last);   // iterators to error-pos, end
        error_msg += "\"";

        //put a breakpoint here if you don't have std::cout for the console, or
        //change this line to something else.
        std::cout << error_msg;
    }
};

//...

using boost::spirit::qi::grammar;
using boost::spirit::ascii::space_type;

typedef std::vector<boost::variant<ident_decl_node, some_other_node> > ScriptNodes;

template <typename Iterator>
struct NodeGrammar: public grammar<Iterator, ScriptNodes(), space_type>
{
    NodeGrammar() : NodeGrammar::base_type(start)
    {
      //brings the _1.._4 placeholders used by on_error into scope
      using namespace boost::spirit::arg_names; //edit1

      //I had problems if I didn't add the eps rule (which does nothing), so you
      //might want to leave it
      start %= ident_decl | some_other_node_decl >> eps;

      ident_decl %= keywords_table_ > ident;
      //I'm not sure if the %= operator will work correctly on this; you might have to do
      //the push_back manually, but I think it should work
      ident %= raw[lexeme[((alpha | char_('_')) >> *(alnum | char_('_'))) - keywords_table_]];

      on_error<fail>(start, error_handler(_1, _2, _3, _4)); //edit1
    }

    my_keywords keywords_table_;

    boost::spirit::qi::rule<Iterator, ScriptNodes(),  ascii::space_type> start;
    boost::spirit::qi::rule<Iterator, ident_decl_node(),  ascii::space_type> ident_decl;
    boost::spirit::qi::rule<Iterator, some_other_node(),  ascii::space_type> some_other_node_decl;
    boost::spirit::qi::rule<Iterator, std::string(),  ascii::space_type> ident;

    boost::phoenix::function<MyErrorHandler> error_handler; //edit1
};

Also, I don't know which version you use, but I used the one in Boost 1.40, and there seems to be a bug when operator %= is followed by only one argument (the parser would not parse the rule correctly). For example:

ident_decl %= ident;

Do this instead:

ident_decl %= ident > eps;

which should be equivalent.

Hope this helps.

n1ck
Thanks for trying, but it doesn't work for me. I get many errors, such as: error: no matching function for call to 'boost::fusion::vector_data1<boost::fusion::vector1<std::vector<boost::variant<Freefoil::Private::ident_decl_node, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, etc etc etc etc... I use Boost 1.39.
varnie
Can you tell me on which line the error seems to occur? You might have to scroll through a lot of text, but eventually you should find the line in your code where the error occurred. I can try to make you a minimal compilable example (the one above is untested) with Boost 1.39 when I get some time. Keep me posted on your progress.
n1ck
varnie
Wow! I've solved it! Your suggestion to add ">> eps" to the end of the rule was the cause of the problem! In any case, the rule "start %= ident_decl;" compiles fine. Good, now I have a toy grammar that compiles. Thank you!
varnie
Hmm, weird, >> eps should not do anything. I had problems at runtime if I didn't have >> eps when working with operator %=. Watch out if your program asserts at runtime saying something about intrusive_ptr ptr == NULL. But you don't use the same version as I do, so I don't know. Congrats on solving your problem, varnie!
n1ck
I'm very sorry for bothering you, but it seems I'm still unable to parse inputs such as "void foo" or "int bar". At the moment the grammar presented above accepts only the wrong inputs (i.e. "voidfoo", "intbar"). Maybe you can give me some ideas to check? Thank you once again ;)
varnie
Don't worry, you're not bothering me. As for your question, try replacing "ident_decl %= keywords_table_ > ident;" with "ident_decl %= keywords_table_ > boost::spirit::ascii::space > ident;", which will require a whitespace character (' ', '\n' and such) between the keyword and the identifier. You can also use char_(' ') if you want to restrict it to plain spaces only.
n1ck
Now I get an exception for every input (correct and incorrect alike): "terminate called after throwing an instance of 'boost::spirit::qi::expectation_failure<__gnu_cxx::__normal_iterator<char const*, std::string> >'"
varnie
This is related to the > operator, which expects the next term to be there and throws if it is not. You should try adding error handling to your grammar to see where it fails. Let me edit my answer to add it.
n1ck
Hmm, I don't know, but I wouldn't bet on that. Can you try scrolling through all the text to see where it references your code? There are probably more than 50 lines of error output, but somewhere there must be a reference (normally 2 or 3) to where in your code the error is. Try to find the most specific place. (I usually get one at the first bracket just under struct NodeGrammar: public grammar<Iterator, ScriptNodes(), space_type>, but that is not where the error is; look a couple of lines above it in the error message and you should find the exact location.)
n1ck
Whoa, what happened? Didn't you post some new comments just a few seconds ago?
n1ck
Oops. Yes, you're right ;) I just wanted to note that there are slight changes in the latest Boost Spirit from the SVN trunk. The proper way to declare the error-handling function is as follows:
varnie
struct MyErrorHandler { template <typename, typename, typename, typename> struct result { typedef void type; }; template <typename Iterator> void operator()(Iterator first, Iterator last, Iterator error_pos, qi::info const
varnie
There is still a problem with whitespace. If I declare ident_decl as "ident_decl %= keywords_table_ > ident;", then I cannot catch wrong inputs such as "voidfoo". If I declare the rule as "ident_decl %= keywords_table_ >> ascii::space > ident;", then our error handler runs every time after the *first* word entered. For example, the input "void foo" gives: error! expecting <space> here: "foo". The input "voidfoo" gives: error! expecting <space> here: "foo".
varnie
Ahh yes, I think I know why: we use the space skipper in our toy grammar, which means that all whitespace is skipped. To disable it, use lexeme[]. That said, I think it won't work if you put it only on the ascii::space (bug or by design, I don't know), so you should put it on the full rule: "ident_decl %= lexeme[keywords_table_ >> ascii::space > ident];". You probably don't want spaces to be skipped inside your keywords and identifiers anyway.
n1ck
varnie
I don't think so, but if this is a problem you could probably wrap it in a wrapper rule: wrapper_rule %= lexeme[ident_decl]; ident_decl %= /*your normal rule*/
n1ck
I think I'll have to wait until there is a suitable solution. Maybe Spirit will get some directives for dealing with keywords. At the moment all the attempts above look like nasty hacks ;) It's abnormal to explicitly attach "ascii::space" everywhere a "keyword string" pattern is needed, IMHO. Our rules already have a skipper declared, so why must we add these skippers explicitly once again? Dunno.
varnie
Well, I think you don't have a choice here, because it is part of your rule: you want a space to be required between keywords and identifiers, so you need to tell Spirit that. The skipper is there for when you don't care, e.g. parsing i++ or i ++ or i \n ++ is the same. The lexeme[] directive is there expressly to prohibit space skipping when your rule requires it. I don't know of any shortcut for this. Like I said, this is part of the rule. Maybe something exists; you should ask on the Spirit mailing list (see the link in Eric's comments).
n1ck
After some research and digging through the Boost Spirit examples and docs, I've come across a simple solution (at least, it works for my needs). Here it is: ident_decl %= lexeme[keywords_table_ >> !(alnum | '_')] > ident;
varnie
I think the thread can be closed now ;)
varnie