
In several questions I've seen recommendations for the Spirit parser-generator framework from boost.org, but then in the comments there is grumbling from Spirit users who are not happy with it. Would those people please step forward and explain to the rest of us what the drawbacks or downsides of using Spirit are?

+21  A: 

It is quite a cool idea, and I liked it; it was especially useful for really learning how to use C++ templates.

But their documentation recommends using Spirit for small to medium-size parsers. A parser for a full language would take ages to compile. I will list three reasons.

  • Scannerless parsing. While it's considerably simpler, it may slow down the parser when backtracking is required. It's optional, though: a lexer can be integrated (see the C preprocessor built with Spirit). A grammar of ~300 lines (including both .h and .cpp files) compiles (unoptimized) to a 6M file with GCC. Inlining and maximum optimizations get that down to ~1.7M.

  • Slow parsing - there is no static checking of the grammar, neither to warn about excessive lookahead nor to catch basic errors such as left recursion (which leads to infinite recursion in recursive-descent parsers, i.e. with LL grammars). Left recursion is not a really hard bug to track down, but excessive lookahead might cause exponential parsing times.

  • Heavy template usage - while this has certain advantages, it hurts compilation times and code size. Additionally, the grammar definition must normally be visible to all its users, which hurts compilation times even more. I've been able to move grammars to .cpp files by adding explicit template instantiations with the right parameters, but it was not easy.
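The left-recursion pitfall from the second point can be sketched without Spirit at all. The code below is a hand-written recursive-descent parser with hypothetical names (`Input`, `parse_number`, `parse_expr`), not Spirit's API. It only illustrates why a rule like `expr := expr '-' number` never terminates, and what the usual rewrite into a loop looks like.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical hand-written parser, not Spirit code.
//
// Left-recursive grammar:  expr := expr '-' number | number
// A recursive-descent function for that rule calls itself before it has
// consumed any input, so it recurses forever:
//
//   bool parse_expr_bad(Input& in, int& out) {
//       int lhs = 0;
//       parse_expr_bad(in, lhs);   // no input consumed yet -> infinite recursion
//       ...
//   }
//
// Equivalent grammar without left recursion:  expr := number ('-' number)*

struct Input {
    std::string text;
    std::size_t pos = 0;
};

// Parses one or more decimal digits; returns false if none are present.
bool parse_number(Input& in, int& out) {
    std::size_t start = in.pos;
    int value = 0;
    while (in.pos < in.text.size() &&
           std::isdigit(static_cast<unsigned char>(in.text[in.pos]))) {
        value = value * 10 + (in.text[in.pos] - '0');
        ++in.pos;
    }
    if (in.pos == start) return false;
    out = value;
    return true;
}

// expr := number ('-' number)*, evaluated left to right.
bool parse_expr(Input& in, int& out) {
    int acc = 0;
    if (!parse_number(in, acc)) return false;
    while (in.pos < in.text.size() && in.text[in.pos] == '-') {
        std::size_t save = in.pos;   // remember position in case the tail fails
        ++in.pos;
        int rhs = 0;
        if (!parse_number(in, rhs)) {
            in.pos = save;           // back out of the dangling '-'
            break;
        }
        acc -= rhs;
    }
    out = acc;
    return true;
}
```

The loop keeps subtraction left-associative, which is the property the left-recursive rule was trying to express in the first place.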

Blaisorblade
thanks for your insightful response.
Norman Ramsey
You're always welcome!
Blaisorblade
+1 for compilation times and code-bloat... I've used spirit once for a small project (parsing config-files) and later decided to remove it for exactly these reasons.
Nils Pipenbrinck
+1 Thanks for changing my mind. I chose YARD instead.
Viet
+9  A: 

Here is what I don't like about it:

  • The documentation is limited. There is one big web page where "everything" is explained, but the current explanations lack detail.

  • Poor AST generation. ASTs are poorly explained and, even after banging your head against the wall to understand how the AST modifiers work, it's difficult to obtain an easy-to-manipulate AST (i.e. one that maps well to the problem domain).

  • It increases compilation times enormously, even for "medium"-sized grammars

  • Syntax is too heavyweight. It is a fact of life that in C/C++ you must duplicate code (i.e. between declaration and definition). However, it seems that in boost::spirit, when you declare a grammar<>, you must repeat some things 3 times :D (when you want ASTs, which is what I want :D)

Other than this, I think they did a pretty good job with the parser, given the limitations of C++. But I think they should improve it more. The history page says that there was a "dynamic" Spirit before the current "static" Spirit; I wonder how much faster it was and how much better its syntax was.

Not sure, but I don't think a dynamic Spirit could be much better than the current one. It could have been much slower because of the extra virtual calls it had; IIRC, nowadays there are virtual calls only when entering a rule<...> parser, whereas in a dynamic Spirit each call between a composite parser and one of its components would have been virtual.
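To make the difference concrete, here is a rough sketch with toy parsers. All the types (`CharParser`, `Sequence`, `DynParser`, `DynChar`, `DynSequence`) are hypothetical and are not Spirit's own classes. In the static version the composite knows its components' exact types, so the calls can be resolved and inlined at compile time; in the dynamic version every call from a composite to a component goes through a virtual function.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// --- Static composition: component types are template parameters. ---
struct CharParser {
    char expected;
    bool parse(const std::string& s, std::size_t& pos) const {
        if (pos < s.size() && s[pos] == expected) { ++pos; return true; }
        return false;
    }
};

template <typename Left, typename Right>
struct Sequence {
    Left left;
    Right right;
    // Both calls below are ordinary (non-virtual) member calls.
    bool parse(const std::string& s, std::size_t& pos) const {
        std::size_t save = pos;
        if (left.parse(s, pos) && right.parse(s, pos)) return true;
        pos = save;  // restore the position on failure
        return false;
    }
};

// --- Dynamic composition: components are reached through a base class. ---
struct DynParser {
    virtual ~DynParser() = default;
    virtual bool parse(const std::string& s, std::size_t& pos) const = 0;
};

struct DynChar : DynParser {
    char expected;
    explicit DynChar(char c) : expected(c) {}
    bool parse(const std::string& s, std::size_t& pos) const override {
        if (pos < s.size() && s[pos] == expected) { ++pos; return true; }
        return false;
    }
};

struct DynSequence : DynParser {
    std::unique_ptr<DynParser> left;
    std::unique_ptr<DynParser> right;
    DynSequence(std::unique_ptr<DynParser> l, std::unique_ptr<DynParser> r)
        : left(std::move(l)), right(std::move(r)) {}
    // Both calls below are virtual calls.
    bool parse(const std::string& s, std::size_t& pos) const override {
        std::size_t save = pos;
        if (left->parse(s, pos) && right->parse(s, pos)) return true;
        pos = save;  // restore the position on failure
        return false;
    }
};
```

The static version is also where the very long type names come from: `Sequence<CharParser, CharParser>` is tiny, but every extra operator nests another template.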
Blaisorblade
I forgot to say that I agree with your comments on documentation and AST generation. When I used Spirit, some details were not defined by the docs, and the code seemed to give incoherent and buggy results in those cases, maybe because the developers never specified them.
Blaisorblade
+3  A: 

I would say the biggest problem is the lack of any diagnostics or other help for grammar problems. If your grammar is ambiguous, the parser might not parse what you expect it to, and there's no good way of noticing that.
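A small illustration of how this can bite, written as plain C++ rather than Spirit code. The helpers `match_literal` and `match_first_of` are hypothetical, and the sketch assumes alternatives are tried left to right with the first match winning, which is how I understand Spirit's `|` to behave.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Spirit: tries to match `lit` at `pos`
// and returns the number of characters consumed (0 means no match).
std::size_t match_literal(const std::string& input, std::size_t pos,
                          const std::string& lit) {
    return input.compare(pos, lit.size(), lit) == 0 ? lit.size() : 0;
}

// Ordered choice: the first alternative that matches wins, and the
// second one is never tried after that.
std::size_t match_first_of(const std::string& input, std::size_t pos,
                           const std::string& a, const std::string& b) {
    if (std::size_t n = match_literal(input, pos, a)) return n;
    return match_literal(input, pos, b);
}

// keyword := "in" | "int"   -> "int" can never match, because "in" is a
//                              prefix of it and always wins first.
// keyword := "int" | "in"   -> listing the longer alternative first
//                              gives the intended result.
```

With the input `int x`, the first ordering consumes only `in` and leaves `t x` behind, and nothing warns you that the second alternative is dead.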

Chris Dodd
+7  A: 

In Boost 1.41 a new version of Spirit is being released, and it beats the pants off spirit::classic:

After a long time in beta (more than 2 years with Spirit 2.0), Spirit 2.1 will finally be released with the upcoming Boost 1.41 release. The code is very stable now and is ready for production code. We are working hard on finishing the documentation in time for Boost 1.41. You can peek at the current state of the documentation here. Currently, you can find the code and documentation in the Boost SVN trunk. If you have a new project involving Spirit, we highly recommend starting with Spirit 2.1 now.

Allow me to quote OvermindDL's post from the Spirit mailing list:

I may start to sound like a bot with how often I say this, but Spirit.Classic is ancient, you should switch to Spirit2.1, it can do everything you did above a GREAT deal easier, a lot less code, and it executes faster. For example, Spirit2.1 can build your entire AST inline, no weird overriding, no need to build things up afterwards, etc..., all as one nice and fast step. You really need to update. See the other posts from the past day for links to docs and such for Spirit2.1. Spirit2.1 is currently in Boost Trunk, but will be formally released with Boost 1.41, but is otherwise complete.

michalmocny
+1  A: 

For me, the biggest problem is that expressions in Spirit, as seen by the compiler or debugger, are rather long (I copied below part of one expression in Spirit Classic). These expressions scare me. When I work on a program that uses Spirit, I'm afraid to use valgrind or to print a backtrace in gdb.

boost::spirit::classic::parser_result<boost::spirit::classic::action<boost::spirit::classic::sequence<boost::spirit::classic::action<boost::spirit::classic::action<optional_suffix_parser<char const*>, boost::spirit::classic::ref_actor<std::vector<std::string, std::allocator<std::string> >, boost::spirit::classic::clear_action> >, boost::spirit::classic::ref_actor<std::vector<int, std::allocator<int> >, boost::spirit::classic::clear_action> >, boost::spirit::classic::sequence<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::action<boost::spirit::classic::contiguous<boost::spirit::classic::sequence<boost::spirit::classic::alternative<boost::spirit::classic::chlit<char>, boost::spirit::classic::chlit<char> >, boost::spirit::classic::positive<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::alnum_parser, boost::spirit::classic::chlit<char> >, boost::spirit::classic::chlit<char> > > > >, boost::spirit::classic::ref_value_actor<std::vector<std::string, std::allocator<std::string> >, boost::spirit::classic::push_back_action> >, boost::spirit::classic::action<boost::spirit::classic::rule<boost::spirit::classic::scanner<char const*, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t>, boost::spirit::classic::ref_const_ref_actor<std::vector<std::string, std::allocator<std::string> >, std::string, boost::spirit::classic::push_back_action> > >, boost::spirit::classic::contiguous<boost::spirit::classic::sequence<boost::spirit::classic::chlit<char>, boost::spirit::classic::action<boost::spirit::classic::uint_parser<unsigned int, 10, 1u, -1>, boost::spirit::classic::ref_value_actor<std::vector<int, std::allocator<int> >, boost::spirit::classic::push_back_action> > > > 
>, boost::spirit::classic::kleene_star<boost::spirit::classic::sequence<boost::spirit::classic::chlit<char>, boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::action<boost::spirit::classic::contiguous<boost::spirit::classic::sequence<boost::spirit::classic::alternative<boost::spirit::classic::chlit<char>, boost::spirit::classic::chlit<char> >, boost::spirit::classic::positive<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::alnum_parser, boost::spirit::classic::chlit<char> >, boost::spirit::classic::chlit<char> > > > >, boost::spirit::classic::ref_value_actor<std::vector<std::string, std::allocator<std::string> >, boost::spirit::classic::push_back_action> >, boost::spirit::classic::action<boost::spirit::classic::rule<boost::spirit::classic::scanner<char const*, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t>, boost::spirit::classic::ref_const_ref_actor<std::vector<std::string, std::allocator<std::string> >, std::string, boost::spirit::classic::push_back_action> > >, boost::spirit::classic::contiguous<boost::spirit::classic::sequence<boost::spirit::classic::chlit<char>, boost::spirit::classic::action<boost::spirit::classic::uint_parser<unsigned int, 10, 1u, -1>, boost::spirit::classic::ref_value_actor<std::vector<int, std::allocator<int> >, boost::spirit::classic::push_back_action> > > > > > > > >, void ()(char const, char const*)>, boost::spirit::classic::scanner<char const*, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> > >::type 
boost::spirit::classic::action<boost::spirit::classic::sequence<boost::spirit::classic::action<boost::spirit::classic::action<

marcin