views:

246

answers:

3

I'm currently in the middle of playing with a BNF grammar that I hope to be able to wrangle into a LL(1) form. However, I've just finished making changes and calculating the new FIRST and FOLLOW sets for the grammar by hand for the third time today and I'm getting tired of it. There has to be a better way!

Can someone suggest a tool that, given a grammar, will automatically calculate the first and follow sets for all of the non terminals?

+1  A: 

A year ago, we had a semester project at the university I attend, where our task was to create a programming language. As a group, we decided we wanted to be able to hand-write the parser from scratch, so we had to aim for an LL(1) grammar, since it would have been completely unrealistic to write a parser otherwise.

Of course, our starting point was far from being LL(1), so we too had to wrangle it into place. For that purpose, we used the kfgEdit tool from the AtoCC package. All you do is enter your rules, and then it can check if it's an LL(1) grammar at the click of a button.

A fair word of warning: The tool is a bit finicky about what it accepts. While you'd often use EBNF for the real grammar, so you can write ? and * and + to signal how many times that token must appear there, this is not supported. Grouping is also not supported. You may very well find that it takes a very long time to do this, and you will almost surely want to do some "rearranging" after you've reached LL(1) to make the grammar even close to being readable.

Of course, depending on the type of grammar you're dealing with, this may not be much of a problem for you. We had created a sort of Pascal/C hybrid, with a fairly restricted set of constructs (procedures, functions, only built-in primitive types and arrays of them, ifs, a single loop construct we'd come up with ourselves in place of the standard 3...), and it took me at least a week to wrangle it into an LL(1) grammar - probably 2, actually. Note that this is out of a total of about 4 months, so that was a lot of time spent there.

If you absolutely MUST have an LL(1) grammar, then you obviously will need to press on if you get into a situation like this, but if you're allowed to use parser generators like yacc/bison or SableCC then you will, in the long run, most likely find it a LOT easier to go down that route. That doesn't mean you SHOULD go down that route - I found that actually writing everything by hand provided some insight I probably wouldn't have gained otherwise - but it might be better for you to gain that insight in a different situation than your current.

tl;dr version: Use kfgEdit from the AtoCC package.

Michael Madsen
For kfgEdit will calculate first sets for all non-terminals, but only some follow sets based on where possible collisions may occur. That said, it was still invaluable for grammar wrangling. Thanks!
Zxaos
A: 

For recursive descent parsing, it would be worth looking at ANTLR. However, I'm not sure it provides an exact answer for your question - find the FIRST and FOLLOW sets for a given grammar.

Jonathan Leffler
I don't recall any real analysis features in ANTLR (or ANTLRWorks, for that matter). It can tell you if it finds a problem in generating the parser from your grammar, but IIRC, the messages you get aren't that helpful if you're still trying to massage the grammar to an LL(1) form.
Michael Madsen
A: 

The DMS Software Reengineering Toolkit has a parser generator that computes FIRST and FOLLOW sets; it will also let you inspect the L(AL)R state machine it generates.

However, if you have a legitimate context-free grammar, you don't have to "wrangle it" into LL shape; the DMS parser generator produces GLR parsers from any context-free grammar.

Ira Baxter