Hi

Is there any existing tool to strip all the action code from bison grammar files, leaving only the {} around it?

+1  A: 

To the best of my knowledge, no.

As you surely know, writing your own tool is doable, but difficult. For example, the { and } characters can appear as character constants or in strings. (So can the : and ; characters, of course.)

If you have specific files you want to strip the actions from, and you can rely on your own environment and constraints (i.e. you don't need a solution for the general case), there may be a relatively simple way to do it.
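As an illustration of what such a shortcut could look like (this is my own guess at a workable simplification, not a tool that exists), here is a minimal Python sketch. It assumes the sections are separated by lines containing only `%%`, and that no `{` or `}` ever appears inside a string, a character constant, or a comment in the rules section. The function name `strip_actions_naive` is made up for the example.

```python
def strip_actions_naive(grammar: str) -> str:
    """Empty every {...} action in the rules section, keeping the outer braces.

    Assumption: braces in the rules section only ever delimit code,
    never appear in strings, character constants, or comments.
    """
    out = []
    section = 0  # 0 = declarations, 1 = rules, 2 = epilogue
    depth = 0    # current brace nesting inside an action
    for line in grammar.splitlines(keepends=True):
        # A line holding only %% switches to the next section.
        if depth == 0 and line.strip() == "%%":
            section += 1
            out.append(line)
            continue
        # Declarations and epilogue are copied verbatim.
        if section != 1:
            out.append(line)
            continue
        for ch in line:
            if ch == "{":
                if depth == 0:
                    out.append(ch)  # keep the outermost opening brace
                depth += 1
            elif ch == "}" and depth > 0:
                depth -= 1
                if depth == 0:
                    out.append(ch)  # keep the outermost closing brace
            elif depth == 0:
                out.append(ch)      # outside any action: copy verbatim
    return "".join(out)
```

If any of the assumptions fail (say, an action contains `'}'`), the brace count goes wrong and the output is garbage, which is exactly why the general case needs a real lexer.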

If you need a full general solution, what remains is to hack the bison code. Not for the faint of heart, I admit. That said, the part of bison that reads grammar files is itself written with flex and bison.

In the bison sources, see scan-gram.l and parse-gram.y for a bison scanner/parser team. The token to look out for is BRACED_CODE.

Now, since what you need is basically to take a file and generate a near-exact copy of it, and you don't really need to understand it, you can probably do all your work in the lexer. You can use scan-gram.l as a basis for your work. A helpful modification may be to add another state (start condition) to describe if you're in the prologue/declaration section, versus the grammar rules. Everything but the grammar rules should be printed verbatim.

Comments, whitespace, directives, most punctuation, identifiers, numbers: just print these out verbatim.

Characters and strings: these require their own states in the lexer because it's essential to find where they end. (Character literals may be longer than one keyboard character; think octal.) But given that they have their own states, print them out verbatim.

Code: like characters and strings, you need to figure out where it ends. This is a bit trickier, too, because it may contain strings and comments and whatnot. But once you find where it ends, you can exit the code state. Nothing in here needs to be printed (except for the braces, of course).

Good luck!

JXG
Yeah, I've written my own dummy tool. The grammar in question doesn't have any of the special cases you mentioned. Tokenization was done simply by splitting on \W with PCRE. It works. Anyway, I accepted your answer, especially because of "BRACED_CODE". Your solution is clearly the cleanest one (unless bison itself provides an API to get the AST of the grammar).
Flavius