I'm trying to write a parser for a made-up programming language. I'm now at the part of the exercise where the parser's output has to be a translation of the input into C.

So things like...

STARTMAIN a=b+2; return a ENDMAIN

...must become...

int main () { a=b+2; return a; }

So far so good, almost. The exercise also requires that, at the same time as we convert, we add proper indentation and (as I had to learn the hard way last year) newlines.

The obvious part is that each time a { opens, you increment a counter and then print that many tabs at the start of each new line. However, closing braces ('}') are a different story: you can't detect them in advance, and once you've parsed one, you can't move it a tab to the left by un-printing the last tab you already wrote.

Is there a solution to this, and/or a consistent way of checking and adding indentation?

+2  A: 

Well, you've now discovered one reason why people do not always bother to format generated output neatly; it is relatively hard to do so.

Indeed, one way to deal with the problem is to provide an official formatter for the language. Google's Go programming language comes with the 'gofmt' program to encourage the official format. C does not have such a standard, hence the religious wars over the placement of braces, but it does have programs such as indent which can in fact format the code neatly for you.

The trick is not to output anything on a line until you know how many tabs to output. So, on a line with a close brace, you decrement the indent counter (making sure it never goes negative) and only then do you output the leading tabs and the following brace.

Note that some parts of C require a semi-colon (or comma) after the close brace (think initializers and structure definitions); others do not (think statement blocks).

Jonathan Leffler
This is all part of an exercise, so there aren't any strict rules regarding the C conversion. I know I have to find a way to make sure I don't print anything before the closing brace, but I can't find a good way to do BOTH things we were asked to do: add proper indentation AND newlines. To add proper newlines, I have "add newline and tabs" code as part of each semicolon parsed, but that means those tabs are printed prematurely if a closing brace is encountered next. One solution might be to give the closing brace another newline, and print it there.
Lefteris Aslanoglou
@Leftos: One interpretation of 'the trouble' is that you've mixed two operations that shouldn't be mixed - so you run into trouble. Outputting a newline does not mean you can also output the next lot of white space; you have to wait until you know what is going on the line before you output that. At the end of the file, you might output a line with blanks - not good. After a set of declarations, where you want an empty line, you might output a line with tabs on it instead. Not good. Output the newline (end of previous line) separately from the leading white space.
Jonathan Leffler
@Leftos: also, there are places where you do not want a newline after a semi-colon - notably in a `for` loop. So the rule 'print a newline after a semi-colon' is too simplistic.
Jonathan Leffler
Okay, thanks. I've avoided adding newlines, as I couldn't handle both newlines and indentation properly in Bison. Now indentation works properly. Off to my other two posted questions... Thanks for your help.
Lefteris Aslanoglou