views:

22

answers:

1

I'm currently in a class on systems software development. We are writing the two-pass assembler for the assembly language of a fictional machine. We've implemented the tokenizer, and all of the classes that we need to abstractedly represent this program - all that is left (besides implementing the code generator in a later phase) is to parse the tokens. Here is where I'm having a major issue. I'm choosing to implement this as a recursive descent parser, since that's the only technique I currently have experience with...but we are not allowed to stop assembly on syntax errors. For instance, if the user gives a load word instruction with invalid syntax, we are to replace it with a NOP. If the user gives a bad label, we are to simply ignore it. If the user places unknown characters in a line, we discard them.

On the one hand, it sounds easy - however, implementing this causes me to break (what I understand to be) one of the important rules of a recursive descent parser. Each of my functions pulls multiple tokens before calling another function, since I need to account for all of the possible fixable syntax errors. Given that I can't stop assembly, and I must have enough information about my current context to intelligently determine what the user was intending to do, I have to handle a lot within one function.

This turns the program from a true recursive descent parser into more of a semi-finite-state-machine. I feel like I'm doing this badly, but I'm not sure how else to implement this. Does anyone have any suggestions/ideas?

BTW - I'm not allowed to use tools like ANTLR, or any other parser generator.

Thanks.

+1  A: 

My suggestion would be don't try. Poor syntax error recovery is inherent in recursive descent parsers. If you are not allowed to use a parser generator, then decent syntax error recovery is probably beyond the scope of your homework. (Check with your instructor ...)

Stephen C
Haha, I like your idea...unfortunately, that's not an option. We are required to do syntax recovery. Given that, any thoughts? Is there a parsing technique besides recursive descent parsing that would lend itself better to this sort of thing?
Ryan