How would you parse indentation (python style)?

Also you can track somewhere in lexer how many ident items are preceding first line and pass it to parser. Most interesting part would be trying to pass it to parser correctly :) If your parser uses lookahead (here I mean parser may query for variable number of tokens before it really going to match even one) then trying to pass it through one global variable seems to be very bad idea (because lexer can slip on next line and change value of indent counter while parser is still trying to parse previous line). Also globals are evil in many other cases ;) Marking first line 'real' token in someway with indent counter is more reasonable. I can't give you exact example (I don't even know what parser and lexer generators are you going to use if any...) but something like storing data on first line tokens (it could be non comfortable if you can't easily get such token from parser) or saving custom data (map that links tokens to indent, array where every line in source code as index and indent value as element value) seems to be enough. One downside of this approach is additional complexity to parser that will need to distinguish between ident values and change its behavior based on it. Something like LOOKAHEAD({ yourConditionInJava }) for JavaCC may work here but it is NOT a very good idea. A lot of additional tokens in your approach seems to be less evil thing to use :)

As another alternative I would suggest is to mix this two approaches. You could generate additional tokens only when indent counter changes its value on next line. It is like artificial BEGIN and END token. In this way you may lower number of 'artificial' tokens in your stream fed into parser from lexer. Only your parser grammar should be adjusted to understand additional tokens...

I didn't tried this (have no real experience with such languages parsing), just sharing my thoughts about possible solutions. Checking already built parsers for this kinds of languages could be of great value for you. Open source is your friend ;)

ansaurus

tags:

views:

answers:

How would you parse indentation (python style)?

related questions