ansaurus

Question

Answer 1

+1 A:

My approach would be to make your lexer generate indent/outdent tokens. Store the current indentation level and match a pattern like \n * at the start of each line. Count the spaces matched; if the count differs from the current indentation level, emit an indent token (more spaces) or an outdent token (fewer spaces) and update the stored level.

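Tabs at the start of a line can be counted the same way. Adding a rule that raises an error on the pattern \n[ \t]* should stop people mixing tabs and spaces. Because that pattern also matches pure-space and pure-tab indentation, the rule has to rank below the space-only and tab-only rules so that it fires only on mixed whitespace.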
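The idea above can be sketched in a few lines of Python. This is only a rough illustration, not a drop-in lexer: the names indent_tokens and LexError are made up for the example, and it keeps a stack of indentation widths rather than a single level (an assumption on my part, so that an outdent can close several blocks at once). It only rejects tabs and spaces mixed within one line.

```python
import re

LEADING_WS = re.compile(r"[ \t]*")


class LexError(Exception):
    """Hypothetical error type for bad indentation."""


def indent_tokens(source):
    """Yield (kind, value) tuples: INDENT, OUTDENT or LINE."""
    levels = [0]  # stack of currently open indentation widths
    for lineno, raw in enumerate(source.splitlines(), start=1):
        if not raw.strip():
            continue  # blank lines do not change the indentation
        ws = LEADING_WS.match(raw).group()
        if " " in ws and "\t" in ws:
            raise LexError(f"line {lineno}: tabs and spaces mixed")
        width = len(ws)
        if width > levels[-1]:
            levels.append(width)
            yield ("INDENT", width)
        while width < levels[-1]:
            levels.pop()
            yield ("OUTDENT", levels[-1])
        if width != levels[-1]:
            # outdented to a width that was never opened
            raise LexError(f"line {lineno}: inconsistent outdent")
        yield ("LINE", raw.strip())
    while len(levels) > 1:  # close blocks still open at end of input
        levels.pop()
        yield ("OUTDENT", levels[-1])
```

Calling list(indent_tokens(text)) gives a flat token stream that a parser can treat like one with explicit block delimiters.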

Jack Kelly 2010-09-10 03:20:02

Answer 2

+2 A:

ANTLR can certainly be used for this. However, if you're new to ANTLR or to parser generators in general, I don't think I can give a short explanation of exactly how to do it. I recommend you try some simple things with ANTLR first and browse through The Definitive ANTLR Reference. It even has a paragraph about this type of problem, which is similar to parsing Python code. See Chapter 4.3, Rules, paragraph "Emitting More Than One Token per Lexer Rule", for details.
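To give a feel for why one rule may need to emit several tokens, here is a rough Python sketch of the general idea. It is not ANTLR code and does not use ANTLR's API. A single line start can close two or more blocks, so the lexer buffers tokens in a queue and hands them out one per call. The class QueueingLexer, the method next_token and the fixed step INDENT_UNIT are all invented for the illustration.

```python
from collections import deque

INDENT_UNIT = 4  # assumed fixed indentation step, in spaces


class QueueingLexer:
    """Returns one token per call, even when a line produced several."""

    def __init__(self, text):
        self._lines = iter(text.splitlines())
        self._pending = deque()  # tokens produced but not yet returned
        self._depth = 0  # current nesting depth, in INDENT_UNIT steps

    def _emit(self, kind, value=None):
        self._pending.append((kind, value))

    def _scan_line(self, line):
        stripped = line.lstrip(" ")
        depth = (len(line) - len(stripped)) // INDENT_UNIT
        while self._depth < depth:
            self._depth += 1
            self._emit("INDENT")
        while self._depth > depth:  # may run more than once per line
            self._depth -= 1
            self._emit("OUTDENT")
        self._emit("LINE", stripped)

    def next_token(self):
        while not self._pending:
            line = next(self._lines, None)
            if line is None:
                # end of input: close open blocks, then signal EOF
                while self._depth > 0:
                    self._depth -= 1
                    self._emit("OUTDENT")
                self._emit("EOF")
            elif line.strip():
                self._scan_line(line)
        return self._pending.popleft()
```

The parser keeps calling next_token() until it sees EOF and never needs to know that some tokens came out of the same line.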

Bart Kiers 2010-09-10 08:09:09


Compiling/parsing meaningful whitespace