The following sample code:
import token, tokenize, StringIO

def generate_tokens(src):
    rawstr = StringIO.StringIO(unicode(src))
    tokens = tokenize.generate_tokens(rawstr.readline)
    for i, item in enumerate(tokens):
        toktype, toktext, (srow, scol), (erow, ecol), line = item
        print i, token.tok_name[toktype], toktext

s = \
"""
def test(x):
    \"\"\" test with an unterminated docstring
"""

generate_tokens(s)
raises the following exception:
... (stripped a little)
  File "/usr/lib/python2.6/tokenize.py", line 296, in generate_tokens
    raise TokenError, ("EOF in multi-line string", strstart)
tokenize.TokenError: ('EOF in multi-line string', (3, 5))
Some questions about this behaviour:
- Should I catch and 'selectively' ignore tokenize.TokenError here? Or should I stop trying to generate tokens from non-compliant/non-complete code? If so, how would I check for that?
- Can this error (or similar errors) be caused by anything other than an unterminated docstring?
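For context, here is a minimal sketch of what "catch and selectively ignore" might look like. It is written in Python 3 syntax (`io.StringIO` in place of `StringIO`, `next()` instead of `print`-based iteration); the helper name `tolerant_tokens` is my own invention, not a stdlib API:

```python
import io
import token
import tokenize

def tolerant_tokens(src):
    """Yield tokens until the stream ends or tokenize gives up mid-construct."""
    gen = tokenize.generate_tokens(io.StringIO(src).readline)
    while True:
        try:
            yield next(gen)
        except StopIteration:
            return
        except tokenize.TokenError:
            # Raised at EOF inside an open construct, e.g.
            # 'EOF in multi-line string' (unterminated triple quote) or
            # 'EOF in multi-line statement' (unclosed bracket).
            return

# Complete source tokenizes all the way to ENDMARKER.
full = [token.tok_name[t.type] for t in tolerant_tokens('x = 1\n')]

# Unterminated docstring: tokens before the open string are still produced.
partial = [token.tok_name[t.type]
           for t in tolerant_tokens('def test(x):\n    """ oops\n')]

# An unclosed bracket triggers the sibling 'EOF in multi-line statement' error.
bracket = [token.tok_name[t.type] for t in tolerant_tokens('spam = [1, 2,\n')]
```

Note that the same exception type also covers unclosed brackets ('EOF in multi-line statement'), so truly "selective" handling would have to inspect the exception's message, not just its type.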