The various AMPER, BACKQUOTE, etc. values are the token numbers of the corresponding Python tokens and operators, i.e. AMPER is "&" (ampersand) and AMPEREQUAL is "&=".
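Here is a quick way to check this yourself. It is sketched for Python 3, which is an assumption since the sessions below are Python 2. The constants live in the token module as plain integers, and tok_name maps each one back to its name. The exact numbers differ between Python versions, so don't hard-code them. Note that BACKQUOTE only exists in Python 2, because Python 3 dropped the backtick repr syntax.

```python
import token

# AMPER and AMPEREQUAL are plain integer token numbers in the token module.
for name in ("AMPER", "AMPEREQUAL"):
    number = getattr(token, name)
    # tok_name maps the integer back to its symbolic name.
    print(f"{name} = {number} -> {token.tok_name[number]}")

# BACKQUOTE was Python 2 only (backtick repr syntax), so Python 3 lacks it.
print(hasattr(token, "BACKQUOTE"))
```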
However, you don't actually need to care about these. They're used by the internal C tokeniser, but the Python tokenize module simplifies its output, reporting every operator symbol as the generic OP token. You can translate the numeric token ids (the first value in each token tuple) to their symbolic names using the token module's tok_name dictionary. For example, in Python 2:
>>> import tokenize, token
>>> s = "{'test':'123','hehe':['hooray',0x10]}"
>>> for t in tokenize.generate_tokens(iter([s]).next):
...     print token.tok_name[t[0]],
...
OP STRING OP STRING OP STRING OP OP STRING OP NUMBER OP OP ENDMARKER
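For reference, a rough Python 3 equivalent of the session above might look like this. In Python 3, generate_tokens expects a readline callable that returns str, so io.StringIO(s).readline stands in for iter([s]).next. Since Python 3.3 each token is a TokenInfo named tuple whose exact_type attribute recovers the specific operator (LBRACE, COLON, etc.) hidden behind OP. Depending on the version, Python 3 may also emit a NEWLINE token before ENDMARKER when the input has no trailing newline.

```python
import io
import token
import tokenize

s = "{'test':'123','hehe':['hooray',0x10]}"

# Python 3: generate_tokens takes a readline callable returning str lines.
tokens = list(tokenize.generate_tokens(io.StringIO(s).readline))

# t.type is the generic type, so every operator shows up as OP...
print(" ".join(token.tok_name[t.type] for t in tokens))

# ...while t.exact_type recovers the specific operator (LBRACE, COLON, ...).
print(" ".join(token.tok_name[t.exact_type] for t in tokens))
```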
To get a slightly more descriptive view of the tokens while debugging, you can also use tokenize.printtoken. It is undocumented and isn't present in Python 3, so don't rely on it in production code, but it's handy for a quick look at what the tokens mean:
>>> for t in tokenize.generate_tokens(iter([s]).next):
...     tokenize.printtoken(*t)
...
1,0-1,1: OP '{'
1,1-1,7: STRING "'test'"
1,7-1,8: OP ':'
1,8-1,13: STRING "'123'"
1,13-1,14: OP ','
1,14-1,20: STRING "'hehe'"
1,20-1,21: OP ':'
1,21-1,22: OP '['
1,22-1,30: STRING "'hooray'"
1,30-1,31: OP ','
1,31-1,35: NUMBER '0x10'
1,35-1,36: OP ']'
1,36-1,37: OP '}'
2,0-2,0: ENDMARKER ''
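On Python 3, where printtoken is gone, a small helper can reproduce the same layout. format_token below is a hypothetical name of my own, not part of the tokenize API. It just formats a TokenInfo the way printtoken did. Python 3 also ships a documented command-line alternative: python -m tokenize -e yourfile.py prints every token with its exact type name.

```python
import io
import token
import tokenize

def format_token(tok):
    """Hypothetical helper: lay out a TokenInfo like Python 2's printtoken."""
    (srow, scol), (erow, ecol) = tok.start, tok.end
    return f"{srow},{scol}-{erow},{ecol}:\t{token.tok_name[tok.type]}\t{tok.string!r}"

s = "{'test':'123','hehe':['hooray',0x10]}"
tokens = list(tokenize.generate_tokens(io.StringIO(s).readline))
for tok in tokens:
    print(format_token(tok))
```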
The values in the tuple you get back for each token are, in order:
- The token type id (e.g. STRING, OP, NAME)
- The string: the actual text of the token, e.g. "&" or "'a string'"
- The start (line, column) in your input
- The end (line, column) in your input
- The full text of the line the token is on.
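To make the positions concrete, here is a Python 3 sketch that unpacks each token into those five fields. Rows are 1-based and columns are 0-based with an exclusive end, so a token that fits on one line can be sliced straight out of the line text. NEWLINE, NL and ENDMARKER tokens are skipped because they carry no visible text.

```python
import io
import token
import tokenize

s = "{'test':'123','hehe':['hooray',0x10]}"
skip = (token.NEWLINE, token.NL, token.ENDMARKER)

slices = []
for tok_type, string, start, end, line in tokenize.generate_tokens(io.StringIO(s).readline):
    (srow, scol), (erow, ecol) = start, end
    # Rows are 1-based; columns are 0-based and the end column is exclusive,
    # so a single-line token's text is exactly line[scol:ecol].
    if tok_type not in skip and srow == erow:
        slices.append((string, line[scol:ecol]))

print(all(text == cut for text, cut in slices))
```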