Hello,

Please help

There are many tokens in the tokenize module, such as STRING, BACKQUOTE, AMPEREQUAL, etc.

>>> import cStringIO
>>> import tokenize
>>> source = "{'test':'123','hehe':['hooray',0x10]}"
>>> src = cStringIO.StringIO(source).readline
>>> src = tokenize.generate_tokens(src)
>>> src
<generator object at 0x00BFBEE0>
>>> src.next()
(51, '{', (1, 0), (1, 1), "{'test':'123','hehe':['hooray',0x10]}")
>>> token = src.next()
>>> token
(3, "'test'", (1, 1), (1, 7), "{'test':'123','hehe':['hooray',0x10]}")
>>> token[0]
3
>>> tokenize.STRING
3
>>> tokenize.AMPER
19
>>> tokenize.AMPEREQUAL
42
>>> tokenize.AT
50
>>> tokenize.BACKQUOTE
25

This is what I experimented with, but I was not able to find out what these values mean. Where can I learn about them? I need a solution soon.

+2  A: 

You will need to read Python's tokenizer source, tokenizer.c, to understand the details. Just search for the keyword you want to know about; it should not be hard.

kcwu
+1  A: 

Python's lexical analysis (including tokens) is documented at http://docs.python.org/reference/lexical_analysis.html. As http://docs.python.org/library/token.html#module-token says, "Refer to the file Grammar/Grammar in the Python distribution for the definitions of the names in the context of the language grammar."
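To put that documentation to use, the standard token module ships a tok_name dictionary that maps each numeric token id back to its readable name. A minimal sketch (the exact numeric values differ between Python versions, so always look them up rather than hard-coding them):

```python
import token

# tok_name maps each numeric token id to its symbolic name.
print(token.tok_name[token.STRING])  # prints 'STRING'
print(token.tok_name[token.AMPER])   # prints 'AMPER'

# Dump a few id -> name pairs to see how the table is laid out:
for tok_id in sorted(token.tok_name)[:5]:
    print(tok_id, token.tok_name[tok_id])
```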

Alex Martelli
A: 

The various AMPER, BACKQUOTE, etc. values correspond to the token number of the appropriate symbol for Python tokens/operators, i.e. AMPER = "&" (ampersand), AMPEREQUAL = "&=".

However, you don't actually have to care about these. They're used by the internal C tokeniser, but the Python wrapper simplifies the output, translating all operator symbols to the OP token. You can translate the numeric token ids (the first value in each token tuple) to their symbolic names using the token module's tok_name dictionary. For example:

>>> import tokenize, token
>>> s = "{'test':'123','hehe':['hooray',0x10]}"
>>> for t in tokenize.generate_tokens(iter([s]).next):
        print token.tok_name[t[0]],

OP STRING OP STRING OP STRING OP OP STRING OP NUMBER OP OP ENDMARKER

As a quick debug statement to describe the tokens a bit better, you could also use tokenize.printtoken. This is undocumented, and looks like it isn't present in Python 3, so don't rely on it for production code, but as a quick peek at what the tokens mean, you may find it useful:

>>> for t in tokenize.generate_tokens(iter([s]).next):
        tokenize.printtoken(*t)

1,0-1,1:        OP      '{'
1,1-1,7:        STRING  "'test'"
1,7-1,8:        OP      ':'
1,8-1,13:       STRING  "'123'"
1,13-1,14:      OP      ','
1,14-1,20:      STRING  "'hehe'"
1,20-1,21:      OP      ':'
1,21-1,22:      OP      '['
1,22-1,30:      STRING  "'hooray'"
1,30-1,31:      OP      ','
1,31-1,35:      NUMBER  '0x10'
1,35-1,36:      OP      ']'
1,36-1,37:      OP      '}'
2,0-2,0:        ENDMARKER       ''

The various values in the tuple you get back for each token are, in order:

  1. Token id (corresponds to the type, e.g. STRING, OP, NAME, etc.)
  2. The string - the actual token text for this token, e.g. "&" or "'a string'"
  3. The start (line, column) in your input
  4. The end (line, column) in your input
  5. The full text of the line the token is on.
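Putting those five fields together, the tuples can be unpacked directly in the loop. A sketch written for modern Python 3, where io.StringIO replaces cStringIO and the generator is consumed with iteration rather than the .next() method used above:

```python
import io
import token
import tokenize

source = "{'test':'123','hehe':['hooray',0x10]}"

# Each token is a 5-tuple: (id, text, start, end, source_line).
for tok_id, text, start, end, line in tokenize.generate_tokens(io.StringIO(source).readline):
    print(token.tok_name[tok_id], repr(text), start, end)
```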
Brian