Does anyone know of a JavaScript lexical analyzer or tokenizer (preferably in Python)?

Basically, given an arbitrary JavaScript file, I want to grab the tokens.

e.g.

foo = 1

becomes something like:

  1. variable name : "foo"
  2. whitespace
  3. operator : equals
  4. whitespace
  5. integer : 1
A:

http://code.google.com/p/pynarcissus/ has one.

I also wrote one, but it doesn't support automatic semicolon insertion, so it is pretty useless for JavaScript that you have no control over (almost all real-life JavaScript programs lack at least one semicolon) :) Here is mine:

http://bitbucket.org/santagada/jaspyon/src/tip/jaspyon/

The grammar is in jsgrammar.txt. It is parsed by the PyPy parsing library (which you will have to download and extract from the PyPy source), which builds a parse tree that I walk in astbuilder.py.

But if you don't have licensing problems, I would go with pynarcissus. Here's a direct link to the code (ported from Narcissus):

http://code.google.com/p/pynarcissus/source/browse/trunk/jsparser.py
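
To give a feel for the kind of token stream asked about in the question, here is a minimal sketch built only on Python's standard `re` module. It is not pynarcissus's or jaspyon's API; the token categories and the `tokenize` name are made up for illustration, and it ignores strings, comments, regex literals, keywords, and semicolon insertion, so it is nowhere near a real JavaScript lexer.

```python
import re

# Hypothetical token categories chosen for illustration only; a real
# JavaScript lexer needs many more rules (strings, comments, regex
# literals, keywords, ...).
TOKEN_SPEC = [
    ("number", r"\d+(?:\.\d+)?"),
    ("name", r"[A-Za-z_$][A-Za-z0-9_$]*"),
    ("operator", r"[=+\-*/<>!]=?|[(){};,.]"),
    ("whitespace", r"\s+"),
]

# One alternation of named groups; the name of the group that matched
# tells us the token kind.
MASTER_RE = re.compile(
    "|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC)
)


def tokenize(source):
    """Yield (kind, text) pairs; raise if no rule matches at some offset."""
    pos = 0
    while pos < len(source):
        match = MASTER_RE.match(source, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {source[pos]!r} at offset {pos}"
            )
        yield match.lastgroup, match.group()
        pos = match.end()


tokens = list(tokenize("foo = 1"))
for kind, text in tokens:
    print(kind, repr(text))
```

For `foo = 1` this yields a name, whitespace, an operator, whitespace, and a number, in that order. For anything beyond toy input, the ported Narcissus parser linked above is the safer choice.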

Leonardo Santagada