I have some reasonable (not obfuscated) Perl source files, and I need a tokenizer that will split them into tokens and return the token type of each, e.g. for the script

print "Hello, World!\n";

it would return something like this:

  • keyword 5 bytes
  • whitespace 1 byte
  • double-quoted-string 17 bytes
  • semicolon 1 byte
  • whitespace 1 byte

Which is the best library (preferably written in Perl) for this? It has to be reasonably correct, i.e. it should be able to parse syntactic constructs like qq{{\}}}, but it doesn't have to know about source filters like Lingua::Romana::Perligata. I know that parsing Perl is undecidable in general, and that only perl itself can do it right, but I don't need absolute correctness: the tokenizer can fail, be incompatible, or assume some default in very rare corner cases, as long as it works correctly most of the time. It must be better than the syntax highlighting built into an average text editor.

FYI, I tried the PerlLexer in Pygments, which works reasonably well for most constructs, except that it cannot find the second print keyword in this one:

print length(<<"END"); print "\n";
String
END
+16  A: 

PPI
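
For illustration, a minimal sketch (untested) of how PPI might produce the kind of token listing the question asks for; the class names are PPI's token classes, the output formatting is mine:

use strict;
use warnings;
use PPI;

my $source = 'print "Hello, World!\n";';

# Build a PPI document, then walk its flattened token stream.
my $doc = PPI::Document->new( \$source )
    or die "Failed to parse source";

for my $token ( $doc->tokens ) {
    # ref($token) is the token class, e.g. PPI::Token::Word for the
    # keyword and PPI::Token::Quote::Double for the string literal.
    printf "%-28s %d bytes\n", ref($token), length( $token->content );
}

For the example script this should print four tokens: PPI::Token::Word (5 bytes), PPI::Token::Whitespace (1 byte), PPI::Token::Quote::Double (17 bytes) and PPI::Token::Structure (1 byte), matching the list in the question.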

daxim
+4  A: 

use PPI;

Yes, only perl can parse Perl, but PPI is the 95%-correct solution.
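
As a sketch (assuming PPI's documented PPI::Tokenizer interface; output not verified), the streaming tokenizer can be pointed directly at the here-doc snippet from the question:

use strict;
use warnings;
use PPI::Tokenizer;

# The here-doc snippet that trips up Pygments' PerlLexer.
my $code = <<'PERL';
print length(<<"END"); print "\n";
String
END
PERL

my $tokenizer = PPI::Tokenizer->new( \$code );

# get_token returns the next token, 0 at end of input, undef on error.
while ( my $token = $tokenizer->get_token ) {
    printf "%-30s %d bytes\n", ref($token), length( $token->content );
}

Unlike the Pygments lexer, PPI knows where the here-doc body ends, so the second print should come out as an ordinary PPI::Token::Word. Note that here-docs get special handling (see PPI::Token::HereDoc), so per-token contents may not concatenate back to the original source byte for byte.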

szbalint