Hi, I need to parse the structure of PHP and JavaScript documents to get information about their functions and parameters, classes and methods, variables, and so on. Is there any way to do that without regular expressions? I've heard about something called "lexing", but I couldn't find any examples, not even ones that would tell me whether it is what I'm looking for.

Thanks in advance.

+2  A: 

I'm not sure if this is feasible, but for PHP, would you be able to invoke the PHP CLI from Delphi to get the information?

If so, you could call token_get_all() and then write out the result in a format that you can parse in Delphi (XML, JSON, etc.). This is lexing. It only solves half the problem, though: you still have to interpret each token in context to get the results you want.
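To make the two halves concrete, here is a rough sketch in Python (chosen only for brevity; the same structure ports to Delphi). The `lex` function scans character by character with no regular expressions, and `find_functions` is the "context" step that looks for the pattern `function NAME ( ... )`. The token kinds and both function names are made up for this example and are not PHP's real token set; strings, comments, and default parameter values are deliberately ignored.

```python
# Minimal hand-written lexer (no regular expressions) plus a context pass.
# Token kinds ("ident", "variable", "punct") are illustrative assumptions.

def lex(source):
    """Split source into (kind, text) tokens by scanning character by character."""
    tokens = []
    i = 0
    n = len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha() or ch == "_":
            start = i
            while i < n and (source[i].isalnum() or source[i] == "_"):
                i += 1
            tokens.append(("ident", source[start:i]))
        elif ch == "$":
            start = i
            i += 1
            while i < n and (source[i].isalnum() or source[i] == "_"):
                i += 1
            tokens.append(("variable", source[start:i]))
        else:
            # Everything else becomes a one-character token.
            tokens.append(("punct", ch))
            i += 1
    return tokens


def find_functions(tokens):
    """Context pass: match 'function' NAME '(' ... ')' and collect parameters."""
    found = {}
    for pos, (kind, text) in enumerate(tokens):
        if kind == "ident" and text == "function" and pos + 2 < len(tokens):
            name_kind, name = tokens[pos + 1]
            if name_kind != "ident" or tokens[pos + 2] != ("punct", "("):
                continue
            params = []
            j = pos + 3
            while j < len(tokens) and tokens[j] != ("punct", ")"):
                if tokens[j][0] == "variable":
                    params.append(tokens[j][1])
                j += 1
            found[name] = params
    return found


sample = "function add($a, $b) { return $a + $b; } function noop() { }"
functions = find_functions(lex(sample))
print(functions)
```

The token list on its own says nothing about structure; it is the second pass that turns "an identifier after the word function" into "a function name".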

Tom Haigh
That would help me a lot, if only it were available as stand-alone Delphi code, with no PHP dependencies ...
migajek
+1  A: 

By "lexing" you're referring to lexical analysis, and there are some ancient tools, which mostly still work, named Lex and Yacc. Lex builds the tokenizer, and Yacc, which stands for "yet another compiler compiler", builds the actual parser.

The idea behind Lex/Yacc is that you write a grammar for the language and then run that grammar through the tool to generate source code (normally in C) that you can use to parse a file and take action on specific keywords and tokens. Martin Waldenburg wrote a Pascal version of Lex/Yacc named PasLex, which has been kicking around for well over a decade now and has been converted to Delphi (although it might not work with the latest versions without some minor work). If I remember correctly, it uses the same .L grammar input files as Lex, so any documentation you find for Lex/Yacc also applies to PasLex, except that you get Pascal code as the output.
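As a rough illustration of "grammar rule plus action", here is a hand-written Python sketch for a made-up two-rule grammar. It is not what a Lex/Yacc-style tool emits (generated parsers are typically table-driven); it only shows how each rule maps to code that consumes tokens and then records something. The grammar, the `Parser` class, and the pre-tokenized input are all assumptions for the example.

```python
# Toy grammar (hypothetical, for illustration only):
#   class_decl : 'class' NAME '{' method* '}'
#   method     : 'function' NAME '(' ')' '{' '}'

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.classes = {}  # filled in by the actions: class name -> method names

    def peek(self):
        """Return the next token without consuming it, or None at end of input."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, text=None):
        """Consume one token; if text is given, the token must match it exactly."""
        if self.pos >= len(self.tokens):
            raise SyntaxError("unexpected end of input")
        token = self.tokens[self.pos]
        if text is not None and token != text:
            raise SyntaxError(f"expected {text!r}, got {token!r}")
        self.pos += 1
        return token

    def class_decl(self):
        self.expect("class")
        name = self.expect()
        self.expect("{")
        methods = []
        while self.peek() == "function":
            methods.append(self.method())
        self.expect("}")
        self.classes[name] = methods  # action attached to class_decl

    def method(self):
        self.expect("function")
        name = self.expect()
        for text in ("(", ")", "{", "}"):
            self.expect(text)
        return name  # action attached to method


# Input is already split into tokens, as a lexer would deliver it.
tokens = ["class", "Cart", "{",
          "function", "add", "(", ")", "{", "}",
          "function", "total", "(", ")", "{", "}",
          "}"]
parser = Parser(tokens)
parser.class_decl()
print(parser.classes)
```

With a generator you would write only the two grammar rules and the action bodies; the token-consuming code would be produced for you.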

I'm not sure about current documentation availability. Before the internet (gasp) we used books, and most of this was heavily documented on paper which has long since turned yellow. However, rumor has it that you might, just might, be able to pick up a used copy from Amazon. I cut my teeth on this using a book also known as "the dragon book", which appears to have been republished as recently as 2006.


EDIT:

I was mistaken about the tool: it was TPLY. PasLex was a Delphi grammar implementation, whereas TPLY was the Lex/Yacc tool that generated Pascal source from a .L file.

skamradt