ansaurus

Question

How to write an ANTLR parser for JSP/ASP/PHP like languages?

Answer 1


I can't speak for ANTLR, as I use a different lexer/parser (the DMS Software Reengineering Toolkit), for which I have developed precisely such JSP and PHP lexer/parsers. (ASP isn't different, as you have observed in your question.)

But the basic idea is that the lexer needs lexical modes to recognize when you are picking up "anytext" and when you are processing "real" programming-language text. So you need a starting lexical mode, say HTML, whose job is to absorb the HTML text and to switch modes when it encounters a transition into PHP. You also need a PHP mode, which picks up all the PHP tokens and switches back to HTML mode when the transition-out characters are encountered. Here's a sketch:

%%HTML -- mode
#token HTMLText "~[]* \< \% "
   << (GotoPHPMode) >>

%%PHP -- mode
#token KEYWORD "KEYWORD"
...
#token '%>'  "\%\>"
   << (GotoHTMLMode) >>

Your lexer generator is likely to have some kind of mode-switching capability that you'll have to use instead of this. You'll also likely find that lexing the HTML stuff is more complicated than it looks (you have to worry about <SCRIPT> tags and lots of other crazy HTML stuff), but those are details I presume you can handle.

Ira Baxter 2009-09-28 04:20:06
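As a rough illustration of the two lexical modes described in the answer, here is a minimal hand-written Python sketch. It is not DMS or ANTLR code. The `<%`/`%>` delimiters follow the DMS sketch, while the token names (`HTML`, `OPEN`, `CLOSE`, `IDENT`, `NUMBER`, `OP`) and the tiny code-mode token set are assumptions made for illustration. The HTML mode here deliberately ignores the <SCRIPT> and other HTML complications the answer warns about.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str  # "HTML", "OPEN", "CLOSE", "IDENT", "NUMBER" or "OP"
    text: str


# Hypothetical token set for the code mode; a real PHP/JSP lexer needs many more.
CODE_TOKEN = re.compile(
    r"\s*(?:(?P<CLOSE>%>)|(?P<IDENT>[A-Za-z_]\w*)|(?P<NUMBER>\d+)|(?P<OP>[^\sA-Za-z_0-9]))"
)


def tokenize(src: str) -> Iterator[Token]:
    """Two-mode lexer: HTML mode absorbs raw text up to '<%',
    CODE mode emits real tokens until it sees '%>'."""
    pos, mode = 0, "HTML"
    while pos < len(src):
        if mode == "HTML":
            start = src.find("<%", pos)
            if start == -1:  # no more code sections: the rest is plain text
                yield Token("HTML", src[pos:])
                return
            if start > pos:
                yield Token("HTML", src[pos:start])
            yield Token("OPEN", "<%")
            pos, mode = start + 2, "CODE"  # transition into code mode
        else:
            m = CODE_TOKEN.match(src, pos)
            if m is None:  # only trailing whitespace is left
                pos = len(src)
                continue
            kind = m.lastgroup  # exactly one named alternative matched
            yield Token(kind, m.group(kind))
            pos = m.end()
            if kind == "CLOSE":
                mode = "HTML"  # transition back out to HTML mode
    if mode == "CODE":
        raise SyntaxError("unterminated code section: missing '%>'")
```

For example, `list(tokenize("<p><% for x %>Hi<% endfor %>!"))` produces one flat token stream in which the HTML chunks and the code tokens are interleaved, so a single parser can consume both.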

Many thanks for your response. Mode switching might indeed be a solution, although it's still a bit problematic with ANTLR, because only the lexer should switch modes while the parser must stay the same. (Otherwise it would be hard to parse things like "<% for ... %>AnyText<% endfor %>".) The easiest solution I have explored so far is boost::spirit. There, the lexer is called by the parser, so you can simply write as many rules including anychar_p's as you want, without switching modes.

tux21b 2009-09-28 19:31:00
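The comment's point is that when the parser drives the scanner, it can decide at each step whether to grab "any text" or to read code. Here is a minimal Python sketch of that idea, a hand-written recursive-descent parser rather than boost::spirit or ANTLR code. The `for`/`endfor` tags come from the example in the comment; `parse_template`, `Text`, and `For` are hypothetical names. The code inside a tag is kept as a raw string rather than tokenized further.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Text:
    value: str  # raw template text, passed through untouched


@dataclass
class For:
    header: str  # code after 'for', kept as a raw string in this sketch
    body: List["Node"] = field(default_factory=list)


Node = Union[Text, For]

# A code tag: '<%', optional spaces, the tag's code, optional spaces, '%>'.
TAG = re.compile(r"<%\s*(.*?)\s*%>", re.DOTALL)


def parse_template(src: str) -> List[Node]:
    """Recursive-descent parser that drives its own scanning: between tags it
    grabs 'any text', inside a tag it reads code. No lexer mode switch needed."""
    pos = 0

    def parse_block(stop_at: Optional[str]) -> List[Node]:
        nonlocal pos
        nodes: List[Node] = []
        while pos < len(src):
            start = src.find("<%", pos)
            if start == -1:  # rest of the input is plain text
                nodes.append(Text(src[pos:]))
                pos = len(src)
                break
            if start > pos:
                nodes.append(Text(src[pos:start]))
            m = TAG.match(src, start)
            if m is None:
                raise SyntaxError(f"unterminated tag at offset {start}")
            code, pos = m.group(1), m.end()
            if code == stop_at:  # e.g. the matching '<% endfor %>'
                return nodes
            if code.startswith("for "):
                # The loop body is just another block, parsed recursively.
                nodes.append(For(code[4:].strip(), parse_block("endfor")))
            else:
                raise SyntaxError(f"unexpected tag {code!r} at offset {start}")
        if stop_at is not None:
            raise SyntaxError(f"missing '<% {stop_at} %>'")
        return nodes

    return parse_block(None)
```

Because the text between `<% for ... %>` and `<% endfor %>` is handled by the same recursive rule, nested loops such as `<% for a %>A<% for b %>B<% endfor %><% endfor %>` need no extra machinery. The trade-off is that scanning and grammar are intertwined; for a full PHP or JSP grammar you would likely still want a proper tokenizer for the code inside tags.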
