I currently have a working, simple language implemented in Java using ANTLR. What I want to do is embed it in plain text, in a similar fashion to PHP.
For example:
Lorem ipsum dolor sit amet
<% print('consectetur adipiscing elit'); %>
Phasellus volutpat dignissim sapien.
I anticipate that the resulting token stream would look something like:
CDATA OPEN PRINT OPAREN APOS STRING APOS CPAREN SEMI CLOSE CDATA
How can I achieve this, or is there a better way?
There is no restriction on what might be outside the <% block. I assumed something like <% print('%>'); %>, as per Michael Mrozek's answer, would be possible, but outside of a situation like that, <% would always indicate the start of a code block.
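To make that rule concrete, here is a minimal hand-written Java sketch of the splitting behaviour I have in mind. It is not part of the ANTLR grammar: the class name TemplateSplitter and its methods are hypothetical, and it assumes string literals are single-quoted with no escape sequences. A %> inside a string literal does not close the block.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not generated by ANTLR.
class TemplateSplitter {

    // Splits the input into segments. Each segment is a two-element array
    // {kind, content}, where kind is "TEXT" or "CODE".
    static List<String[]> split(String input) {
        List<String[]> segments = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int open = input.indexOf("<%", pos);
            if (open < 0) {
                // No more code blocks: the rest is verbatim text.
                segments.add(new String[] {"TEXT", input.substring(pos)});
                break;
            }
            if (open > pos) {
                segments.add(new String[] {"TEXT", input.substring(pos, open)});
            }
            int close = findClose(input, open + 2);
            if (close < 0) {
                throw new IllegalArgumentException("unterminated <% block at offset " + open);
            }
            segments.add(new String[] {"CODE", input.substring(open + 2, close)});
            pos = close + 2;
        }
        return segments;
    }

    // Returns the index of the first "%>" at or after 'from' that is not
    // inside a single-quoted string, or -1 if there is none.
    // Assumption: strings have no escape sequences.
    private static int findClose(String input, int from) {
        boolean inString = false;
        for (int i = from; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\'') {
                inString = !inString;
            } else if (!inString && c == '%'
                    && i + 1 < input.length() && input.charAt(i + 1) == '>') {
                return i;
            }
        }
        return -1;
    }
}
```

Calling TemplateSplitter.split("a <% print('%>'); %> b") should give three segments: the text "a ", the code " print('%>'); ", and the text " b".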
Sample Implementation
I developed a solution based on ideas given in Michael Mrozek's answer, simulating Flex's start conditions using ANTLR's gated semantic predicates:
lexer grammar Lexer;

@members {
    boolean codeMode = false;
}

OPEN : {!codeMode}?=> '<%' { codeMode = true; } ;
CLOSE : {codeMode}?=> '%>' { codeMode = false; } ;
LPAREN : {codeMode}?=> '(';
//etc.
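// Note: as far as I know, ~ can only negate single characters in ANTLR 3,
// so instead of ~('<%') this matches any character that does not begin an OPEN delimiter.
CHAR : {!codeMode}?=> ( ~'<' | {input.LA(2) != '%'}?=> '<' ) ;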
parser grammar Parser;

options {
    tokenVocab = Lexer;
    output = AST;
}

tokens {
    VERBATIM;
}

program :
    (code | verbatim)+
    ;

code :
    OPEN statement+ CLOSE -> statement+
    ;

verbatim :
    CHAR -> ^(VERBATIM CHAR)
    ;
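
For comparison, the mode switch that the gated predicates simulate can be sketched by hand in plain Java. This is only an illustration of the idea, not what ANTLR generates: the class ModeLexerSketch is hypothetical, the token names mirror the lexer grammar above, and OTHER is a placeholder I made up for the code-mode rules hidden behind "//etc.".

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration only; not the ANTLR-generated lexer.
class ModeLexerSketch {

    // Returns the token names the two-mode lexer would emit for the input.
    static List<String> tokenNames(String input) {
        List<String> out = new ArrayList<>();
        boolean codeMode = false; // same role as the flag in @members
        int i = 0;
        while (i < input.length()) {
            if (!codeMode && input.startsWith("<%", i)) {
                out.add("OPEN");      // only recognised outside code mode
                codeMode = true;
                i += 2;
            } else if (codeMode && input.startsWith("%>", i)) {
                out.add("CLOSE");     // only recognised inside code mode
                codeMode = false;
                i += 2;
            } else if (codeMode) {
                char c = input.charAt(i);
                if (c == '(') {
                    out.add("LPAREN");
                } else if (!Character.isWhitespace(c)) {
                    out.add("OTHER"); // placeholder for the elided rules
                }
                i++;
            } else {
                out.add("CHAR");      // one token per verbatim character
                i++;
            }
        }
        return out;
    }
}
```

For the input "a<%(%>b" this yields CHAR, OPEN, LPAREN, CLOSE, CHAR. Unlike the splitter-style approach, this sketch has no string rule, so a %> inside a string literal would still end the block unless a string token is matched as a whole in code mode.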