I'm trying to learn ANTLR and at the same time use it for a current project.
I've gotten to the point where I can run the lexer on a chunk of code and output it to a CommonTokenStream. This is working fine, and I've verified that the source text is being broken up into the appropriate tokens.
Now, I would like to be able to modify the text of certain tokens in this stream, and display the now modified source code.
For example I've tried:
import org.antlr.runtime.*;
import java.util.*;
public class LexerTest
{
public static final int IDENTIFIER_TYPE = 4;
public static void main(String[] args)
{
String input = "public static void main(String[] args) { int myVar = 0; }";
CharStream cs = new ANTLRStringStream(input);
JavaLexer lexer = new JavaLexer(cs);
CommonTokenStream tokens = new CommonTokenStream();
tokens.setTokenSource(lexer);
int size = tokens.size();
for(int i = 0; i < size; i++)
{
Token token = (Token) tokens.get(i);
if(token.getType() == IDENTIFIER_TYPE)
{
token.setText("V");
}
}
System.out.println(tokens.toString());
}
}
I'm trying to set all Identifier token's text to the string literal "V".
1) Why are my changes to the token's text not reflected when I call tokens.toString()?
2) How am I suppose to know the various Token Type IDs? I walked through with my debugger and saw that the ID for the IDENTIFIER tokens was "4" (hence my constant at the top). But how would I have known that otherwise? Is there some other way of mapping token type ids to the token name?
EDIT:
One thing that is important to me is I wish for the tokens to have their original start and end character positions. That is, I don't want them to reflect their new positions with the variable names changed to "V". This is so I know where the tokens were in the original source text.