Note: I know little C and have no experience with ANTLR's C runtime, but the Java code in my examples should not be too hard to rewrite in C.
You could do that by overriding the emit(Token) method from the base Lexer class and keeping track of the last Token your lexer processes:
    private Token last;

    @Override
    public void emit(Token token) {
        last = token;
        super.emit(token);
    }
To include this in your lexer, add it to your grammar inside a @lexer::members section:
    @lexer::members {
        // your code here
    }
Now you must put the Other rule before your ExtraData rule, and put a gated semantic predicate at the start of your Other rule that checks whether the last token was an ExtraData token:
    Other
      :  {behind(ExtraData)}?=> ~('-' | ' ') (~' ')*
      ;
where the behind(int) method is a custom method in your @lexer::members { ... } section:
    protected boolean behind(int tokenType) {
        return last != null && last.getType() == tokenType;
    }
which will cause the Other token to be matched only if the last token was an ExtraData.
A small demo grammar that puts it all together:
    grammar LookBehind;

    @lexer::members {
        private Token last;

        @Override
        public void emit(Token token) {
            last = token;
            super.emit(token);
        }

        protected boolean behind(int tokenType) {
            return last != null && last.getType() == tokenType;
        }
    }

    parse
      :  token+ EOF
      ;

    token
      :  Argument  {System.out.println("Argument :: "+$Argument.text);}
      |  Other     {System.out.println("Other :: "+$Other.text);}
      |  ExtraData {System.out.println("ExtraData :: "+$ExtraData.text);}
      ;

    Argument
      :  '-'+ (~('-' | '#' | ' '))+
      ;

    Other
      :  {behind(ExtraData)}?=> ~('-' | ' ') (~' ')*
      ;

    ExtraData
      :  '#' (~'#')* '#'
      ;

    Space
      :  (' ' | '\t' | '\r' | '\n') {skip();}
      ;
and a main-class to test it:
    import org.antlr.runtime.*;

    public class Main {
        public static void main(String[] args) throws Exception {
            String source = "-argument -argument#with hashed data# #plainhashedData#";
            ANTLRStringStream in = new ANTLRStringStream(source);
            LookBehindLexer lexer = new LookBehindLexer(in);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            LookBehindParser parser = new LookBehindParser(tokens);
            parser.parse();
        }
    }
First generate a parser and lexer from the grammar:
java -cp antlr-3.2.jar org.antlr.Tool LookBehind.g
then compile all .java files:
javac -cp antlr-3.2.jar *.java
and finally run the main class:
java -cp .:antlr-3.2.jar Main
(on Windows do: java -cp .;antlr-3.2.jar Main)
which will then produce the following output:
Argument :: -argument
Argument :: -argument
ExtraData :: #with hashed data#
Other :: #plainhashedData#
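If you want to follow the control flow without generating anything with ANTLR, here is a rough hand-written sketch of the same idea in plain Java. It is not what ANTLR generates: the class LookBehindSketch and all of its members are names I made up for illustration, and whitespace is simply checked first instead of being resolved by ANTLR's prediction. The only point is that emit() records the last token and the Other branch is tried only when behind(EXTRA_DATA) holds.

```java
// Plain-Java stand-in for the generated lexer; every name here is hypothetical.
class LookBehindSketch {
    static final int ARGUMENT = 1, OTHER = 2, EXTRA_DATA = 3;
    static final String[] NAMES = {"", "Argument", "Other", "ExtraData"};

    private final String in;
    private int pos = 0;
    private int lastType = 0;   // type of the last emitted token, 0 = none yet
    private final StringBuilder out = new StringBuilder();

    LookBehindSketch(String in) { this.in = in; }

    // Plays the role of the overridden emit(Token): record the token, then pass it on.
    private void emit(int type, int start) {
        lastType = type;
        out.append(NAMES[type]).append(" :: ").append(in, start, pos).append('\n');
    }

    // Plays the role of behind(int).
    private boolean behind(int type) { return lastType == type; }

    private boolean more() { return pos < in.length(); }

    String run() {
        while (more()) {
            int start = pos;
            char c = in.charAt(pos);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                pos++;                          // Space: skipped, never emitted
            } else if (c == '-') {              // Argument: '-'+ (~('-' | '#' | ' '))+
                while (more() && in.charAt(pos) == '-') pos++;
                int body = pos;
                while (more() && "-# ".indexOf(in.charAt(pos)) < 0) pos++;
                if (pos == body) throw new IllegalStateException("empty Argument at " + start);
                emit(ARGUMENT, start);
            } else if (behind(EXTRA_DATA)) {    // Other (gated): ~('-' | ' ') (~' ')*
                while (more() && in.charAt(pos) != ' ') pos++;
                emit(OTHER, start);
            } else if (c == '#') {              // ExtraData: '#' (~'#')* '#'
                int close = in.indexOf('#', pos + 1);
                if (close < 0) throw new IllegalStateException("unterminated ExtraData at " + start);
                pos = close + 1;
                emit(EXTRA_DATA, start);
            } else {
                throw new IllegalStateException("no rule matches at " + start);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(new LookBehindSketch(
            "-argument -argument#with hashed data# #plainhashedData#").run());
    }
}
```

Because skipped whitespace never reaches emit(), the space between the two hashed chunks does not disturb the recorded token, which is why the second chunk comes out as Other.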
EDIT
As you (Billy) mentioned in your comment, you can't override methods in C. Instead, you could set a boolean flag in the @after{ ... } clause of each lexer rule to keep track of whether the last token was an ExtraData, and use that flag in your predicate:
    grammar LookBehind;

    @lexer::members {
        private boolean lastExtraData = false;
    }

    parse
      :  token+ EOF
      ;

    token
      :  Argument  {System.out.println("Argument :: "+$Argument.text);}
      |  Other     {System.out.println("Other :: "+$Other.text);}
      |  ExtraData {System.out.println("ExtraData :: "+$ExtraData.text);}
      ;

    Argument
    @after{lastExtraData = false;}
      :  '-'+ (~('-' | '#' | ' '))+
      ;

    Other
    @after{lastExtraData = false;}
      :  {lastExtraData}?=> ~('-' | ' ') (~' ')*
      ;

    ExtraData
    @after{lastExtraData = true;}
      :  '#' (~'#')* '#'
      ;

    Space
      :  (' ' | '\t' | '\r' | '\n') {skip();}
      ;
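This is a bit of a hack, though: every lexer rule that produces a token has to set the flag. Note that the Space rule leaves the flag alone, so whitespace between an ExtraData token and the next token does not clear it.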
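To illustrate why the flag bookkeeping is fragile, here is a small plain-Java sketch of the flag-based variant, written with a single static method and a local boolean so that it maps more directly onto C. Again, this is only a hand-written approximation of the grammar above, not generated code: FlagLexerSketch, run and the spaceResetsFlag parameter are all made up for this example. The parameter exists purely to show what goes wrong if the whitespace rule also resets the flag.

```java
// Flag-based variant in plain Java; all names are hypothetical.
class FlagLexerSketch {
    // Returns one "Type :: text" line per token.
    static String run(String in, boolean spaceResetsFlag) {
        StringBuilder out = new StringBuilder();
        boolean lastExtraData = false;   // same role as the flag in @lexer::members
        int pos = 0;
        while (pos < in.length()) {
            int start = pos;
            char c = in.charAt(pos);
            String type;
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                pos++;
                if (spaceResetsFlag) lastExtraData = false;   // the mistake to avoid
                continue;
            } else if (c == '-') {            // Argument
                while (pos < in.length() && in.charAt(pos) == '-') pos++;
                int body = pos;
                while (pos < in.length() && "-# ".indexOf(in.charAt(pos)) < 0) pos++;
                if (pos == body) throw new IllegalStateException("empty Argument at " + start);
                type = "Argument";
                lastExtraData = false;        // @after of Argument
            } else if (lastExtraData) {       // Other, gated by the flag
                while (pos < in.length() && in.charAt(pos) != ' ') pos++;
                type = "Other";
                lastExtraData = false;        // @after of Other
            } else if (c == '#') {            // ExtraData
                int close = in.indexOf('#', pos + 1);
                if (close < 0) throw new IllegalStateException("unterminated ExtraData at " + start);
                pos = close + 1;
                type = "ExtraData";
                lastExtraData = true;         // @after of ExtraData
            } else {
                throw new IllegalStateException("no rule matches at " + start);
            }
            out.append(type).append(" :: ").append(in, start, pos).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "-argument -argument#with hashed data# #plainhashedData#";
        System.out.print(run(source, false));
    }
}
```

With spaceResetsFlag set to false the last chunk is reported as Other, as in the demo; with true the space after the first hashed chunk clears the flag and the last chunk is reported as ExtraData instead.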
You might also post a question to the ANTLR mailing list: besides many ANTLR experts, the person maintaining ANTLR's C runtime frequents it.
Good luck!