I'm a total lexer and parser newbie, so please have some patience. Eventually I want to be able to parse LDAP-style query strings, e.g. '(foo=bar)', '(!foo=bar)', '(&(foo=bar)(!zip=zap))', and end up with a tree that I can use to build the actual database query (or whatever).

So I thought I'd start with the simplest form and parse expressions like (foo=bar) and (!foo=bar), but I'm already having trouble understanding what happens. I just want to express that a field is separated from its value by '=', but ANTLR seems to consume all the characters at once because the field looks just like a value. What do I have to do to prevent this?

grammar FilterExpression;

options
{
    language=Java;
    k=2;
}

tokens
{
    NOT='!';
}

term    : '(' NOT? FIELD '=' VALUE ')';
// lexer
FIELD   : NAME;
VALUE   : CDATA;

fragment NAME
    :   ALPHA+;
fragment CDATA
    :   ALPHA*;
fragment ALPHA
    :   ('a'..'z' | 'A'..'Z');
A: 

If fields and values are both identifiers, where an identifier is a non-empty string of alphabetic characters (allowing a value to be empty, as in your example), you could do something like:

term : '(' NOT? field '=' value ')' ;

field : IDENTIFIER ;

value : IDENTIFIER? ;

// lexer
IDENTIFIER : ALPHA+ ;

fragment ALPHA
    :   ('a'..'z' | 'A'..'Z');

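Since the lexer can't tell a field from a value (both are just runs of letters, and the lexer tokenizes without knowing what the parser expects next), you need to let the lexer treat them as the same token and use the parser to tell them apart based on context.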
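To make the division of labour concrete, here is a rough hand-written sketch in Java of the same idea. It is not what ANTLR generates, and the names FilterSketch, Token, Term and parseTerm are made up for illustration. The lexer emits one IDENTIFIER kind for every run of letters; only the parser decides, from the position relative to '=', whether an identifier is the field or the value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not ANTLR-generated code.
final class FilterSketch {
    enum Kind { LPAREN, RPAREN, NOT, EQ, IDENTIFIER }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    static final class Term {
        final boolean negated;
        final String field;
        final String value;
        Term(boolean negated, String field, String value) {
            this.negated = negated; this.field = field; this.value = value;
        }
    }

    static boolean isAlpha(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Lexer: every run of letters becomes IDENTIFIER, wherever it appears.
    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '(') { tokens.add(new Token(Kind.LPAREN, "(")); i++; }
            else if (c == ')') { tokens.add(new Token(Kind.RPAREN, ")")); i++; }
            else if (c == '!') { tokens.add(new Token(Kind.NOT, "!")); i++; }
            else if (c == '=') { tokens.add(new Token(Kind.EQ, "=")); i++; }
            else if (isAlpha(c)) {
                int start = i;
                while (i < input.length() && isAlpha(input.charAt(i))) i++;
                tokens.add(new Token(Kind.IDENTIFIER, input.substring(start, i)));
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    // Checks that the token at pos has the given kind and returns the next position.
    static int expect(List<Token> t, int pos, Kind kind) {
        if (pos >= t.size() || t.get(pos).kind != kind) {
            throw new IllegalArgumentException("Expected " + kind + " at token " + pos);
        }
        return pos + 1;
    }

    // Parser for: term : '(' NOT? field '=' value ')'
    // The identifier before '=' is the field; the optional one after it is the value.
    static Term parseTerm(String input) {
        List<Token> t = lex(input);
        int pos = expect(t, 0, Kind.LPAREN);
        boolean negated = false;
        if (pos < t.size() && t.get(pos).kind == Kind.NOT) { negated = true; pos++; }
        int fieldPos = pos;
        pos = expect(t, pos, Kind.IDENTIFIER);   // field : IDENTIFIER
        String field = t.get(fieldPos).text;
        pos = expect(t, pos, Kind.EQ);
        String value = "";                       // value : IDENTIFIER?
        if (pos < t.size() && t.get(pos).kind == Kind.IDENTIFIER) {
            value = t.get(pos).text;
            pos++;
        }
        pos = expect(t, pos, Kind.RPAREN);
        if (pos != t.size()) throw new IllegalArgumentException("Trailing input after term");
        return new Term(negated, field, value);
    }
}
```

With this sketch, FilterSketch.parseTerm("(!foo=bar)") gives a negated term with field "foo" and value "bar", and "(foo=)" gives an empty value, mirroring the optional IDENTIFIER in the value rule.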

CapnNefarious