I’m trying to build a grammar that interprets user-entered text, search-engine style. It will support the AND, OR, NOT and ANDNOT Boolean operators. I have pretty much everything working, but I want to add a rule that two adjacent keywords outside of a quoted string implicitly are treated as in an AND clause. For example:
cheese and crackers = cheese AND crackers
(up and down) or (left and right) = (up AND down) OR (left AND right)
cat dog “potbelly pig” = cat AND dog AND “potbelly pig”
I’m having trouble with the last one, and I’m hoping somebody can point me in the right direction. Here’s my *.g file thus far, and please be nice, my ANTLR experience spans less than a work day:
grammar SearchEngine;
options { language = CSharp2; output = AST; }
@lexer::namespace { Demo.SearchEngine }
@parser::namespace { Demo.SearchEngine }
LPARENTHESIS : '(';
RPARENTHESIS : ')';
AND : ('A'|'a')('N'|'n')('D'|'d');
OR : ('O'|'o')('R'|'r');
ANDNOT : ('A'|'a')('N'|'n')('D'|'d')('N'|'n')('O'|'o')('T'|'t');
NOT : ('N'|'n')('O'|'o')('T'|'t');
fragment CHARACTER : ('a'..'z'|'A'..'Z'|'0'..'9');
fragment QUOTE : ('"');
fragment SPACE : (' '|'\n'|'\r'|'\t'|'\u000C');
WS : (SPACE) { $channel=HIDDEN; };
PHRASE : (QUOTE)(CHARACTER)+((SPACE)+(CHARACTER)+)+(QUOTE);
WORD : (CHARACTER)+;
startExpression : andExpression;
andExpression : andnotExpression (AND^ andnotExpression)*;
andnotExpression : orExpression (ANDNOT^ orExpression)*;
orExpression : notExpression (OR^ notExpression)*;
notExpression : (NOT^)? atomicExpression;
atomicExpression : PHRASE | WORD | LPARENTHESIS! andExpression RPARENTHESIS!;