I'm trying to parse a list of Name=Value pairs, where the value can contain anything except whitespace (i.e. values can contain equal signs).
The name is restricted to usual identifier characters.

The problem is that the 'Value' token (CONTINUOUS_VALUE) matches everything. For example, for the input:

dude=sweet

the lexer matches the whole input as a single CONTINUOUS_VALUE token, and the parser then throws a MismatchedTokenException.

In bison, I remember a way to assign states to tokens (or was that only for nonterminals?) so that they only become 'eligible' for matching after an explicit transition into that state.

EDIT: Thinking about it, this won't work in bison either: by then the input has already been split into tokens (in flex). However, I think there was a way to REJECT tokens, forcing flex to try the second-best match.
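For comparison, the 'states' idea can be sketched as a hand-written two-mode lexer in Java. This is a hypothetical illustration, not flex or ANTLR output: the class ModalLexer and its tokenize method are made-up names. A value token only becomes eligible after an '=' has switched the lexer into VALUE mode, and the lexer switches back once the value ends.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-mode lexer, loosely modeled on flex start conditions.
// CONTINUOUS_VALUE is only eligible in VALUE mode, which is entered after '='.
public class ModalLexer {
    enum Mode { DEFAULT, VALUE }

    static boolean isWs(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // Same character set as the IDENTIFIER rule in the grammar.
    static boolean isIdentChar(char c) {
        return c == '-' || c == '_'
            || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Mode mode = Mode.DEFAULT;
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            int end = pos;
            if (mode == Mode.VALUE) {
                // In VALUE mode everything up to the next whitespace is the value, '=' included.
                while (end < input.length() && !isWs(input.charAt(end))) end++;
                if (end == pos) throw new IllegalArgumentException("empty value at " + pos);
                tokens.add("CONTINUOUS_VALUE:" + input.substring(pos, end));
                mode = Mode.DEFAULT; // back to the default state for the next name
            } else if (isWs(c)) {
                end++; // whitespace only separates pairs, so it is skipped
            } else if (c == '=') {
                end++;
                tokens.add("'='");
                mode = Mode.VALUE; // the explicit transition that makes values eligible
            } else if (isIdentChar(c)) {
                while (end < input.length() && isIdentChar(input.charAt(end))) end++;
                tokens.add("IDENTIFIER:" + input.substring(pos, end));
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at " + pos);
            }
            pos = end;
        }
        if (mode == Mode.VALUE) throw new IllegalArgumentException("missing value at end of input");
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("dude=sweet a=b=c"));
    }
}
```

Running main prints [IDENTIFIER:dude, '=', CONTINUOUS_VALUE:sweet, IDENTIFIER:a, '=', CONTINUOUS_VALUE:b=c]. ANTLR 4 later added lexer modes for this kind of problem; ANTLR 3, which this grammar targets, has no built-in equivalent.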

Here's my ANTLR grammar.

grammar command_string;

start   
    :  commandParam* EOF
    ;
commandParam 
    : IDENTIFIER '=' CONTINUOUS_VALUE 
    ;
IDENTIFIER 
    : ('-'|'_'|'a'..'z'|'A'..'Z'|'0'..'9')+ 
    ;
CONTINUOUS_VALUE
    : ~( ALL_WS )+
    ;
WS
    : (ALL_WS) +   { $channel = HIDDEN; }
    ;
fragment ALL_WS     
    : ' ' | '\t' | '\r' | '\n' 
    ;
A:

You've got some overlap between CONTINUOUS_VALUE and IDENTIFIER: the characters in IDENTIFIER are a subset of those allowed in CONTINUOUS_VALUE, and since CONTINUOUS_VALUE may also contain '=', it can swallow the whole dude=sweet pair. There are probably a couple of ways to solve this. One way would be to start CONTINUOUS_VALUE with the '=' and then strip it out of the token text. In C# it would look like this:

CONTINUOUS_VALUE
    :   '=' ~( ALL_WS )+ { Text = Text.Substring(1, Text.Length - 1); }
    ;

Then just take the '=' out of the commandParam rule, so that it becomes commandParam : IDENTIFIER CONTINUOUS_VALUE ;
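The effect of this fix can be modeled with a small hand-written Java lexer (the class EqualsPrefixLexer is a hypothetical name, not generated ANTLR code). Because CONTINUOUS_VALUE now has to begin with '=', it can no longer compete with IDENTIFIER for dude, and the substring(pos + 1, end) call plays the role of the C# Text.Substring(1, Text.Length - 1) action.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the revised lexer:
//   IDENTIFIER       : ('-'|'_'|'a'..'z'|'A'..'Z'|'0'..'9')+ ;
//   CONTINUOUS_VALUE : '=' ~(ALL_WS)+ ;   (leading '=' stripped afterwards)
public class EqualsPrefixLexer {
    static boolean isWs(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    static boolean isIdentChar(char c) {
        return c == '-' || c == '_'
            || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            int end = pos;
            if (isWs(c)) {
                end++; // WS stays on the hidden channel
            } else if (c == '=') {
                // Only CONTINUOUS_VALUE can start with '='; it needs at least one more char.
                end++;
                while (end < input.length() && !isWs(input.charAt(end))) end++;
                if (end == pos + 1) throw new IllegalArgumentException("empty value at " + pos);
                // Drop the leading '=', like the C# Substring action.
                tokens.add("CONTINUOUS_VALUE:" + input.substring(pos + 1, end));
            } else if (isIdentChar(c)) {
                while (end < input.length() && isIdentChar(input.charAt(end))) end++;
                tokens.add("IDENTIFIER:" + input.substring(pos, end));
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at " + pos);
            }
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("dude=sweet a=b=c"));
    }
}
```

Running main prints [IDENTIFIER:dude, CONTINUOUS_VALUE:sweet, IDENTIFIER:a, CONTINUOUS_VALUE:b=c], so the tokens alternate exactly as commandParam : IDENTIFIER CONTINUOUS_VALUE ; expects. One side effect in this model: since WS is hidden, dude =sweet is also accepted, while dude= sweet is rejected.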

The second way would be to make IDENTIFIER and CONTINUOUS_VALUE parser rules (by lower-casing at least their first letter); then the parser has the context to figure out which one should match. Making them fragments instead would probably not help: a fragment rule never produces a token of its own, so it can only be referenced from other lexer rules, not from a parser rule like commandParam. Nesting fragments is not the problem, though; a fragment can reference another fragment such as ALL_WS.
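The idea behind the second approach is that position within a pair, rather than the shape of the characters, decides what is a name and what is a value. Outside ANTLR, that idea can be sketched in a few lines of Java. This swaps in a hand-written split instead of a grammar, and SplitOnFirstEquals and parse are hypothetical names: split the input on whitespace, then treat everything before the first '=' of each chunk as the name and everything after it as the value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical context-based parser: whitespace separates pairs, the first '='
// in each pair separates name from value, and the value may contain more '='.
public class SplitOnFirstEquals {
    // Same character set as the IDENTIFIER rule in the grammar.
    static boolean isIdentChar(char c) {
        return c == '-' || c == '_'
            || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    static Map<String, String> parse(String input) {
        // Insertion order is kept; a repeated name overwrites the earlier value (an assumption).
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : input.split("[ \t\r\n]+")) {
            if (pair.isEmpty()) continue; // leading whitespace yields one empty chunk
            int eq = pair.indexOf('=');
            if (eq <= 0 || eq == pair.length() - 1) {
                throw new IllegalArgumentException("expected name=value, got: " + pair);
            }
            String name = pair.substring(0, eq);
            for (char c : name.toCharArray()) {
                if (!isIdentChar(c)) throw new IllegalArgumentException("bad name: " + name);
            }
            params.put(name, pair.substring(eq + 1));
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(parse("dude=sweet a=b=c"));
    }
}
```

Running main prints {dude=sweet, a=b=c}; the value b=c keeps its inner '=' because only the first one is treated as the separator.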

Also, don't you need some sort of separator between the Name=Value pairs?

Ted Elliott
Whitespace is the separator; that's why it's not allowed in the body of values. Starting the value with the '=' character sounds like a good idea. I'll try that.
Cristi Diaconescu