I want to parse a list of whitespace-separated pairs of the form

name1=value1 name2=value2 ...

where:

  • NAME can contain anything except whitespace and equal sign
  • VALUE can contain anything except whitespace (including equal signs!)

The problem is getting the parser to match input like

name1=value1

as separate 'NAME EQUALS VALUE' tokens, not as a single 'VALUE' token.

PS. I know this is trivial to code directly, but I need this in the context of a larger parser.

A: 

I think you may run into trouble if VALUE can contain the equal sign. If possible, it would be better to make '=' a reserved character, or to switch to a different reserved character as the separator.

I'm not sure whether this would work in the context of your larger parser, but you could split the input on whitespace, giving you an array (or whatever data structure your language uses) of 'NAME=VALUE' pairs. Then loop through the array and split each element again on the reserved separator character. If you can't change or reserve '=', you could use a regex that matches only the first '=' in each element, so that any later '=' stays part of the value. Hope I'm not way off base!
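
A minimal sketch of the split-then-split-on-first-'=' idea in Python, for illustration only. The function name `parse_pairs` is hypothetical, and accepting an empty value (as in `a=`) is an assumption the question does not settle.

```python
def parse_pairs(text):
    """Split on whitespace, then split each chunk on its first '=' only."""
    pairs = []
    for chunk in text.split():
        # partition() splits at the first '=' and leaves later ones in value.
        name, sep, value = chunk.partition("=")
        if not sep or not name:
            raise ValueError(f"not a name=value pair: {chunk!r}")
        pairs.append((name, value))
    return pairs


print(parse_pairs("a=b=c=d c=d e=f"))
# prints [('a', 'b=c=d'), ('c', 'd'), ('e', 'f')]
```

Because only the first '=' in each chunk is treated as the separator, '=' does not need to be reserved inside values.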

dsrekab
A: 

You don't need a full parser for name/value pairs; a regex would be sufficient. Unless you have some contextual or nested structure, this job belongs in the lexer, not the parser :)
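
As a rough illustration of doing this at the lexer level, here is a sketch of a regex-based tokenizer in Python that emits separate NAME, EQUALS and VALUE tokens. The pattern, the `tokenize` name and the token labels are assumptions made for this example; they are not taken from any particular lexer generator.

```python
import re

# Hypothetical pattern: NAME has no whitespace or '=', VALUE has no whitespace.
# (?<!\S) anchors each match at the start of a whitespace-separated chunk.
PAIR = re.compile(r"(?<!\S)([^\s=]+)=(\S+)")


def tokenize(text):
    """Yield (kind, text) tokens: NAME, EQUALS, VALUE for each pair."""
    for match in PAIR.finditer(text):
        yield ("NAME", match.group(1))
        yield ("EQUALS", "=")
        yield ("VALUE", match.group(2))


print(list(tokenize("name1=value1 x=a=b")))
```

Note that this sketch silently skips chunks that do not look like a pair (for example `=foo`); a real lexer would probably want to report those as errors.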

leppie
Quoting myself: "I need this in the context of a larger parser." This is a small part that makes up the leaves of an expression tree, which is itself part of an even larger grammar definition.
Cristi Diaconescu
A: 

Here is something in ANTLR that parses this:

a=b=c=d c=d e=f

This may not be everything you need, but it should be the core.

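grammar NameValuePairs;

// Pairs are separated by whitespace tokens.
pairs   :  namevaluepair (WS namevaluepair)*;

namevaluepair
  :  name EQ value;

name  :  ID;

// A value may itself contain '=' signs, e.g. b=c=d.
value  :  ID (EQ ID)*;

// Not skipped: the pairs rule needs WS tokens as separators.
WS  :  ' '+;

EQ  :  '=';

// '+' rather than '*', so that ID never matches the empty string.
ID  :  ~(' ' | '=')+;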
Steve Cooper