ansaurus

Question

Answer 1


Well, you could actually write a preprocessor in lex and put it into your build system, but that's probably overkill!

You can use start conditions, switching between them with BEGIN: parse the input in one start condition, use unput() to push characters back into the stream, and then let a different start condition parse the result (useful link: http://flex.sourceforge.net/manual/Actions.html/).
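To illustrate the pushback idea outside flex, here is a minimal C sketch. It is not flex code: the `Input` struct, `next_char()`, `push_back()` (a rough stand-in for unput()) and `space_quotes()` are hypothetical names, and the RAW/COPY enum values play the role of two start conditions. RAW rewrites each quote by pushing back a space, the quote and a space, then switches to COPY so the pushed-back quote is not rewritten again:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a scanner's input: the unread source text
   plus a pushback stack, so push_back() behaves roughly like unput(). */
typedef struct {
    const char *src;   /* unread part of the original input */
    char stack[16];    /* pushed-back characters; stack[top - 1] is read next */
    int top;
} Input;

static int next_char(Input *in) {
    if (in->top > 0)
        return (unsigned char)in->stack[--in->top];   /* pushed-back text first */
    if (*in->src == '\0')
        return EOF;
    return (unsigned char)*in->src++;
}

static void push_back(Input *in, char c) {
    assert(in->top < (int)sizeof in->stack);
    in->stack[in->top++] = c;
}

enum mode { RAW, COPY };   /* stand-ins for two start conditions */

/* Copies src into out (capacity cap) with a space on each side of every '"'.
   RAW: on a quote, push back " \" " and switch to COPY (like BEGIN(COPY)).
   COPY: pass the three pushed-back characters through, then return to RAW. */
void space_quotes(const char *src, char *out, size_t cap) {
    Input in = { src, {0}, 0 };
    enum mode mode = RAW;
    int pending = 0;   /* characters COPY still has to pass through */
    size_t n = 0;
    int c;

    assert(cap > 0);
    while (n + 1 < cap && (c = next_char(&in)) != EOF) {
        if (mode == RAW && c == '"') {
            /* pushed in reverse order, so they are read back as ' ', '"', ' ' */
            push_back(&in, ' ');
            push_back(&in, '"');
            push_back(&in, ' ');
            mode = COPY;
            pending = 3;
            continue;
        }
        out[n++] = (char)c;
        if (mode == COPY && --pending == 0)
            mode = RAW;
    }
    out[n] = '\0';
}
```

For example, `space_quotes("a\"b", buf, sizeof buf)` leaves `a " b` in `buf`. Without the mode switch the pushed-back quote would be matched and rewritten again forever; in a real scanner the same concern applies whenever a rule pushes back text that it would match again.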

I recently wrote a parser for a Python-like config language that did just that. The parser had two modes (start conditions): one to count tabs at the start of a line to determine scope, and another to do the actual parsing.
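The two-mode idea can be sketched in plain C as a small state machine. This is not the parser described above; `indent_depths()` and the INDENT/BODY modes are hypothetical stand-ins for two start conditions, one counting leading tabs and one consuming the rest of the line:

```c
#include <assert.h>
#include <stddef.h>

enum mode { INDENT, BODY };   /* stand-ins for two start conditions */

/* Stores the number of leading tabs of each line of text in depths[]
   (at most max entries) and returns the number of depths stored. */
int indent_depths(const char *text, int *depths, int max) {
    enum mode mode = INDENT;
    int tabs = 0, lines = 0;

    assert(text != NULL && max >= 0 && (depths != NULL || max == 0));
    for (const char *p = text; *p != '\0'; p++) {
        if (mode == INDENT) {
            if (*p == '\t') {
                tabs++;               /* still inside the indentation */
                continue;
            }
            if (lines < max)
                depths[lines] = tabs; /* first non-tab: depth is known */
            lines++;
            mode = BODY;              /* like BEGIN(BODY) */
        }
        /* BODY: skip the rest of the line; a newline returns to INDENT */
        if (*p == '\n') {
            tabs = 0;
            mode = INDENT;
        }
    }
    return lines < max ? lines : max;
}
```

For the input `"a\n\tb\n\t\tc\nd"` it records the depths 0, 1, 2, 0. A scanner for a Python-like language would typically compare each depth with the previous one to decide where a scope opens or closes.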

These methods work, but there is usually a simpler way, especially if your input format isn't very complex.

Is there a grammatical difference between [something " something] and [something"something] for your program? Would a whitespace-eating rule do the trick?

Could you describe your language and grammar a little more?

After Comment:

OK, so basically you have two tokens, SOMETHING and QUOTE. If your tokens are separated by whitespace, you can do the following:

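%option noyywrap

%{
/* QUOTE and SOMETHING are assumed to be token codes defined elsewhere,
   e.g. in the header generated from your yacc/bison grammar. */
%}

%%
\"     {
       /* this matches a single double-quote character */
       return QUOTE;
       }

[^" \t\n\r]+   {
               /* this matches a run of anything that's not a quote, space, tab or line ending */
               return SOMETHING;
               }

[ \t\n\r]      {
               /* do nothing, i.e. ignore whitespace */
               }

%%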
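To see what those three rules do, here is a hand-written C approximation of the scanner (not flex output). `next_token()` and the `END` token are hypothetical; QUOTE and SOMETHING mirror the token names above. Because `"` is excluded from the SOMETHING character class, a word ends at a quote even when there is no whitespace in between:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Token codes; in a real setup these would come from the parser. */
enum token { END = 0, QUOTE, SOMETHING };

/* Returns the token that starts at s[*pos] and advances *pos past it. */
enum token next_token(const char *s, size_t *pos) {
    assert(s != NULL && pos != NULL);

    /* [ \t\n\r] : skip whitespace (check '\0' first: strchr also finds the terminator) */
    while (s[*pos] != '\0' && strchr(" \t\n\r", s[*pos]) != NULL)
        (*pos)++;
    if (s[*pos] == '\0')
        return END;

    /* \" : a lone double-quote character */
    if (s[*pos] == '"') {
        (*pos)++;
        return QUOTE;
    }

    /* [^" \t\n\r]+ : the longest run of anything that is not a quote or whitespace */
    while (s[*pos] != '\0' && strchr("\" \t\n\r", s[*pos]) == NULL)
        (*pos)++;
    return SOMETHING;
}
```

With this sketch, `something"something` and `something " something` both come out as SOMETHING, QUOTE, SOMETHING, which is what the flex rules above should produce as well.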

For your SOMETHING token you could instead match something like [A-Za-z_][A-Za-z0-9_]*, which matches a letter or underscore followed by zero or more letters, digits and underscores.
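Here is a small C sketch of what [A-Za-z_][A-Za-z0-9_]* accepts. `ident_length()` and its two helpers are hypothetical names, and the explicit character ranges assume an ASCII-compatible character set:

```c
#include <assert.h>
#include <stddef.h>

/* Character classes from the pattern; the explicit ranges assume ASCII. */
static int is_ident_start(unsigned char c) {        /* [A-Za-z_] */
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
}

static int is_ident_char(unsigned char c) {         /* [A-Za-z0-9_] */
    return is_ident_start(c) || (c >= '0' && c <= '9');
}

/* Length of the longest prefix of s matching [A-Za-z_][A-Za-z0-9_]*,
   or 0 if s does not start with a letter or underscore. */
size_t ident_length(const char *s) {
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *)s;
    if (!is_ident_start(p[0]))
        return 0;
    size_t n = 1;
    while (is_ident_char(p[n]))
        n++;
    return n;
}
```

For example, `ident_length("foo_bar9 x")` returns 8, while `ident_length("9abc")` returns 0 because an identifier cannot start with a digit.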

Does that help?

DaedalusFall 2009-03-24 18:03:28

Basically I need to add whitespace so that the words and the quotes are recognized as separate tokens, and not one giant token.

samoz 2009-03-24 18:13:13
