views:

90

answers:

1

Hi,

I am trying to do a sintax text corrector for my compilers' class. The idea is: I have some rules, which are inherent to the language (in my case, Portuguese), like "A valid phrase is SUBJECT VERB ADJECTIVE", as in "Ruby is great".

Ok, so first I have to tokenize the input "Ruby is great". So I have a text file "verbs", with a lot of verbs, one by line. Then I have one text "adjectives", one "pronouns", etc.

I am trying to use Ragel to create a parser, but I don't know how I could do something like:

%%{
  machine test;
  subject = <open-the-subjects-file-and-accept-each-one-of-them>;
  verb = <open-the-verbs-file-and-accept-each-one-of-them>;
  adjective = <open-the-adjective-file-and-accept-each-one-of-them>;
  main = subject verb adjective @ { print "Valid phrase!" } ;
}%%

I looked at ANTLR, Lex/Yacc, Ragel, etc. But couldn't find one that seemed to solve this problem. The only way to do this that I could think of was to preprocess Ragel's input file, so that my program reads the file and writes its contents at the right place. But I don't like this solution either.

Does anyone knows how I could do this? There's no problem if it isn't with Ragel, I just want to solve this problem. I would like to use Ruby or Python, but that's not really necessary either.

Thanks.

A: 

With bison I would write the lexer by hand, which lookup the words in the predefined dictionary.

Rudi