How to create a parser which tokenizes a list of words taken from a file? | ansaurus

Q:

How to create a parser which tokenizes a list of words taken from a file?

Hi,

I am trying to write a syntax checker for text as a project for my compilers class. The idea is: I have some rules that are inherent to the language (in my case, Portuguese), like "A valid phrase is SUBJECT VERB ADJECTIVE", as in "Ruby is great".

OK, so first I have to tokenize the input "Ruby is great". I have a text file "verbs" with a lot of verbs, one per line. Then I have another text file "adjectives", one "pronouns", etc.

I am trying to use Ragel to create a parser, but I don't know how I could do something like:

%%{
  machine test;
  subject = <open-the-subjects-file-and-accept-each-one-of-them>;
  verb = <open-the-verbs-file-and-accept-each-one-of-them>;
  adjective = <open-the-adjective-file-and-accept-each-one-of-them>;
  main = subject verb adjective @ { print "Valid phrase!" } ;
}%%

I looked at ANTLR, Lex/Yacc, Ragel, etc., but couldn't find one that seemed to solve this problem. The only way I could think of was to preprocess Ragel's input file, so that my program reads the word files and writes their contents at the right place. But I don't like this solution either.

Does anyone know how I could do this? It doesn't have to be Ragel; I just want to solve this problem. I would like to use Ruby or Python, but that's not strictly necessary either.

Thanks.

A:

With Bison, I would write the lexer by hand and have it look up each word in the predefined dictionaries.

Rudi 2010-07-06 14:56:56
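
The hand-written lexer idea from the answer could look roughly like the following in Python (one of the languages the question mentions). This is only a minimal sketch, not Bison code: the names load_words, build_lexicon, tokenize and is_valid_phrase are made up for illustration, the word sets are tiny in-memory stand-ins for the "verbs", "adjectives", etc. files, and the grammar is reduced to the single SUBJECT VERB ADJECTIVE rule from the question.

```python
from pathlib import Path


def load_words(path):
    """Read a word file (one word per line) into a lowercase set."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def build_lexicon(word_sets):
    """Map every known word to its token type, e.g. {"ruby": "SUBJECT"}."""
    lexicon = {}
    for token_type, words in word_sets.items():
        for word in words:
            lexicon[word] = token_type
    return lexicon


def tokenize(sentence, lexicon):
    """Hand-written lexer: split on whitespace and look each word up."""
    tokens = []
    for word in sentence.split():
        key = word.strip(".,;:!?").lower()
        tokens.append(lexicon.get(key, "UNKNOWN"))
    return tokens


# The single grammar rule from the question (hypothetical representation).
VALID_PHRASES = {("SUBJECT", "VERB", "ADJECTIVE")}


def is_valid_phrase(sentence, lexicon):
    """Check the token sequence against the known phrase patterns."""
    return tuple(tokenize(sentence, lexicon)) in VALID_PHRASES


# Stand-in data; in real use each set would come from load_words("verbs"), etc.
word_sets = {
    "SUBJECT": {"ruby", "python"},
    "VERB": {"is"},
    "ADJECTIVE": {"great"},
}
lexicon = build_lexicon(word_sets)

if is_valid_phrase("Ruby is great", lexicon):
    print("Valid phrase!")
```

Because the dictionaries are loaded at run time, nothing has to be pasted into a grammar file. One limitation of this sketch: a word that appears in more than one file keeps only the last token type assigned to it, so real text with ambiguous words would need a lexicon that maps each word to a set of types.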
