antlr

ANTLR problem with binding

I have strings like this: `(val1, val2, val3)`, and I have an ANTLR grammar to parse them: grammar TEST; tokens { ORB = '('; CRB = ')'; COMA = ','; } @members{ } /*Parser rule*/ mainRule : ORB WORD (COMA WORD)* CRB; /*Lexer rule*/ WORD : ('a'..'z'|'A'..'Z'|'0'..'9')+; WS : ( '\t' | ' ' | '\r' | '\n'| '\...

Using ANTLR to parse a log file

I'm just getting started with ANTLR and am trying to parse a pattern out of a log file. For example, the log file: 7114422 2009-07-16 15:43:07,078 [LOGTHREAD] INFO StatusLog - Task 0 input : uk.project.Evaluation.Input.Function1(selected=["red","yellow"]){} 7114437 2009-07-16 15:43:07,093 [LOGTHREAD] INFO StatusLog ...

How to force ANTLR to generate NoViableAltException?

I'm working with ANTLR 3.2. I have a simple grammar that consists of atoms (which are either the characters "0" or "1"), and a rule which accumulates a comma-separated list of them into a list. When I pass in "00" as input, I don't get an error, which surprises me because this should not be valid input: C:\Users\dan\workspace\antlrtest...

Using C++ types in an ANTLR-generated C parser

I'm trying to use an ANTLR v3.2-generated parser in a C++ project using C as the output language. The generated parser can, in theory, be compiled as C++, but I'm having trouble dealing with C++ types inside parser actions. Here's a C++ header file defining a few types I'd like to use in the parser: /* expr.h */ enum Kind { PLUS, MI...

ANTLR Parser Question

I'm trying to parse a number of text records where elements in a record are separated by a '+' char, and where the entire record is terminated by a '#' char. For example E1+E2+E3+E4+E5+E6# Individual elements can be required or optional. If an element is optional, its value is simply missing. For example, if E2 were missing, the input ...
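The record format described above can be sketched outside ANTLR to make the expected result concrete. This is a minimal Python sketch, not an ANTLR solution. It assumes that a missing optional element shows up as an empty field between two '+' characters (the excerpt is cut off before it shows this), and the function name `parse_records` is hypothetical.

```python
def parse_records(text):
    """Split '#'-terminated records into lists of '+'-separated elements.

    Assumption: a missing optional element appears as an empty field,
    which is reported here as None.
    """
    text = text.strip()
    if not text.endswith("#"):
        raise ValueError("input must end with the '#' record terminator")
    records = []
    # Drop the final '#' so that split() does not yield a trailing empty record.
    for raw in text[:-1].split("#"):
        records.append([field or None for field in raw.split("+")])
    return records
```

For instance, `parse_records("E1++E3#")` returns one record whose second element is `None`. Which elements are required would still need to be checked per position afterwards.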

Is there a working C++ grammar file for ANTLR?

Are there any existing C++ grammar files for ANTLR? I'm looking to lex, not parse, some C++ source code files. I've looked on the ANTLR grammar page and it looks like there is one listed created by Sun Microsystems here. However, it seems to be a generated parser. Can anyone point me to a C++ ANTLR lexer or grammar file? ...

How do I display all pronouns in a sentence and their persons using ANTLR?

EDITED according to WayneH's grammar Here's what I have in my grammar file. grammar pfinder; options { language = Java; } sentence : ((words | pronoun) SPACE)* ((words | pronoun) ('.' | '?')) ; words : WORDS {System.out.println($text);}; pronoun returns [String value] : sfirst {$value = $sfirst.value; System.o...

ANTLR 3.x - How to format rewrite rules

I'm finding it difficult to format rewrite rules properly when certain conditions occur in the original rule. What is the appropriate way to rewrite this: unaryExpression: op=('!' | '-') t=term -> ^(UNARY_EXPR $op $t) ANTLR doesn't seem to like me labeling anything in parentheses, and "op=" fails. Also, I've...

Perl Regex Grammar For ANTLR Use

I need to create a regex parser for a project and I am using ANTLR v3 to do this. I am trying to find an up-to-date, Perl6-like regex grammar. Does anyone have a source? Googling for this has been difficult for some reason. ...

ANTLR or Regex?

I'm writing a CMS in ASP.NET/C#, and I need to process markup like this on every page request: <html> <head> <title>[Title]</title> </head> <body> <form action="[Action]" method="get"> [TextBox Name="Email", Background=Red] [Button Type="Submit"] </form> </body> </html> and replace the [...] of course. My qu...

Making ANTLR-generated class files into one jar file

With ANTLR, I get some Java class files after compilation, and I need to package all of them into one jar file. I made a manifest.mf file with one line, "Main-class: Test", to indicate the main class. I ran 'jar cmf manifest.mf hello.jar *.class' to get hello.jar. But when I try to run 'java -jar hello.jar', I get the followi...

ANTLR: multiplication omitting '*' symbol

I'm trying to create a grammar for multiplying and dividing numbers in which the '*' symbol does not need to be included. I need it to output an AST. So for input like this: 1 2 / 3 4 I want the AST to be (* (/ (* 1 2) 3) 4) I've hit upon the following, which uses java code to create the appropriate nodes: grammar TestProd; options...
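To make the target tree concrete, here is a minimal hand-written sketch in Python rather than an ANTLR grammar. It treats two adjacent numbers as an implicit `*` and builds the tree left-associatively, which is an assumption drawn from the single example in the question. The names `tokenize`, `parse_product` and `to_sexpr` are hypothetical.

```python
import re


def tokenize(text):
    # Keep integers and the two explicit operators; anything else
    # (including whitespace) is silently skipped in this sketch.
    return re.findall(r"\d+|[*/]", text)


def parse_product(tokens):
    """Build a left-associative tree; adjacent operands imply '*'."""
    pos = 0

    def operand():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise ValueError("expected a number at token %d" % pos)
        tok = tokens[pos]
        pos += 1
        return tok

    tree = operand()
    while pos < len(tokens):
        if tokens[pos] in ("*", "/"):
            op = tokens[pos]
            pos += 1
        else:
            op = "*"  # no operator between two operands: implicit multiplication
        tree = (op, tree, operand())
    return tree


def to_sexpr(tree):
    # Render the nested tuples in the prefix form used in the question.
    if isinstance(tree, tuple):
        op, left, right = tree
        return "(%s %s %s)" % (op, to_sexpr(left), to_sexpr(right))
    return tree
```

With this sketch, `to_sexpr(parse_product(tokenize("1 2 / 3 4")))` yields `(* (/ (* 1 2) 3) 4)`, the tree asked for above. The same loop shape (one operand, then zero or more optionally-prefixed operands) is what an equivalent grammar rule would have to express.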

In ANTLR, how do you specify a specific number of repetitions?

I'm using ANTLR to specify a file format that contains lines that cannot exceed 254 characters (excluding line endings). How do I encode this in the grammar, short of doing: line : CHAR? CHAR? CHAR? CHAR? ... (254 times) ...
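One possible alternative to encoding the limit in the grammar is to check it as a separate validation pass. The following is a minimal Python sketch of such a check, not an ANTLR feature. The names `MAX_LINE_LENGTH` and `overlong_lines` are hypothetical, and the limit of 254 is taken from the question.

```python
MAX_LINE_LENGTH = 254  # limit from the question, excluding line endings


def overlong_lines(text, limit=MAX_LINE_LENGTH):
    """Return (line_number, length) for every line longer than limit.

    str.splitlines() drops the line endings, so they are not counted.
    """
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]
```

An empty result means every line fits. Running this before (or alongside) the parser keeps the grammar free of 254 repeated optional elements.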

ANTLR MismatchedTokenException on simple grammar

I'm completely new to ANTLR and EBNF grammars to begin with, so this is probably a basic issue I'm simply not understanding. I have a rule such as: version_line : WS? 'VERS' WS? '=' WS? '1.0' WS? EOL ; WS : ' '+ ; EOL : '\r' | '\n' | '\r\n' | '\n\r' ; that matches a statement in my input file that looks like this (with optional white...

ANTLR Rule Debugging Error

I am trying to test the "whenDescriptor" rule in the following grammar in ANTLRWorks. I keep getting the following exception as soon as I start debugging. Input text for testing is "when order : OrderBll then" [16:45:07] C:\Documents and Settings\RM\My Documents\My Tools\AntLRWorks\output\__Test__.java:14: cannot find symbol [16:45:07] symbol : me...

ANTLR ambiguity

Hi, I need to match in ANTLR a message containing 2 fields separated by a '/'. The first field can have 1..3 digits, and the second field can have 1..2 digits. This does not work: msg: f1 '/' f2; f1: DIGIT(DIGIT(DIGIT)?)? ; f2: DIGIT(DIGIT)? How can I avoid ambiguity in such a case? Is there a more elegant way to express the number of repetitions i...
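For comparison, the same message shape can be written with bounded repetition in a regular expression. This is a minimal Python sketch, not an ANTLR answer, and the names `MSG_PATTERN` and `parse_msg` are hypothetical.

```python
import re

# 1 to 3 ASCII digits, a slash, then 1 to 2 ASCII digits.
MSG_PATTERN = re.compile(r"([0-9]{1,3})/([0-9]{1,2})")


def parse_msg(text):
    """Return (f1, f2) as strings, or None if text is not a valid message."""
    match = MSG_PATTERN.fullmatch(text)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

`fullmatch` requires the whole string to fit the pattern, so an input with four digits before the slash or three after it is rejected instead of being partially matched.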

How to get ANTLR 3.2 to exit upon first error?

In section 10.4, The Definitive ANTLR Reference tells you to override mismatch() & recoverFromMismatchedSet() if you want to exit upon the first parsing error. But, at least in ANTLR 3.2, it appears that there is no mismatch() method, and the recoverFromMismatchedSet() documentation says that it is "Not Currently Used". So it appears thi...

When is it better to use a parser such as ANTLR vs. writing your own parsing code?

I need to parse a simple DSL which looks like this: funcA Type1 a (funcB Type1 b) ReturnType c As I have no experience with grammar parsing tools, I thought it would be quicker to write a basic parser myself (in Java). Would it be better, even for a simple DSL, for me to use something like ANTLR and construct a proper grammar definit...

Python: UnicodeEncodeError when reading from stdin

When running a Python program that reads from stdin, I get the following error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 320: ordinal not in range(128) How can I fix it? Note: The error occurs inside ANTLR, and the line looks like this: self.strdata = unicode(data) Since I don't want to modi...

ANTLR - accessing token values in C/C++

I am trying to parse integers and to access their value in ANTLR 3.2. I already found out how to do this in Java: //token definition INT : '0'..'9'+; //rule to access token value: start : val=INT {Integer x = Integer.valueOf( $val.text ).intValue(); } ; ... but I couldn't find a solution for this in C/C++. Does someone...