antlr

Matching lexeme variants with Antlr3

I'm trying to match measurements in English input text, using Antlr 3.2 and Java 1.6. I've got lexical rules like the following: fragment MILLIMETRE : 'millimetre' | 'millimetres' | 'millimeter' | 'millimeters' | 'mm' ; MEASUREMENT : MILLIMETRE | CENTIMETRE | ... ; I'd like to be able to accept any combinat...

Can I use an Antlr created lexer/parser to parse PDDL file and return data to a Java program?

Hi, I am new to Antlr, but have used Flex/Bison before. I want to know if what I want to do using Antlr is possible. I want to parse a PDDL file using Antlr and build up my own representation of the PDDL file's contents in a Java class that I wrote as the PDDL file is parsed (in the actions for the rules?). After the file is finished ...

Building a Compiler for a DSL using ANTLR and the Antlr CSharp Target

My job is to write a compiler which compiles a project-specific DSL (with the features of a basic scripting language) into a project-specific assembler language. The platform is written in C#, so a native .NET compiler would be the perfect solution. I already did a lot of research and it seems ANTLR would fit in the job of build...

Using ANTLR to parse JavaDoc comments

I'm attempting to parse one particular (home grown) JavaDoc tag in my JavaScript file and I'm struggling to understand how I can achieve this. Antlr is complaining as documented below: jsDocComment : '/**' (importJsDocCommand | ~('*/'))* '*/' <== See note 1 ; importJsDocCommand : '@import' gav ; gav : gavGroup ':...

How to do Unicode escape decoding in Antlr tokenizer

I've created an ANTLR grammar using AntlrWorks, and have created a localization tool for internal use. I would like to convert Unicode escape sequences into the actual Java character while parsing, but am unsure of the best way to do this. Here are the token definitions in my grammar. Is there some way to specify an action for the fra...
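The decoding step itself, independent of where it is hooked into the grammar, might look like the following minimal sketch. It assumes the escapes have the Java form of a backslash, the letter u, and exactly four hex digits; the class and method names are hypothetical and not part of ANTLR.

```java
public class UnicodeEscapes {

    /** Returns true if the four characters starting at 'from' are all hex digits. */
    private static boolean isHexQuad(String text, int from) {
        for (int i = from; i < from + 4; i++) {
            if (Character.digit(text.charAt(i), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    /**
     * Replaces every backslash-u escape followed by four hex digits with the
     * character it denotes. Anything else, including malformed escapes, is
     * copied through unchanged.
     */
    public static String decode(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            boolean isEscape = c == '\\'
                    && i + 6 <= text.length()
                    && text.charAt(i + 1) == 'u'
                    && isHexQuad(text, i + 2);
            if (isEscape) {
                // Parse the four hex digits as one UTF-16 code unit.
                out.append((char) Integer.parseInt(text.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

A helper like this could be called on the token text from a lexer action or after parsing; which of the two is more convenient depends on the grammar.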

Is it possible to have a grammar where a "keyword" can also be treated as a "non-keyword"?

I have the following grammar in ANTLRWorks 1.4. I'm playing around with ideas for implementation of a parser in a text-adventure game creator, where the user will specify the various allowable commands for his game. grammar test; parse : cmd EOF; cmd : putSyn1 gameObject inSyn1 gameObject; putSyn1 : Put | Pla...

ANTLR error when not enough, or too many, newlines.

ANTLR gives me the following error when my input file has either no newline at the EOF, or more than one. line 0:-1 mismatched input '' expecting NEWLINE How would I go about taking into account the possibilities of having multiple or no newlines at the end of the input file? Preferably I'd like to account for this in the grammar. ...

ANTLR - emitting multiple tokens for a lexer rule

Hi, I wanted to know if ANTLR supports emitting multiple tokens for a lexer rule, given the target language is JavaScript. I have found that it supports multiple tokens in other target languages, such as Java and CSharp, but could not find any documentation on this feature being supported in JavaScript. If anyone could point me to any...

ANTLR match to end of input

I'm trying to match my grammar to an entire string, and have it error out if it cannot consume the entire input. Basically, this pseudo regex: \whitespace* [a-zA-Z]+ [-]? [a-zA-Z]+ \whitespace* $ According to this, EOF should work. So, consider this grammar: start : CHARS EOF ; CHARS : ('a'..'z')+ ; If I set input t...
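For reference, the accept/reject behaviour that the pseudo regex asks for can be pinned down with java.util.regex, where `matches()` succeeds only if the whole input is consumed. This is a sketch of the requirement, not of the ANTLR grammar; the class name is hypothetical.

```java
import java.util.regex.Pattern;

public class WholeInputMatch {

    // Optional whitespace, letters, an optional hyphen, letters, optional whitespace.
    private static final Pattern WORD =
            Pattern.compile("\\s*[a-zA-Z]+-?[a-zA-Z]+\\s*");

    /** Returns true only if the entire input fits the pattern. */
    public static boolean accepts(String input) {
        // matches() anchors at both ends, so trailing junk causes a rejection.
        return WORD.matcher(input).matches();
    }
}
```

With this definition, `accepts("  foo-bar  ")` is true, while `accepts("foo bar")` and `accepts("foo1")` are false because part of the input is left unconsumed.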

ANTLR (or alternative): decoupling parsing from evaluation

I have a relatively simple DSL that I would like to handle more robustly than a bunch of manually-coded java.util.regex.Pattern statements + parsing logic. The most-quoted tool seems to be ANTLR. I'm not familiar with it and am willing to give it a try. However I get a little leery when I look at the examples (e.g. the ANTLR expression ...

Antlr (lexer): matching the right token

In my Antlr3 grammar, I have several "overlapping" lexer rules, like this: NAT: ('0' .. '9')+ ; INT: ('+' | '-')? ('0' .. '9')+ ; BITVECTOR: ('0' | '1')* ; Although tokens like 100110 and 123 can be matched by more than one of those rules, it is always determined by context which of them it has to be. Example: s: a | b | c ; a: '<' N...

Token return values in ANTLR 3 C

I'm new to ANTLR, and I'm attempting to write a simple parser using the C language target (antlr3C). The grammar is simple enough that I'd like to have each rule return a value, e.g.: number returns [long value] : ( INT {$value = $INT.ivalue;} | HEX {$value = $HEX.hvalue;} ) ; HEX returns [long hvalue] : '0' 'x' ('0'..'9'|'a'.....

ANTLR3 lexer precedence

I want to create a token from '..' in the ANTLR3 lexer which will be used to string together expressions like a..b // [1] c .. x // [2] 1..2 // [3] 3 .. 4 // [4] So, I have added, DOTDOTSEP : '..' ; The problem is that I already have a rule: FLOAT : INT (('.' INT (('e'|'E') INT)? 'f'?) | (('e'|'E') INT)? ('...

ANTLR ambiguity in DeCaf - professor unsure where error is

I'm working on a school project that involves converting a BNF-form Decaf spec into a context-free grammar and building it in ANTLR. I've been working on it for a few weeks and have gone to the professor whenever I've become stuck, but I finally ran into something that he says should not be causing an error. Here's the isolated part of my gramma...

Define Keywords in ANTLR grammar

Hi, I want to build a simple lexical analyzer, using ANTLR, for a specific language which has reserved words (if, else, etc.). I went through several tutorials and was able to find ways of defining everything except reserved keywords. How do I define reserved keywords in an ANTLR grammar file? Thanks in advance, Shamika ...

ANTLR rule to consume fixed number of characters

I am trying to write an ANTLR grammar for the PHP serialize() format, and everything seems to work fine, except for strings. The problem is that the format of serialized strings is: s:6:"length"; In terms of regexes, a rule like s:(\d+):".{\1}"; would describe this format if only backreferences were allowed in the "number of matches"...
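Outside a grammar, the length-prefixed format is straightforward to read by hand: parse the number, then take exactly that many characters. The following is a minimal plain-Java sketch of that idea, not an ANTLR rule. The class name is hypothetical, and it assumes the length counts Java chars, whereas PHP counts bytes, so it is only exact for single-byte content.

```java
public class PhpSerializedString {

    /**
     * Parses a value of the form s:LENGTH:"BODY"; and returns BODY.
     * The body is taken by length, so it may itself contain quotes.
     */
    public static String parse(String input) {
        if (!input.startsWith("s:")) {
            throw new IllegalArgumentException("expected 's:' prefix");
        }
        int colon = input.indexOf(':', 2);
        if (colon < 0) {
            throw new IllegalArgumentException("missing ':' after length");
        }
        int length = Integer.parseInt(input.substring(2, colon));
        if (length < 0) {
            throw new IllegalArgumentException("negative length");
        }
        int start = colon + 2;      // skip the ':' and the opening quote
        int end = start + length;   // index of the closing quote
        if (input.length() < end + 2) {
            throw new IllegalArgumentException("input shorter than declared length");
        }
        if (input.charAt(colon + 1) != '"'
                || input.charAt(end) != '"'
                || input.charAt(end + 1) != ';') {
            throw new IllegalArgumentException("malformed string delimiters");
        }
        return input.substring(start, end);
    }
}
```

Because the body is consumed by count rather than by searching for a closing quote, an input such as `s:3:"a"b";` yields `a"b`.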

ANTLR AST building: root node as string instead of character

I might be asking a stupid/basic question, but I have been confused about ANTLR AST building. What I want to make is a kind of Boolean expression parser such that on parent nodes I have the operator, with its operands as children. For instance, a sentence ( ( A B C & D ) | ( E & ( F | G ) ) ) should ideally be representing |...

Java antlr3.2 problem with return values

Hi, I have a problem setting/getting the right return values in my ANTLR grammar. I tried: expressionExp returns [String ref] : Number {$ref = null; } | ident=Identifier {$ref = $ident.text;} | '(' innerExpressionExp ')' ; .. and thought that if I now have something like ref=expressionExp I get a null if I match a number and th...

ANTLR: problem differentiating unary and binary operators (e.g. minus sign)

Hi guys, I'm using ANTLR (3.2) to parse a rather simple grammar. Unfortunately, I came across a little problem. Take the following rule: exp : NUM | '(' expression OPERATOR expression ')' -> expression+ | '(' (MINUS | '!') expression ')' -> expression ; OPERATOR contains the same minus sign ('-') as is defined with MINUS. Now ...

ANTLR 3 parsing problem

Hello! I have written an ANTLR 3 grammar for parsing TaskJuggler III bookings files (see below). On the line project prj "Sample project" "1.0" 2010-10-24-00:00-+0200 - 2010-11-23-09:00-+0100 { I'm getting the following errors: line 1:42 mismatched character '-' expecting set '0'..'9' line 1:48 mismatched character ':' expecting...