I need to parse partial SQL queries (it's for a SQL injection auditing tool). For example:
'1' AND 1=1--
should break down into tokens like:
[0] => [SQL_STRING, '1']
[1] => [SQL_AND]
[2] => [SQL_INT, 1]
[3] => [SQL_EQUALS]
[4] => [SQL_INT, 1]
[5] => [SQL_COMMENT]
[6] => [SQL_QUERY_END]
Are there at least any lexers for SQL that I base...
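A minimal sketch of the kind of token stream the question describes, written in Python with the standard `re` module. The token names follow the question's list; `SQL_EQUALS`, the regular expressions, and the function name `tokenize_sql` are assumptions made for illustration, not part of any existing SQL lexer.

```python
import re

# Token names mirror the question; SQL_EQUALS and all patterns are assumptions.
TOKEN_SPEC = [
    ("SQL_STRING", r"'(?:[^']|'')*'"),   # single-quoted string, '' as an escaped quote
    ("SQL_INT", r"\d+"),
    ("SQL_AND", r"(?i:AND)\b"),          # case-insensitive keyword
    ("SQL_EQUALS", r"="),
    ("SQL_COMMENT", r"--.*"),            # comment runs to the end of the line
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize_sql(fragment):
    """Split a partial SQL fragment into (kind, value) tuples."""
    tokens = []
    pos = 0
    while pos < len(fragment):
        match = MASTER.match(fragment, pos)
        if match is None:
            raise ValueError(f"unexpected character {fragment[pos]!r} at {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "WS":
            continue                      # whitespace separates tokens but is not emitted
        if kind == "SQL_INT":
            tokens.append((kind, int(text)))
        elif kind == "SQL_STRING":
            tokens.append((kind, text))
        else:
            tokens.append((kind,))
    tokens.append(("SQL_QUERY_END",))
    return tokens


print(tokenize_sql("'1' AND 1=1--"))
```

Because the fragment is lexed without any grammar, this works on partial input such as an injected tail; anything the patterns do not cover raises `ValueError`.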
First: I have looked at this SO question, but unfortunately there is no mention of JavaME.
I am looking for a parser/lexer generator that produces code that can run on the BlackBerry and its (obnoxious) JavaME.
E.g., at first I thought I could use ANTLR; however, it seems the run-time library is not compatible with JavaME.
TIA
...
I might be wrong, but it looks like there's no direct flex/bison (lex/yacc) port for C#/.NET so far.
For an LALR parser, I found GPPG/GPLEX, and for an LL parser, there is the famous ANTLR. But I want to reuse my flex/bison grammar as much as possible.
Is there any direct port of flex/bison for C#?
What lexer/parser do people normally ...
Hi there,
I'm currently looking for a lexer/parser generator that produces Scala code from a BNF grammar (an ocamlyacc file with precedence and associativity), and I'm quite surprised to find almost nothing:
For parsing, I found scala-bison (which I have a lot of trouble with). All the other tools are just Java parsers imported into Scala (l...
I would like to create a simple XML parser using bison/flex. I don't need validation, comments, or arguments, only <tag>value</tag>, where value can be a number, a string, or another <tag>value</tag>.
So for example:
<div>
  <mul>
    <num>20</num>
    <add>
      <num>1</num>
      <num>5</num>
    </add>
  </mul>
  <id>test</id>
</div>
If it h...
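To illustrate the nesting the question describes, here is a rough recursive-descent sketch in Python rather than bison/flex. The token pattern, the `parse_tags` name, and the `(tag, children)` result shape are all assumptions chosen for the example; it ignores attributes, comments, and validation, as the question asks.

```python
import re

# Hypothetical token pattern: an opening tag, a closing tag, or a run of text.
TOKEN = re.compile(r"<(/?)(\w+)>|([^<]+)")


def parse_tags(source):
    """Parse <tag>value</tag> markup into nested (tag, children) tuples."""
    tokens = []
    for closing, name, text in TOKEN.findall(source):
        if name:
            tokens.append(("close" if closing else "open", name))
        elif text.strip():
            tokens.append(("text", text.strip()))  # whitespace-only text is dropped
    pos = 0

    def element():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != "open":
            raise SyntaxError("expected an opening tag")
        name = tokens[pos][1]
        pos += 1
        children = []
        while pos < len(tokens) and tokens[pos][0] != "close":
            if tokens[pos][0] == "text":
                text = tokens[pos][1]
                children.append(int(text) if text.isdigit() else text)
                pos += 1
            else:
                children.append(element())  # nested <tag>...</tag>
        if pos >= len(tokens):
            raise SyntaxError(f"missing closing tag for <{name}>")
        if tokens[pos][1] != name:
            raise SyntaxError(f"mismatched closing tag </{tokens[pos][1]}>")
        pos += 1
        return (name, children)

    tree = element()
    if pos != len(tokens):
        raise SyntaxError("trailing input after the root element")
    return tree


print(parse_tags("<add><num>1</num><num>5</num></add>"))
```

The same split applies in bison/flex: the lexer hands over open-tag, close-tag, and text tokens, and a single recursive rule for an element handles the nesting.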
When I run the following grammar:
test : WORD+;
WORD : ('a'..'z')+;
WS : ' '+ {$channel = HIDDEN;};
and give it the input "?test", why does ANTLR accept this as valid input? I thought ('a'..'z') would only match lowercase letters?
...
I'm writing a lexer (with re2c) and a parser (with Lemon) for a slightly convoluted data format: CSV-like, but with specific string types at specific places (alphanumeric chars only, alphanumeric chars and minus signs, any char except quotes and comma but with balanced braces, etc.), strings inside braces and strings that look like funct...
Has any of you successfully added a lexer to Scintilla?
I have been following the short instructions at http://www.scintilla.org/SciTELexer.html, and even discovered the secret extra instructions at http://www.scintilla.org/ScintillaDoc.html#BuildingScintilla (Changing Set of Lexers).
Everything compiles, and I can add the lexer t...
I'm trying to do some very basic C++ function declaration parsing. Here is my rule for parsing an input parameter:
arg : 'const'? 'unsigned'? t=STRING m=TYPEMOD? n=STRING
-> ^(ARG $n $t $m?) ;
STRING : ('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'::')+ ;
TYPEMOD
: ('*' | '&')+ ;
The problem is I'm trying to pass it something like:
int *pa...
I've been working on a parser for a simple template language. I'm using Ragel.
The requirements are modest. I'm trying to find [[tags]] that can be embedded anywhere in the input string.
I'm trying to parse a simple template language, something that can have tags such as {{foo}} embedded within HTML. I tried several approaches to parse...
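As a point of comparison for the {{foo}} case, a small Python sketch that splits input into text and tag parts with one regular expression. The pattern (word characters only, optional inner whitespace) and the name `split_template` are assumptions; the question does not say what a tag may contain.

```python
import re

# Assumed tag syntax: {{name}} with optional spaces around a word-character name.
TAG = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def split_template(source):
    """Return a list of ("text", ...) and ("tag", ...) parts in input order."""
    parts = []
    last = 0
    for match in TAG.finditer(source):
        if match.start() > last:
            parts.append(("text", source[last:match.start()]))  # text before the tag
        parts.append(("tag", match.group(1)))
        last = match.end()
    if last < len(source):
        parts.append(("text", source[last:]))  # trailing text after the last tag
    return parts


print(split_template("<p>Hello {{ name }}!</p>"))
```

Everything outside a tag is passed through untouched, so the surrounding HTML never needs to be understood by the lexer.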
Hello,
I have some problems with a very simple yacc/lex program. I have maybe forgotten some basic steps (it's been a long time since I've used these tools).
In my lex program I give some basic values like:
word [a-zA-Z][a-zA-Z]*
%%
":" return(PV);
{word} {
yylval = yytext;
printf("yylval = %s\n",yylva...
I have the following:
rule : A B;
A : 'a_e' | 'a';
B : '_b';
Input:
a_b // doesn't work
a_e_b // works
Why is the lexer having trouble matching this? When ANTLR matches the 'a_' in 'a_b', shouldn't it backtrack or use lookahead or something to see that it can't match a token A, and then decide to match token A as 'a' and then proceed to matc...
I've seen a couple of Python JavaScript tokenizers and a cryptic document on Mozilla.org about a JavaScript lexer, but I can't find any JavaScript tokenizers for PHP specifically. Are there any?
Thanks
...
I'm reading about compiler and parser architecture now, and I wonder about one thing...
When you have XML, XHTML, HTML or any SGML-based language, what would be the role of a lexer here and what would be the tokens?
I've read that tokens are like words prepared for parsing by the lexer. Although I don't have a problem with finding tokens...
Problem: Can't get a Unicode character to print correctly.
Here is my grammar:
options { k=1; filter=true;
// Allow any char but \uFFFF (16 bit -1)
charVocabulary='\u0000'..'\uFFFE';
}
ANYCHAR :'$'
| '_' { System.out.println("Found underscore: "+getText()); }
| 'a'..'z' { System.out.println("Found alpha: "+getText()); }
| '\u...
Alright, so I am writing a function as part of a lexical analyzer that 'looks up' or searches for a match with a keyword. My lexer catches all the obvious tokens such as single- and multi-character operators (+ - * / > < = == etc.) (comments and whitespace have also already been removed), so I call a function after I've collected a stream of onl...
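One common way to do this kind of lookup is a table keyed by the collected lexeme. The sketch below is in Python, and the keyword set, token names, and the `classify` function are purely hypothetical, since the question does not list its keywords.

```python
# Hypothetical keyword table; the real language's keywords are not given in the question.
KEYWORDS = {
    "if": "KW_IF",
    "else": "KW_ELSE",
    "while": "KW_WHILE",
    "return": "KW_RETURN",
}


def classify(lexeme):
    """Return (token_kind, lexeme): a keyword kind if the lexeme is reserved, else IDENTIFIER."""
    return (KEYWORDS.get(lexeme, "IDENTIFIER"), lexeme)


print(classify("while"))
print(classify("whilst"))
```

The lexer collects the whole run of identifier characters first and only then consults the table, so a name like `whilst` is never mistaken for the keyword `while`.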
Hi,
I wanted to know if ANTLR supports emitting multiple tokens for a lexer rule, given the target language is JavaScript. I have found that it supports multiple tokens in other target languages, such as Java and CSharp, but could not find any documentation on this feature being supported in JavaScript.
If anyone could point me to any...
In my Antlr3 grammar, I have several "overlapping" lexer rules, like this:
NAT: ('0' .. '9')+ ;
INT: ('+' | '-')? ('0' .. '9')+ ;
BITVECTOR: ('0' | '1')* ;
Although tokens like 100110 and 123 can be matched by more than one of those rules, it is always determined by context which of them it has to be. Example:
s: a | b | c ;
a: '<' N...
I want to create a token from '..' in the ANTLR3 lexer which will be used to string together expressions like
a..b // [1]
c .. x // [2]
1..2 // [3]
3 .. 4 // [4]
So I have added:
DOTDOTSEP : '..'
;
The problem is that I already have a rule:
FLOAT : INT (('.' INT (('e'|'E') INT)? 'f'?) | (('e'|'E') INT)? ('...
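For reference, the token stream being aimed at can be sketched in Python, not ANTLR. The FLOAT pattern here is a simplified assumption (the rule above is truncated), and `ID`, `WS`, and `tokenize_range` are names invented for the example. The point it shows is that FLOAT should only win when a digit actually follows the dot.

```python
import re

# Simplified, assumed patterns; FLOAT requires digits on both sides of the dot.
TOKEN = re.compile(r"""
    (?P<FLOAT>\d+\.\d+(?:[eE]\d+)?f?)
  | (?P<INT>\d+)
  | (?P<DOTDOTSEP>\.\.)
  | (?P<ID>[A-Za-z_]\w*)
  | (?P<WS>\s+)
""", re.VERBOSE)


def tokenize_range(text):
    """Return (kind, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for sample in ("a..b", "c .. x", "1..2", "3 .. 4", "1.5"):
    print(sample, tokenize_range(sample))
```

The regex engine abandons the FLOAT alternative on `1..2` because no digit follows the first dot, then falls back to INT. A lexer that commits to FLOAT as soon as it sees `1.` needs an explicit lookahead check to get the same result.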
I want to know the term for the character that designates the start of a literal in a lexing process.
For example:
a string starts and ends with a " character.
a regular expression literal, with a / character.
...