I'm writing a lexer in Haskell. Here's the code:
lexer :: String -> [Token]
lexer s
  | s =~ whitespace :: Bool =
      let token = s =~ whitespace :: String in
      lex (drop (length token) s)
  | s =~ number :: Bool =
      let token = s =~ number :: String in
      Val (read token) : lex (drop (length token) s)
  | s =~ operator...
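For comparison, the same match-at-the-front-then-drop idea can be sketched in Python. This is only an illustration, not the asker's code: the token patterns below are assumptions, since the original `whitespace`, `number`, and `operator` definitions are not shown. One point it makes explicit is that each pattern is anchored at the current position, so a match further along the input cannot be mistaken for the next token.

```python
import re

# Hypothetical token patterns; the original whitespace/number/operator
# definitions are not shown in the question, so these are assumptions.
TOKEN_SPEC = [
    ("WS", re.compile(r"\s+")),
    ("NUM", re.compile(r"\d+")),
    ("OP", re.compile(r"[-+*/()]")),
]


def lexer(s):
    """Return a list of (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(s):
        for kind, pattern in TOKEN_SPEC:
            # match() only succeeds at offset pos, unlike search().
            m = pattern.match(s, pos)
            if m:
                if kind != "WS":
                    tokens.append((kind, m.group()))
                pos = m.end()
                break
        else:
            # No pattern matched at this position.
            raise ValueError(f"unexpected character {s[pos]!r} at offset {pos}")
    return tokens
```

Calling `lexer("12 + 3")` yields three tokens: a number, an operator, and a number.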
Hello all,
This is my use case: the input is a string representing an Oracle PL/SQL statement of arbitrary complexity. We may assume it's a single statement (not a script).
Now, several bits of this input string have to be rewritten.
E.g. table names need to be prefixed, aggregate functions in the selection list that don't use a column ali...
I'd like to add a keyword to my language.
This keyword would only have to be matched during one particular parser grammar rule.
For backward compatibility, I'd like to allow this keyword to continue to be used as a variable name, i.e. it can be matched by the lexer rule that determines whether a token is suitable for a variable name.
The ...
Hi,
I'm trying to implement a Python parser using PLY for the Kconfig language used to generate the configuration options for the Linux kernel.
There's a keyword called source which performs an inclusion, so what I do is that when the lexer encounters this keyword, I change the lexer state to create a new lexer which is going to lex th...
I have a project where a user needs to define a set of instructions for a UI that is completely written in JavaScript. I need to have the ability to parse a string of instructions and then translate them into instructions. Are there any libraries out there for parsing that are 100% JavaScript? Or a generator that will generate in javascri...
Hi, I was trying hard to make ANTLR 3.2 generate a parser/lexer in C++. It was fruitless. Things went well with Java & C though.
I was using this tutorial to get started: http://www.ibm.com/developerworks/aix/library/au-c%5Fplusplus%5Fantlr/index.html
When I checked the *.stg files, I found that:
CPP has only
./tool/src/main/resources/...
I am attempting to parse Lua, which depends on whitespace in some cases due to the fact that it doesn't use braces for scope. I figure that throwing out whitespace only if no other rule matches is the best way, but I have no clue how to do that. Can someone help me?
...
I'm working on a JavaScript collator/compositor implemented in Java. It works, but there has to be a better way to implement it and I think a Lexer may be the way forward, but I'm a little fuzzy.
I've developed a meta syntax for the compositor which is a subset of the JavaScript language. As far as a typical JavaScript interpreter is c...
There are lots of parsers and lexers for scripts (i.e. structured computer languages). But I'm looking for one which can break an (almost) non-structured text document into larger sections, e.g. chapters, paragraphs, etc.
It's relatively easy for a person to identify them: where the Table of Contents, acknowledgements, or where the main...
Hi everyone,
I am trying to build a parser with Bison/Yacc to be able to parse a stream of tokens produced by another module. The different token IDs are already listed in an enumeration type as follows:
// C++ header file
enum token_id {
    TokenType1 = 0x10000000,
    TokenType2 = 0x11000000,
    TokenType3 = 0x1110000...
I'm working on a group project for my University which is going to be used for plagiarism detection in Computer Science.
My group is primarily going off the hashing/fingerprinting techniques described in this journal article: Winnowing: Local Algorithms for Document Fingerprinting. This is very similar to how the MOSS plagiarism detect...
I'm working on an anti-plagiarism project for my CS class. This involves detecting plagiarism in computer science courses (programming assignments), through a technique described in "Winnowing: Local Algorithms for Document Fingerprinting."
Basically, I'm taking a group of programming assignments. Let's say one of the assignments looks lik...
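For readers unfamiliar with the technique both of these questions mention, here is a rough Python sketch of the core selection step of winnowing: hash every k-gram, slide a window of w consecutive hashes, and keep the minimum from each window (the rightmost one on ties). The function names, the use of CRC32 as the hash, and the default values of `k` and `w` are assumptions made for illustration. Preprocessing such as stripping whitespace or normalizing identifiers is left out.

```python
import zlib


def kgram_hashes(text, k):
    """Hash every k-character substring of text (CRC32 is an arbitrary choice)."""
    return [
        zlib.crc32(text[i:i + k].encode("utf-8"))
        for i in range(len(text) - k + 1)
    ]


def winnow(text, k=5, w=4):
    """Return a set of (hash, position) fingerprints.

    In each window of w consecutive k-gram hashes, keep the minimum hash,
    taking the rightmost occurrence on ties. Texts with fewer than w
    k-grams produce no fingerprints in this sketch.
    """
    hashes = kgram_hashes(text, k)
    fingerprints = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        # Rightmost index of the minimum within this window.
        offset = w - 1 - window[::-1].index(smallest)
        fingerprints.add((smallest, start + offset))
    return fingerprints
```

Two documents can then be compared by intersecting the hash values of their fingerprint sets. With these parameters, any shared substring of at least w + k - 1 = 8 characters contributes at least one common hash.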
Suppose you have this pseudo-code
do_something();
function do_something(){
    print "I am saying hello.";
}
Why do some programming languages require the call to do_something() to appear below the function declaration in order for the code to run?
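One concrete case, using Python purely as an example: in a language where a function definition is itself a statement executed at run time, the name does not exist until execution reaches the definition. The sketch below runs the pseudo-code's ordering in a fresh namespace; the helper name `call_before_definition` is made up for this illustration.

```python
def call_before_definition():
    """Run the call-then-define ordering in a fresh namespace and report the result."""
    source = (
        "do_something()\n"
        "def do_something():\n"
        "    print('I am saying hello.')\n"
    )
    try:
        # Statements run top to bottom, so the call executes before the def.
        exec(source, {})
    except NameError as err:
        return f"NameError: {err}"
    return "ok"
```

Here the call fails with a `NameError` because `do_something` has not been bound yet. Languages that collect all declarations in an earlier pass before running or compiling the body do not have this restriction.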
...
Can you use a token defined in the lexer in a hidden channel in a single rule of the parser as if it were a normal token?
The generated code is Java...
Thanks
...
And by string literals I mean those containing \123-like characters too.
I've written something but I don't know if it's perfect:
<STRING> {
\" { yybegin(YYINITIAL);
return new Token(TokenType.STRING,string.toString()); }
\\[0-3][0-7][0-7] { string.append( ...
Can I write a rule where the initial token is partly fixed and partly generic?
rule: ID '=' NUMBER
    ;
ID: ('A'..'Z' | 'a'..'z')+ ;
NUMBER: ('0'..'9')+ ;
But only if the token ID is in the form var* (var is fixed)
Thanks
...
I have a combined grammar (lexer and parser on the same file). How do I set the
filter = true
to the lexer?
Thanks
...
I'm trying to learn ANTLR and at the same time use it for a current project.
I've gotten to the point where I can run the lexer on a chunk of code and output it to a CommonTokenStream. This is working fine, and I've verified that the source text is being broken up into the appropriate tokens.
Now, I would like to be able to modify the...
When constructing a lexer/tokenizer, is it a mistake to rely on C functions such as isdigit/isalpha/...? They are locale-dependent as far as I know. Should I pick a character set, concentrate on it, and build my own character mapping from which I look up classifications? Then the problem becomes being able to lex multiple chara...
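The "make a character mapping myself" option can be sketched as a fixed lookup table. The example below is in Python rather than C, and the names `is_ascii_digit` and `is_ident_start` are hypothetical. It assumes the language being lexed only allows ASCII digits and ASCII letters plus underscore in identifiers.

```python
import string

# Fixed ASCII classification tables, independent of any locale setting.
ASCII_DIGITS = frozenset(string.digits)
ASCII_LETTERS = frozenset(string.ascii_letters)


def is_ascii_digit(ch):
    """True only for the characters '0' through '9'."""
    return ch in ASCII_DIGITS


def is_ident_start(ch):
    """True for ASCII letters and underscore (an assumed identifier rule)."""
    return ch in ASCII_LETTERS or ch == "_"
```

The difference from a library classifier shows up on non-ASCII input: Python's built-in `str.isdigit()` accepts the Arabic-Indic digit "٣", while `is_ascii_digit("٣")` does not, so the lexer's behaviour stays the same regardless of environment.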
I'm working on a tool that will perform some simple transformations on programs (like extract method). To do this, I will have to perform the first few steps of compilation (tokenizing, parsing and possibly building a symbol table). I'm going to start with C and then hopefully extend this out to support multiple languages.
My question i...