lexer

Python: Invalid Token

Some of you may recognize this as Project Euler's problem number 11, the one with the grid. I'm trying to replicate the grid in a large multidimensional array, but it's giving me a syntax error and I'm not sure why: grid = [ [ 08, 02, 22, 97, 38, 15, 00, 40, 00, 75, 04, 05, 07, 78, 52, 12, 50, 77, 91, 08 ], [ 49, 49, 99, 40, 17, 81, 18...
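
The error comes from the leading zeros: a number literal starting with 0 is read as octal, and 8 is not a valid octal digit (newer Python versions reject such literals outright). A minimal sketch of two ways around it, using the first row from the question:

    # Option 1: write the values without leading zeros.
    grid = [
        [8, 2, 22, 97, 38, 15, 0, 40, 0, 75, 4, 5, 7, 78, 52, 12, 50, 77, 91, 8],
        # ... remaining rows of the grid ...
    ]

    # Option 2: keep the original formatting and parse it from text;
    # int() happily accepts leading zeros in strings.
    row_text = "08 02 22 97 38 15 00 40 00 75 04 05 07 78 52 12 50 77 91 08"
    row = [int(n) for n in row_text.split()]
    print(row[:5])   # [8, 2, 22, 97, 38]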

What would be a good Delphi lexer/parser for JavaScript language files?

Background: I want to be able to parse JavaScript source in a Delphi application. I need to be able to identify variables and functions within the source so that later code can make changes to it. I understand that I probably need to use a lexer for this purpose, but I have not had much luck using the lexer which I fo...

How would you parse indentation (Python style)?

How would you define your parser and lexer rules to parse a language that uses indentation to define scope? I have already Googled and found a clever approach that parses it by generating INDENT and DEDENT tokens in the lexer. I will go deeper into this problem and post an answer if I come up with something interesting, but I would like to...
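
A minimal Python sketch of that INDENT/DEDENT idea, assuming spaces-only indentation (the function name and token shapes are illustrative, not taken from any particular tool):

    def indent_tokens(lines):
        # Stack of indentation depths currently open; depth 0 is the base level.
        stack = [0]
        for line in lines:
            if not line.strip():          # blank lines carry no indentation info
                continue
            text = line.lstrip(' ')
            depth = len(line) - len(text)
            if depth > stack[-1]:         # deeper than before: open one block
                stack.append(depth)
                yield ('INDENT',)
            while depth < stack[-1]:      # shallower: close blocks until levels match
                stack.pop()
                yield ('DEDENT',)
            yield ('LINE', text.rstrip('\n'))
        while len(stack) > 1:             # close anything still open at end of input
            stack.pop()
            yield ('DEDENT',)

    source = ["if x:\n", "    y = 1\n", "    if y:\n", "        z = 2\n", "done = True\n"]
    for token in indent_tokens(source):
        print(token)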

Looking for a clear definition of what a "tokenizer", "parser" and "lexer" are, how they are related to each other, and how they are used?

Hello, I am looking for a clear definition of what a "tokenizer", "parser" and "lexer" are and how they are related to each other (e.g., does a parser use a tokenizer or vice versa)? I need to create a program that will go through .c/.h source files to extract data declarations and definitions. I have been looking for examples and can find so...
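
As a rough illustration of the usual relationship (a parser consumes the token stream that a tokenizer/lexer produces), here is a toy Python sketch; the token kinds and names are made up for the example:

    import re

    # Tokenizer (lexer): turns raw characters into (kind, value) tokens.
    def tokenize(source):
        for match in re.finditer(r'\d+|[A-Za-z_]\w*|\S', source):
            text = match.group()
            if text.isdigit():
                kind = 'NUM'
            elif text[0].isalpha() or text[0] == '_':
                kind = 'NAME'
            else:
                kind = 'SYM'
            yield (kind, text)

    # A parser would then consume this token stream and impose structure
    # on it (e.g. recognize "int count = 42;" as a declaration).
    tokens = list(tokenize('int count = 42 ;'))
    print(tokens)
    # [('NAME', 'int'), ('NAME', 'count'), ('SYM', '='), ('NUM', '42'), ('SYM', ';')]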

How to turn a token stream into a parse tree

Hello all. I have a lexer built that streams out tokens from an input, but I'm not sure how to build the next step in the process: the parse tree. Does anybody have any good resources or examples on how to accomplish this? ...
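
One common route is a hand-written recursive-descent parser: each grammar rule becomes a function that consumes tokens and returns a tree node. A minimal Python sketch for a made-up expression grammar (expr -> term (('+'|'-') term)*, term -> NUM (('*'|'/') NUM)*):

    def parse_expr(tokens, pos=0):
        node, pos = parse_term(tokens, pos)
        while pos < len(tokens) and tokens[pos] in ('+', '-'):
            op = tokens[pos]
            right, pos = parse_term(tokens, pos + 1)
            node = (op, node, right)          # build the tree bottom-up
        return node, pos

    def parse_term(tokens, pos):
        node = tokens[pos]                    # a number token
        pos += 1
        while pos < len(tokens) and tokens[pos] in ('*', '/'):
            op = tokens[pos]
            node = (op, node, tokens[pos + 1])
            pos += 2
        return node, pos

    tree, _ = parse_expr([1, '+', 2, '*', 3])
    print(tree)   # ('+', 1, ('*', 2, 3))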

Python regular expressions - how to capture multiple groups from a wildcard expression?

I have a Python regular expression that contains a group which can occur zero or many times, but when I retrieve the list of groups afterwards, only the last one is present. Example: re.search("(\w)*", "abcdefg").groups() returns the tuple ('g',). I need it to return ('a', 'b', 'c', 'd', 'e', 'f', 'g'). Is that possible? How can I do i...
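
A repeated group only keeps its last match, so the usual fix is to let re.findall (or re.finditer) do the repetition instead:

    import re

    # findall returns every match of the pattern, not just the last group.
    print(re.findall(r"\w", "abcdefg"))
    # ['a', 'b', 'c', 'd', 'e', 'f', 'g']

    # If each token is more than one character, repeat inside the pattern
    # that findall sees rather than wrapping a single group with *.
    print(re.findall(r"\w+", "ab cd ef"))
    # ['ab', 'cd', 'ef']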

Standard format for concrete and abstract syntax trees

I have an idea for a hobby project which performs some code analysis and manipulation. This project will require both the concrete and abstract syntax trees of a given source file. Additionally, bi-directional references between the two trees would be helpful. I would like to avoid the work of transcribing a grammar to construct my own l...

Have you ever effectively used a lexer/parser in a real-world application?

Recently I started learning ANTLR and learned that a lexer and parser together can be used to construct programming languages. Other than DSLs and programming languages, have you ever directly or indirectly used lexer/parser tools (and knowledge) to solve a real-world problem? Is it possible to solve the same problem by an average progr...

What is the best tool for creating a Java extension?

Our project group is working on a Java language extension and we have been trying to figure out what tool we should use for this purpose. The extension will primarily consist of a modification of the concurrency model used in Java. We have been looking at two tools so far: Polyglot and JavaCC. JavaCC seems to be a bit easier to use, b...

Poor man's "lexer" for C#

I'm trying to write a very simple parser in C#. I need a lexer -- something that lets me associate regular expressions with tokens, so it reads in regexes and gives me back symbols. It seems like I ought to be able to use Regex to do the actual heavy lifting, but I can't see an easy way to do it. For one thing, Regex only seems to work ...
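
The usual trick is to join one named group per token type into a single alternation and ask which group matched; .NET's Regex supports named groups, so the same shape carries over. Sketched here in Python for brevity (the token names and patterns are illustrative):

    import re

    TOKEN_SPEC = [
        ('NUMBER', r'\d+'),
        ('IDENT',  r'[A-Za-z_]\w*'),
        ('OP',     r'[+\-*/=]'),
        ('SKIP',   r'\s+'),
    ]
    # One named group per token type, joined into a single master pattern.
    MASTER = re.compile('|'.join('(?P<%s>%s)' % pair for pair in TOKEN_SPEC))

    def tokenize(text):
        pos = 0
        while pos < len(text):
            m = MASTER.match(text, pos)
            if not m:
                raise SyntaxError('unexpected character %r' % text[pos])
            if m.lastgroup != 'SKIP':          # lastgroup names the token type
                yield (m.lastgroup, m.group())
            pos = m.end()

    print(list(tokenize('x = 40 + 2')))
    # [('IDENT', 'x'), ('OP', '='), ('NUMBER', '40'), ('OP', '+'), ('NUMBER', '2')]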

Lexer/parser tools

Which lexer/parser generator is the best (easiest to use, fastest) for C or C++? I'm using flex and bison right now, but bison only handles LALR(1) grammars. The language I'm parsing doesn't really need unlimited lookahead, but unlimited lookahead would make parsing a lot easier. Should I try Antlr? Coco/R? Elkhound? Something else? ...

Easy way to parse .h file for comments using Python?

What is an easy way to parse a .h file written in C for comments and entity names using Python? We then need to write the content into a Word file that has already been developed. Source comments are formatted using simple tag-style rules; the comment tags make it easy to distinguish one entity's comment from another and from non-documentin...
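
A rough starting point (assuming C-style /* ... */ and // comments; the @brief tag is hypothetical) is a single regex that collects every comment together with its offset, so each comment can later be paired with the declaration that follows it:

    import re

    # Matches block comments and line comments; DOTALL lets /* ... */ span lines.
    # Note: this naive pattern would also match comment-like text inside string
    # literals, which is usually acceptable for documentation headers.
    COMMENT_RE = re.compile(r'/\*.*?\*/|//[^\n]*', re.DOTALL)

    def extract_comments(header_text):
        return [(m.start(), m.group()) for m in COMMENT_RE.finditer(header_text)]

    sample = """
    /* @brief Maximum number of channels */
    #define MAX_CHANNELS 8

    // Opens the device and returns a handle.
    int dev_open(const char *name);
    """
    for offset, comment in extract_comments(sample):
        print(offset, comment)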

State machine for syntax coloring

Hello. I'm currently learning how lexers and parsers work, and I have the following question about state machines. For example, I need to colorize text by the following rule: For this rule, a simple state transition table will look like this:

current  event  next   action
IDLE     $      COLOR  -
COLOR    any    -      OnColor()
COLOR    \n     IDLE   -

Th...
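
A small Python sketch of driving coloring from that table (the state and event names come from the table; on_color stands in for whatever actually marks a character, and '-' in the next column is read as "stay in the current state"):

    def on_color(ch):
        print('color:', ch)          # placeholder for the real coloring action

    # (state, event) -> (next_state, action); 'any' is the fallback event.
    TABLE = {
        ('IDLE',  '$'):   ('COLOR', None),
        ('COLOR', '\n'):  ('IDLE',  None),
        ('COLOR', 'any'): ('COLOR', on_color),
    }

    def run(text):
        state = 'IDLE'
        for ch in text:
            key = (state, ch) if (state, ch) in TABLE else (state, 'any')
            if key not in TABLE:      # no transition defined: ignore the character
                continue
            next_state, action = TABLE[key]
            if action:
                action(ch)
            state = next_state

    run("plain $colored\nplain again")   # colors only the characters after '$' up to the newline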

Seeking an interactive utility for creating context-free parser grammars

Hi, I would like a utility to which I can give a piece of text (in a text box) and then experiment with a parser grammar (by editing a BNF or similar) and token structure while seeing how the parse tree would look (and, if it's not able to parse the text using my current grammar, where it halted). The key word is interactivi...

Determining "Mood" of Textual Phrases through Lexical Analysis

I am looking to apply scores (positive, negative or neutral) to short phrases of text. Short of parsing out emoticons and making assumptions based on their usage, I'm unsure what else to try. Can anyone provide examples, research papers, articles, etc. that take a more lexical-analysis approach to this problem? I am thinking of things like adver...

Matching Lua's "Long bracket" string syntax

I'm writing a jFlex lexer for Lua, and I'm having problems designing a regular expression to match one particular part of the language specification: Literal strings can also be defined using a long format enclosed by long brackets. We define an opening long bracket of level n as an opening square bracket followed by n equal signs fo...
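
The sticking point is that the closing bracket must repeat the opening level, which a single DFA-based JFlex regex cannot express; a common workaround is a dedicated lexer state whose action counts the '=' signs. A backreference regex (shown in Python only to illustrate the shape of the constraint):

    import re

    # An opening long bracket of level n is '[' + n '=' signs + '[';
    # the backreference \1 forces the closing bracket to use the same n.
    LONG_STRING = re.compile(r'\[(=*)\[(.*?)\]\1\]', re.DOTALL)

    m = LONG_STRING.search('x = [==[ a string with ]] and ]=] inside ]==]')
    print(repr(m.group(2)))
    # ' a string with ]] and ]=] inside '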

How can I parse marked up text for further processing?

See updated input and output data at Edit-1. What I am trying to accomplish is turning

+ 1
+ 1.1
+ 1.1.1
- 1.1.1.1
- 1.1.1.2
+ 1.2
- 1.2.1
- 1.2.2
- 1.3
+ 2
- 3

into a Python data structure such as

[{'1': [{'1.1': {'1.1.1': ['1.1.1.1', '1.1.1.2']}, '1.2': ['1.2.1', '1.2.2']}, '1.3'], '2': {}}, ['3',]]

I've looked ...
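
A rough Python sketch of one way to nest the outline by the depth of each dotted label (it ignores the +/- markers and produces a plain nested dict, not the exact structure shown above):

    def parse_outline(text):
        root = {}
        stack = [root]                    # stack[d] is the dict collecting depth-d nodes
        for line in text.strip().splitlines():
            marker, label = line.split()  # e.g. '+', '1.1.1'; marker is unused here
            depth = label.count('.')      # 0 for '1', 1 for '1.1', ...
            del stack[depth + 1:]         # drop any levels deeper than this node
            children = {}
            stack[depth][label] = children
            stack.append(children)
        return root

    outline = "+ 1\n+ 1.1\n+ 1.1.1\n- 1.1.1.1\n- 1.1.1.2\n+ 1.2\n- 1.2.1\n- 1.2.2\n- 1.3\n+ 2\n- 3"
    print(parse_outline(outline))
    # {'1': {'1.1': {'1.1.1': {'1.1.1.1': {}, '1.1.1.2': {}}},
    #        '1.2': {'1.2.1': {}, '1.2.2': {}}, '1.3': {}}, '2': {}, '3': {}}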

Good parser generator (think lex/yacc or antlr) for .NET? Build time only?

Is there a good parser generator (think lex/yacc or antlr) for .NET? Any that have a license that would not scare lawyers? Lots of them are LGPL, but I am working on embedded components and some organizations are not comfortable with me taking an LGPL dependency. I've heard that Oslo may provide this functionality but I'm not sure if it's a bui...

Parser How To in .NET

I'd like to understand how to construct a parser in .NET to process source files. For example, maybe I could begin by learning how to parse SQL or HTML or CSS and then act on the results to be able to format them for readability or something similar. Where can I learn how to do this? Are there specific books I can refer to? Do I need to...

Is there a simple way I can tokenize a string without a full-blown lexer?

I'm looking to implement the Shunting-yard Algorithm, but I need some help figuring out the best way to split a string up into its tokens. If you notice, the first step of the algorithm is "read a token." This isn't exactly a trivial thing to do. Tokens can consist of numbers, operators and parens. If you are doing some...
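
For input limited to numbers, operators and parentheses, a single regular expression is usually enough. A minimal Python sketch (it silently skips invalid characters and does not treat unary minus specially):

    import re

    # One alternative per token kind: a number (optionally with a decimal
    # part) or a single operator/parenthesis character.
    TOKEN_RE = re.compile(r'\d+\.?\d*|[+\-*/^()]')

    def tokenize(expr):
        return TOKEN_RE.findall(expr)

    print(tokenize('3 + 4*(2 - 1.5)'))
    # ['3', '+', '4', '*', '(', '2', '-', '1.5', ')']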