pyparsing

Is there a library similar to pyparsing in Java?

I need to quickly build a parser for a very simplified version of an HTML-like markup language in Java. In Python, I would use the pyparsing library to do this. Is there something similar for Java? Please don't suggest existing HTML-parsing libraries; my application is a school assignment which will demonstrate walking a tree of...

Need help on making the recursive parser using pyparsing

I am trying out pyparsing in Python. I got stuck while making a recursive parser. Let me explain the problem: I want to compute the Cartesian product of the elements. The syntax is cross({elements},{element}). More specifically: cross({a},{c1}) or cross({a,b},{c1}) or cross({a,b,c,d},{c1}) or So the general f...
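A minimal pyparsing sketch of a recursive cross grammar, assuming identifiers are alphanumeric and that a set element may itself be a nested cross(...) expression (the truncated question does not say which nesting forms are allowed). The parse action replaces the two parsed sets with their Cartesian product via itertools.product; names such as do_cross and element_set are illustrative.

```python
from itertools import product

from pyparsing import Forward, Group, Keyword, Suppress, Word, ZeroOrMore, alphanums

LBRACE, RBRACE, LPAR, RPAR, COMMA = map(Suppress, "{}(),")

ident = Word(alphanums)
cross = Forward()

# A set element is either a nested cross(...) expression or a plain identifier.
element = cross | ident
element_set = Group(LBRACE + element + ZeroOrMore(COMMA + element) + RBRACE)
cross <<= Suppress(Keyword("cross")) + LPAR + element_set + COMMA + element_set + RPAR


def do_cross(tokens):
    # tokens holds the two grouped sets; replace them with their Cartesian product.
    left, right = tokens
    return list(product(left, right))


cross.setParseAction(do_cross)

print(cross.parseString("cross({a,b},{c1})", parseAll=True).asList())
# → [('a', 'c1'), ('b', 'c1')]
```

Because the parse action sits on the Forward itself, a nested cross inside a set is evaluated first and its pairs become ordinary elements of that set.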

How do I get PyParsing set up on the Google App Engine?

I saw in the Google App Engine documentation that Antlr3 (http://www.antlr.org/) is used as the third-party parsing library. But from what I know, pyparsing seems to be easier to use, and I am only aiming to parse some simple syntax. Is there an alternative? Can I get pyparsing working on the App Engine? ...

Simple recursive descent in PyParsing

I've tried taking this code and converting it to something for a project I'm working on for programming language processing, but I'm running into an issue with a simplified version: op = oneOf( '+ - / *') lparen, rparen = Literal('('), Literal(')') expr = Forward() expr << ( Word(nums) | ( expr + op + expr ) | ( lparen + expr + rparen)...
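The grammar in this question is left-recursive: the alternative expr + op + expr asks expr to match itself at the same position before consuming any input, which a recursive-descent parser like pyparsing cannot resolve (and because Word(nums) is tried first, an input like 1+2 simply stops after the first number). A minimal sketch of one standard fix, assuming integer operands and the four operators from the question, uses pyparsing's infixNotation helper, which builds the precedence levels and parenthesized sub-expressions without left recursion. pyparsing 3 also offers ParserElement.enable_left_recursion() as an opt-in alternative.

```python
from pyparsing import Word, infixNotation, nums, oneOf, opAssoc

operand = Word(nums)

# Each tuple is one precedence level, highest first: (operator expr, arity, associativity).
# infixNotation also accepts parenthesized sub-expressions and suppresses the parentheses.
expr = infixNotation(
    operand,
    [
        (oneOf("* /"), 2, opAssoc.LEFT),
        (oneOf("+ -"), 2, opAssoc.LEFT),
    ],
)

print(expr.parseString("(1 + 2) * 3", parseAll=True).asList())
# → [[['1', '+', '2'], '*', '3']]
```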

PyParsing simple language expressions

I'm trying to write something that will parse some code. I'm able to successfully parse foo(spam) and spam+eggs, but foo(spam+eggs) (recursive descent? my terminology from compilers is a bit rusty) fails. I have the following code: from pyparsing_py3 import * myVal = Word(alphas+nums+'_') myFunction = myVal + '(' + delimitedList( ...
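A minimal pyparsing sketch of the recursion this question needs, assuming operands are identifiers built from alphas+nums+'_' as in the question and that calls may take comma-separated arguments. The key point is that a call's arguments are full expressions, so expr is declared as a Forward and used inside call before it is defined; names like arg_list and call are illustrative.

```python
from pyparsing import Forward, Group, Optional, Suppress, Word, ZeroOrMore, alphanums, oneOf

LPAR, RPAR, COMMA = map(Suppress, "(),")
name = Word(alphanums + "_")
expr = Forward()

# Each argument is a full expression, grouped so separate arguments stay separate.
arg_list = Group(Optional(Group(expr) + ZeroOrMore(COMMA + Group(expr))))
call = Group(name + LPAR + arg_list + RPAR)

# Try call first: with name first, MatchFirst would accept the bare identifier
# "foo" and leave "(...)" unparsed.
operand = call | name
expr <<= operand + ZeroOrMore(oneOf("+ - * /") + operand)

print(expr.parseString("foo(spam+eggs)", parseAll=True).asList())
# → [['foo', [['spam', '+', 'eggs']]]]
```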

Parsing SQL with Python

I want to create a SQL interface on top of a non-relational data store. The data store is non-relational, but it makes sense to access the data in a relational manner. I am looking into using ANTLR to produce an AST that represents the SQL as a relational algebra expression, then returning data by evaluating/walking the tree. I have never implem...

What's the closest thing to pyparsing that exists for .NET?

What I'm especially interested in is the ability to define the grammar in the code as ordinary code without any unnecessary cruft. I'm aware I could use IronPython. I don't want to. UPDATE: To further explain what I'm looking for, I'm including some sample pyparsing code. This is an incomplete parser to convert Emacs shortcut keys to ...

How do I parse indents and dedents with pyparsing?

Here is a subset of the Python grammar: single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE stmt: simple_stmt | compound_stmt simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE small_stmt: pass_stmt pass_stmt: 'pass' compound_stmt: if_stmt if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite] suite: s...
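pyparsing ships an indentedBlock helper for this, but another common approach, sketched here as a swapped-in technique, is to make indentation explicit before parsing. The full Python grammar this subset comes from defines blocks in terms of INDENT and DEDENT tokens produced by the tokenizer, and the standard-library tokenize module emits exactly those tokens for Python-like source. The function name indent_events is illustrative.

```python
import io
import token
import tokenize


def indent_events(source):
    """Return the names of the INDENT/DEDENT tokens Python's tokenizer emits for source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        token.tok_name[tok.type]
        for tok in tokens
        if tok.type in (token.INDENT, token.DEDENT)
    ]


src = "if x:\n    pass\nelse:\n    pass\n"
print(indent_events(src))  # → ['INDENT', 'DEDENT', 'INDENT', 'DEDENT']
```

A grammar can then treat INDENT and DEDENT as ordinary terminals, exactly as the quoted grammar treats NEWLINE.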

Using pyparsing to parse a word escape-split over multiple lines

I'm trying to parse words which can be broken up over multiple lines with a backslash-newline combination ("\\n") using pyparsing. Here's what I have done: from pyparsing import * continued_ending = Literal('\\') + lineEnd word = Word(alphas) split_word = word + Suppress(continued_ending) multi_line_word = Forward() multi_line_word << ...
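A simpler alternative to splitting words inside the grammar, shown here as a swapped-in technique rather than the question's pyparsing approach, is to remove every backslash-newline pair before tokenizing, so a continued word becomes an ordinary word. This assumes the backslash is always immediately followed by the newline (no trailing spaces) and, like the question's Word(alphas), that words are letters only; join_continuations and words are illustrative names.

```python
import re


def join_continuations(text):
    # Delete each backslash immediately followed by a newline, rejoining split words.
    return text.replace("\\\n", "")


def words(text):
    # Letters-only words, mirroring Word(alphas) from the question.
    return re.findall(r"[A-Za-z]+", join_continuations(text))


print(words("mult\\\ni\\\nline word"))  # → ['multiline', 'word']
```

The trade-off is that any reported locations refer to the joined text, not to the original lines.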

Find following tag with pyparsing

I'm using pyparsing to parse HTML. I'm grabbing all embed tags, but in some cases there's an a tag directly following that I also want to grab if it's available. example: import pyparsing target = pyparsing.makeHTMLTags("embed")[0] target.setParseAction(pyparsing.withAttribute(src=pyparsing.withAttribute.ANY_VALUE)) target.ignore(pypar...

Python: Invalid Syntax with test data using Pyparser

Using pyparsing, I am trying to create a very simple parser for the S-Expression language. I have written a very small grammar. Here is my code: from pyparsing import * alphaword = Word(alphas) integer = Word(nums) sexp = Forward() LPAREN = Suppress("(") RPAREN = Suppress(")") sexp << ( alphaword | integer | ( LPAREN + ZeroOr...
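A minimal working version of the S-expression grammar the question starts to build, assuming the truncated alternative is a parenthesized ZeroOrMore(sexp). Wrapping that alternative in Group keeps each list's nesting in the results, and the parse action on integer (an addition not in the question) converts digits to int. Using <<= instead of << also avoids a precedence pitfall: sexp << a | b parses as (sexp << a) | b.

```python
from pyparsing import Forward, Group, Suppress, Word, ZeroOrMore, alphas, nums

alphaword = Word(alphas)
integer = Word(nums).setParseAction(lambda t: int(t[0]))
LPAREN = Suppress("(")
RPAREN = Suppress(")")

sexp = Forward()
# An S-expression is an atom or a parenthesized (possibly empty) list of S-expressions.
sexp <<= alphaword | integer | Group(LPAREN + ZeroOrMore(sexp) + RPAREN)

print(sexp.parseString("(add 1 (mul 2 3))", parseAll=True).asList())
# → [['add', 1, ['mul', 2, 3]]]
```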

How to write the grammar for this in pyparsing: match a set of words but not containing a given pattern

I am new to Python and pyparsing. I need to accomplish the following. My sample line of text is like this: 12 items - Ironing Service 11 Mar 2009 to 10 Apr 2009 Washing service (3 Shirt) 23 Mar 2009 I need to extract the item description, period tok_date_in_ddmmmyyyy = Combine(Word(nums,min=1,max=2)+ " " + Word(alphas, exact=3)...

Debugging Pyparsing Grammar

I'm building a parser for an imaginary programming language called C-- (not the actual C-- language). I've gotten to the stage where I need to translate the language's grammar into something Pyparsing can accept. Unfortunately, when I come to parse my input string (which is correct and should not cause Pyparsing to error) it's not parsing ...

scanString end location: why it is end_index+1?

When I use the scanString method, it gives the start and end locations of the matched tokens in the text, e.g. line = "cat bat" pat = Word(alphas) for i in pat.scanString(line): print i I get the following: ((['cat'], {}), 0, 3) ((['bat'], {}), 4, 7) But shouldn't the end location of cat be "2"? Why it is repor...
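The end value is not off by one: like Python slicing, scanString reports a half-open span, where end is the index just past the last matched character, so line[start:end] is exactly the match and end - start is its length. The standard library's re module follows the same convention, which this small sketch uses to show it without pyparsing.

```python
import re

line = "cat bat"
matches = [(m.group(), m.start(), m.end()) for m in re.finditer(r"[A-Za-z]+", line)]
print(matches)  # → [('cat', 0, 3), ('bat', 4, 7)]

for word, start, end in matches:
    # Half-open [start, end): the slice is the match and end - start is its length.
    assert line[start:end] == word
    assert end - start == len(word)
```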

How should I organise my functions with pyparsing?

I am parsing a file with Python and pyparsing (it's the report file for PSAT in Matlab, but that isn't important). Here is what I have so far. I think it's a mess and would like some advice on how to improve it. Specifically, how should I organise my grammar definitions with pyparsing? Should I have all my grammar definitions in one fun...

Keyword Matching in Pyparsing: non-greedy slurping of tokens

Pythonistas: Suppose you want to parse the following string using Pyparsing: 'ABC_123_SPEED_X 123' where ABC_123 is an identifier, SPEED_X is a parameter, and 123 is a value. I thought of the following BNF using Pyparsing: Identifier = Word( alphanums + '_' ) Parameter = Keyword('SPEED_X') or Keyword('SPEED_Y') or Keyword('SPEED_Z') ...
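Two things are going on here. First, Python's or operator does not build a pyparsing alternative: Keyword('SPEED_X') or Keyword('SPEED_Y') simply evaluates to the first Keyword, so alternatives need | (or oneOf). Second, Word(alphanums + '_') is greedy and swallows _SPEED_X as part of the identifier. A minimal sketch of one way around the greediness, swapped in from the standard re module rather than pyparsing, is a lazy identifier followed by the parameter; it assumes parameters are exactly SPEED_X, SPEED_Y or SPEED_Z and values are unsigned integers. pyparsing's Regex class accepts the same pattern if the rest of the grammar stays in pyparsing.

```python
import re

# Lazy identifier (\w+?) so it stops at the first "_SPEED_<axis>" that lets the rest match.
LINE = re.compile(r"(?P<identifier>\w+?)_(?P<parameter>SPEED_[XYZ])\s+(?P<value>\d+)")

m = LINE.fullmatch("ABC_123_SPEED_X 123")
print(m.group("identifier"), m.group("parameter"), m.group("value"))
# → ABC_123 SPEED_X 123
```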

Partial evaluation with pyparsing

I need to be able to take a formula that uses the OpenDocument formula syntax, parse it into syntax that Python can understand, but without evaluating the variables, and then be able to evaluate the formula many times with changing values for the variables. Formulas can be user input, so pyparsing allows me to both effectively handle ...
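The parse-once, evaluate-many idea can be sketched by compiling the formula into a tree of closures that take the variable bindings as an argument. This sketch swaps in Python's ast module for pyparsing and assumes the OpenDocument formula has already been translated to Python infix syntax (for example, ^ rewritten as **); only numbers, variable names, unary minus and + - * / ** are accepted, and compile_formula is an illustrative name.

```python
import ast
import operator

_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def compile_formula(src):
    """Parse src once and return a function env -> value (env maps names to numbers)."""

    def build(node):
        if isinstance(node, ast.Expression):
            return build(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            op, left, right = _BINOPS[type(node.op)], build(node.left), build(node.right)
            return lambda env: op(left(env), right(env))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            inner = build(node.operand)
            return lambda env: -inner(env)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            value = node.value
            return lambda env: value
        if isinstance(node, ast.Name):
            name = node.id  # looked up only at evaluation time
            return lambda env: env[name]
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return build(ast.parse(src, mode="eval"))


f = compile_formula("a * 2 + b ** 2")
print(f({"a": 1, "b": 3}), f({"a": 10, "b": 0}))  # → 11 20
```

Rejecting every node type not in the whitelist is what keeps user-supplied formulas from running arbitrary code, unlike a plain eval.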

How to use pyparsing to parse and hash strings enclosed by special characters?

The majority of pyparsing examples that I have seen have dealt with linear expressions such as a = 1 + 2. I'd like to parse MediaWiki headlines and hash them to their sections, e.g. Introduction goes here ==Hello== foo foo ===World=== bar bar Dict would look like: {'Introduction':'Whoot introduction goes here', 'Hello':"foo\nfoo", 'World...
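A minimal sketch of the headline-to-section mapping, using the standard re module instead of pyparsing (a swapped-in technique): find each ==Title== line, then slice out the text between one heading and the next. It assumes headings sit alone on their own line, that text before the first heading is keyed 'Introduction' as in the question's example, and that all heading levels map into one flat dict; sections is an illustrative name.

```python
import re

# A heading is 2-6 '=' signs, a title, and the same number of '=' signs, alone on a line.
HEADING = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def sections(text, intro_key="Introduction"):
    result = {}
    key, pos = intro_key, 0
    for m in HEADING.finditer(text):
        # Text since the previous heading belongs to the previous key.
        result[key] = text[pos:m.start()].strip("\n")
        key, pos = m.group(2), m.end()
    result[key] = text[pos:].strip("\n")
    return result


page = "Introduction goes here\n==Hello==\nfoo\nfoo\n===World===\nbar\nbar\n"
print(sections(page))
# → {'Introduction': 'Introduction goes here', 'Hello': 'foo\nfoo', 'World': 'bar\nbar'}
```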

Pyparsing - where order of tokens is unpredictable

I want to be able to pull out the type and count of letters from a piece of text where the letters could be in any order. There is some other parsing going on which I have working, but this bit has me stumped! input -> result "abc" -> [['a',1], ['b',1],['c',1]] "bbbc" -> [['b',3],['c',1]] "cccaa" -> [['a',2],['c',3]] I could use se...
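If the letters only need counting, their order does not have to be handled by the grammar at all. A minimal sketch using collections.Counter from the standard library (a swapped-in technique, not pyparsing), assuming only alphabetic characters should be counted and sorting by letter to reproduce the output format shown in the question:

```python
from collections import Counter


def letter_counts(text):
    # Count alphabetic characters, then emit [letter, count] pairs in alphabetical order.
    counts = Counter(ch for ch in text if ch.isalpha())
    return [[letter, count] for letter, count in sorted(counts.items())]


for sample in ("abc", "bbbc", "cccaa"):
    print(repr(sample), "->", letter_counts(sample))
```

Inside a larger pyparsing grammar, the same function could be attached as a parse action to whatever expression captures the run of letters.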

Will rewriting a multipurpose log file parser to use formal grammars improve maintainability?

TLDR: if I built a multipurpose parser by hand with different code for each format, will it work better in the long run using one chunk of parser code and an ANTLR, PyParsing or similar grammar to specify each format? Context: My job involves lots of benchmark log files from ~50 different benchmarks. There are a few in XML, a few HTML,...