ansaurus

Question

What are some good free parsing programs?

Answer 1

+1 A:

Take a look at JavaCC.

JavaCC stands for "the Java Compiler Compiler"; it is a parser generator and lexical analyzer generator. JavaCC will read a description of a language and generate code, written in Java, that will read and analyze that language. JavaCC is particularly useful when you have to write code to deal with an input language that has a complex structure.

Brandon E Taylor 2010-07-23 16:20:44

Answer 2

+3 A:

ANTLR is pretty popular and even has an IDE to help you develop / test your grammars.

Steven Schlansker 2010-07-23 16:27:11

Answer 3

A:

It depends on what you need to parse.

If you need to solve a problem in a particular domain, then the best way is to create a domain-specific language and parse it in Groovy.

amra 2010-07-23 16:29:50

Answer 4

+1 A:

I think you are looking for something like Apache Lucene.

Check this: http://lucene.apache.org/java/docs/index.html

mohammad shamsi 2010-07-23 16:29:53

Answer 5

+3 A:

Pyparsing is a good Python add-on module for parsing plain text. It is easy to get something going quickly, but it has enough supporting components to do some pretty elaborate parsing work. See http://pyparsing.wikispaces.com, and check out the Examples page. (Plus it is very liberally licensed, so there are no restrictions or runtime encumbrances.)

Paul McGuire 2010-07-23 16:38:05

Answer 6

A:

If the text has a known format, a grammar parser might be your best bet.

Gold Parser is open source and supports both Java and Python, among others: http://www.devincook.com/goldparser/

Mirozell 2010-07-23 16:42:17

Answer 7

A:

Lepl - http://www.acooke.org/lepl - is a general-purpose, recursive descent parser for Python that I maintain.

It's similar to pyparsing, in that both are parsers that you write directly in Python. Here's an example that parses and evaluates an arithmetic expression:

>>> from lepl import *
>>> from operator import add, sub, mul, truediv

>>> # ast nodes
... class Op(List):
...     def __float__(self):
...         return self._op(float(self[0]), float(self[1]))
...
>>> class Add(Op): _op = add
...
>>> class Sub(Op): _op = sub
...
>>> class Mul(Op): _op = mul
...
>>> class Div(Op): _op = truediv
...

>>> # tokens
>>> value = Token(UnsignedFloat())
>>> symbol = Token('[^0-9a-zA-Z \t\r\n]')

>>> number = Optional(symbol('-')) + value >> float
>>> group2, group3 = Delayed(), Delayed()

>>> # first layer, most tightly grouped, is parens and numbers
... parens = ~symbol('(') & group3 & ~symbol(')')
>>> group1 = parens | number

>>> # second layer, next most tightly grouped, is multiplication
... mul_ = group1 & ~symbol('*') & group2 > Mul
>>> div_ = group1 & ~symbol('/') & group2 > Div
>>> group2 += mul_ | div_ | group1

>>> # third layer, least tightly grouped, is addition
... add_ = group2 & ~symbol('+') & group3 > Add
>>> sub_ = group2 & ~symbol('-') & group3 > Sub
>>> group3 += add_ | sub_ | group2

>>> ast = group3.parse('1+2*(3-4)+5/6+7')[0]
>>> print(ast)
Add
 +- 1.0
 `- Add
     +- Mul
     |   +- 2.0
     |   `- Sub
     |       +- 3.0
     |       `- 4.0
     `- Add
         +- Div
         |   +- 5.0
         |   `- 6.0
         `- 7.0
>>> float(ast)
6.833333333333333
>>> 1+2*(3-4)+5/6+7
6.833333333333333
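
For comparison, the same three layers (parentheses and numbers, then multiplication and division, then addition and subtraction) can be sketched by hand with only the standard library. This is not Lepl code: tokenize, Parser and evaluate are hypothetical names made up for the illustration, and it computes the value directly instead of building an AST. Unlike the right-recursive rules above, it folds operators left to right, which matches ordinary arithmetic.

```python
import re

# Stdlib-only sketch; all names here are hypothetical, not part of Lepl.
TOKEN = re.compile(r"\s*(\d+\.\d*|\.\d+|\d+|[-+*/()])")
OPERATORS = set("+-*/()")


def tokenize(text):
    """Split text into number and operator tokens, rejecting anything else."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unexpected input near position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def group1(self):
        # First layer, most tightly grouped: parens and numbers.
        token = self.take()
        if token == "(":
            value = self.group3()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return value
        sign = 1.0
        if token == "-":  # a minus sign may only precede a number
            sign, token = -1.0, self.take()
        if token is None or token in OPERATORS:
            raise ValueError("expected a number")
        return sign * float(token)

    def group2(self):
        # Second layer: multiplication and division, left to right.
        value = self.group1()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.group1()
            else:
                value /= self.group1()
        return value

    def group3(self):
        # Third layer, least tightly grouped: addition and subtraction.
        value = self.group2()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.group2()
            else:
                value -= self.group2()
        return value


def evaluate(text):
    parser = Parser(tokenize(text))
    result = parser.group3()
    if parser.peek() is not None:
        raise ValueError("unexpected token %r" % parser.peek())
    return result
```

Calling evaluate('1+2*(3-4)+5/6+7') should agree with what Python itself computes for that expression. What a hand-written version like this lacks is everything a parser library provides on top: an AST, reusable matchers, and error reporting.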

The main advantage of Lepl over pyparsing is that it's slightly more powerful: it can compile itself to regular expressions in places for speed, handle left-recursive grammars, and use trampolining to avoid running out of stack space. The main disadvantage is that it's younger than pyparsing, so it doesn't have as many users or as large and supportive a community.

andrew cooke 2010-07-24 13:34:22
