views:

352

answers:

5

I need to quickly build a parser for a very simplified version of a html-like markup language in Java. In python, I would use pyparsing library to do this. Is there something similar for Java? Please, don't suggest libraries already out there for html parsing, my application is a school assignment which will demonstrate walking a tree of objects and serializing to text using visitor pattern, so I'm not thinking in real world terms here. Basically all I need here is tags, attributes and text nodes.

A: 

There are quite a number choices for stringhandling in java. Maybe the very basic java.util.Scanner and java.util.StringTokenizer Classes are helpfull for you?

Another good choice is maybe the org.apache.commons.lang.text library. http://commons.apache.org/lang/apidocs/org/apache/commons/lang/text/package-summary.html

+1  A: 

May be overkill for your use, but javacc is an excellent industrial-strength parser generator. I've used this program/library several times, its reliable and worth learning, particularly if you are going to work with languages and compilers. Here's the description of the program from the website listed above:

Java Compiler Compiler [tm] (JavaCC [tm]) is the most popular parser generator for use with Java [tm] applications. A parser generator is a tool that reads a grammar specification and converts it to a Java program that can recognize matches to the grammar. In addition to the parser generator itself, JavaCC provides other standard capabilities related to parser generation such as tree building (via a tool called JJTree included with JavaCC), actions, debugging, etc.

codefin
+5  A: 

Another good parser generator is ANTLR, that might be what you're looking for.

Jorn
+2  A: 

A quick search for parser generators in Java yields JParsec. I've never used it - but it's inspired by a Haskell library, so by definition it must be good:-)

Torsten Marek
Looks like very interesting, departing from the code generators... Thanks for the reference.
PhiLho
+1  A: 

I like JParsec (which I just discovered thanks to Torsten) because it doesn't generate code... :-) Perhaps less efficient, but enough for small tasks.
I found a similar library, JTopas.

There is a good list of parser (generators or not) at Java Source.

PhiLho