I need to quickly build a parser for a very simplified version of a html-like markup language in Java. In python, I would use pyparsing library to do this. Is there something similar for Java? Please, don't suggest libraries already out there for html parsing, my application is a school assignment which will demonstrate walking a tree of objects and serializing to text using visitor pattern, so I'm not thinking in real world terms here. Basically all I need here is tags, attributes and text nodes.
There are quite a number choices for stringhandling in java.
Maybe the very basic java.util.Scanner
and java.util.StringTokenizer
Classes are helpfull for you?
Another good choice is maybe the org.apache.commons.lang.text
library.
http://commons.apache.org/lang/apidocs/org/apache/commons/lang/text/package-summary.html
May be overkill for your use, but javacc is an excellent industrial-strength parser generator. I've used this program/library several times, its reliable and worth learning, particularly if you are going to work with languages and compilers. Here's the description of the program from the website listed above:
Java Compiler Compiler [tm] (JavaCC [tm]) is the most popular parser generator for use with Java [tm] applications. A parser generator is a tool that reads a grammar specification and converts it to a Java program that can recognize matches to the grammar. In addition to the parser generator itself, JavaCC provides other standard capabilities related to parser generation such as tree building (via a tool called JJTree included with JavaCC), actions, debugging, etc.
A quick search for parser generators in Java yields JParsec. I've never used it - but it's inspired by a Haskell library, so by definition it must be good:-)
I like JParsec (which I just discovered thanks to Torsten) because it doesn't generate code... :-) Perhaps less efficient, but enough for small tasks.
I found a similar library, JTopas.
There is a good list of parser (generators or not) at Java Source.