views: 866

answers: 11

I'm trying to figure out how to parse some XML (for an Android app), and it seems pretty ridiculous how difficult it is to do in Java. It seems like it requires creating an XML handler which has various callbacks (startElement, endElement, and so on), and you have to then take care of changing all this data into objects. Something like this tutorial.

All I really need is to change an XML document into a multidimensional array, and even better would be to have some sort of Hpricot processor. Is there any way to do this, or do I really have to write all the extra code in the example above?

+9  A: 

There are two main types of XML processors in Java (three actually, but one is weird). What you have is a SAX parser, and what you want is a DOM parser. Take a look at http://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/ for how to use the DOM parser. DOM builds a tree in memory that you can navigate pretty easily. SAX is best for large documents; DOM is much easier to use, but slower and much more memory intensive.
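
For a rough idea of what the DOM route looks like, here is a minimal sketch using the javax.xml.parsers classes that ship with Java. The element name "item", the attribute "id", the class name and the sample document are all made up for illustration; it collects one String[] per element, which is about as close to a "multidimensional array" as DOM gets out of the box.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DomExample {
    // Returns one {id, text} pair per <item> element (hypothetical names).
    static List<String[]> parseItems(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // The whole document is loaded into memory as a tree.
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList items = doc.getElementsByTagName("item");
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            rows.add(new String[] { item.getAttribute("id"), item.getTextContent().trim() });
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<items><item id=\"1\">apple</item><item id=\"2\">pear</item></items>";
        for (String[] row : parseItems(xml)) {
            System.out.println(row[0] + " = " + row[1]);
        }
    }
}
```

Because the tree is fully built before you touch it, you can visit the nodes in any order, which is where the extra memory goes.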

stimms
What's the third? I'm only familiar with SAX and DOM.
Thomas Owens
StAX is the weird one.
Hank Gay
What is weird about a pull parser?
jitter
StAX isn't even available on Android? What are you talking about
jitter
Awesome, that link is exactly what I was looking for.
Kyle Slattery
vtd-xml is the latest development in XML parsing
vtd-xml-author
@jimmy zang: maybe you should add that you are also the author of that vtd-xml library, and that, IMHO, you are making a lot of claims about being the best parser ever without making them verifiable by a third party.
jitter
The benchmark is open source; download it and try it yourself, within 10 minutes.
vtd-xml-author
+7  A: 

Check this article for ways to handle XML on Android. Maybe the DOM or XML Pull style fits you better.

Working with XML on Android
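
To give a feel for the pull style, here is a small sketch. Note the assumption: it uses StAX (javax.xml.stream.XMLStreamReader) from desktop Java so that it runs without the Android SDK. On Android you would use org.xmlpull.v1.XmlPullParser instead, where the loop looks much the same but the names differ (next(), getName(), START_TAG and so on). The "title" element and the class name are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class PullExample {
    // Collects the text of every <title> element (hypothetical name).
    static List<String> readTitles(String xml) throws Exception {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> titles = new ArrayList<>();
        try {
            // Your code asks for the next event; nothing calls back into you.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // getElementText() reads up to the matching end tag.
                    titles.add(reader.getElementText().trim());
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<feed><entry><title>First</title></entry>"
                   + "<entry><title>Second</title></entry></feed>";
        System.out.println(readTitles(xml));
    }
}
```

The document is never held in memory as a whole, and the control flow stays in an ordinary loop rather than being spread across handler callbacks.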

jitter
Yes, XmlPullParser is the way to go on Android (built-in, streaming). It just takes a bit of time to wrap your head around it.
alex
+1 for pull parser - faster than DOM, less boilerplate than SAX.
gustafc
+1  A: 

Starting w/ Java 5, there is an XPath library in the SDK. See this tutorial for an introduction to it.

Hank Gay
A: 

Well, parsing XML is not an easy task.

Its basic structure is a tree in which any node can hold a container consisting of an array of more trees.

Each node in the tree contains a tag and a value, but in addition can contain an arbitrary number of named attributes and an arbitrary number of children or containers.

XML parsing tasks tend to fall into three categories.

Things that can be done with "regex". E.g. you want to find the value of the first "MailTo" tag and are not interested in the contents of any other tags.

Things you can parse yourself. The XML structure is always very simple, e.g. a root node and ten well-known tags with simple values.

All the rest! Even though an XML message format can look deceptively simple, home-made parsers are easily confused by extra attributes, CDATA and unexpected children. Full-blown XML parsers can handle all of these situations. Here the basic choice is between a stream parser and a DOM parser. If you intend to use most of the entities/attributes given, in whatever order you want to use them, then a DOM parser is ideal. If you are only interested in a few attributes and intend to use them in the order they are presented, if you have performance constraints, or if the XML files are large (> 500MB), then a stream parser is the way to go; the callback mechanism takes a bit of "grokking", but it's actually quite simple to program once you get the hang of it.
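
As a rough illustration of the callback mechanism, here is a minimal SAX sketch that picks out the text of "MailTo" elements and ignores everything else. The class names and the sample message are made up; the parser classes are the standard javax.xml.parsers and org.xml.sax ones.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxExample {
    // The parser calls these methods as it streams through the document.
    static class MailToHandler extends DefaultHandler {
        final List<String> values = new ArrayList<>();
        private StringBuilder text; // non-null only while inside <MailTo>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("MailTo".equals(qName)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times per element, so accumulate.
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("MailTo".equals(qName) && text != null) {
                values.add(text.toString().trim());
                text = null;
            }
        }
    }

    static List<String> parse(String xml) throws Exception {
        MailToHandler handler = new MailToHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<msg priority=\"high\"><MailTo><![CDATA[a@example.com]]></MailTo>"
                   + "<Body>hello</Body></msg>";
        System.out.println(parse(xml));
    }
}
```

The extra attribute and the CDATA section in the sample are exactly the kind of thing that trips up a regex or a home-made parser, while the handler above never has to think about them.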

James Anderson
Are you seriously suggesting that one should use regexps or a home-grown XML parser for "simple" cases? -1
gustafc
I would not really recommend it except where performance is a big factor. For instance, if you were load balancing based on customer number, it might make sense to just scan for the first CustNo tag rather than firing up the full monster XML parser.
James Anderson
+2  A: 

In my opinion, you should use a SAX parser because:
- It is fast
- You can control everything in the XML document

You will spend more time coding, but only once, because you will end up with a code template for parsing XML.

From the second case onward, you only edit the parts that change.

Good luck!

misamap
A: 

You could also use Castor to map the XML to Java beans. I have used it before and it works like a charm.

Rahul
A: 

Writing a SAX handler is the best way to go. And once you do that you will never go back to anything else. It's fast, simple and it crunches away as it goes, with no sucking large parts or, god forbid, a whole DOM into memory.

DroidIn.net
A: 

Try http://simple.sourceforge.net. It's an XML-to-Java serialization and binding framework that is fully compatible with Android and very lightweight: 270K and no dependencies.

ng
A: 

A couple of weeks ago I battered out a small library (a wrapper around javax.xml.stream.XMLEventReader) allowing one to parse XML in a similar fashion to a hand-written recursive descent parser. The source is available on GitHub, and a simple usage example is below. Unfortunately, Android doesn't support this API, but it is very similar to the XmlPullParser API, which is supported, and porting wouldn't be too time-consuming.

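// accept, atTag, attrib, close and expect are helpers from the wrapper library;
// the comments describe their apparent behaviour, inferred from the names.
accept("tilesets");                       // step into <tilesets>
while (atTag("tileset")) {                // one pass per <tileset> child
    String filename = attrib("file");
    File tilesetFile = new File(filename);
    if (!tilesetFile.isAbsolute()) {
        // resolve relative paths against the directory of the file being parsed
        tilesetFile = new File(FilenameUtils.concat(file.getParent(), filename));
    }
    int tilesize = Integer.parseInt(attrib("tilesize"));
    Tileset t = new Tileset(tilesetFile, tilesize);
    t.setID(attrib("id"));
    tilesets.add(t);

    accept();
    close();
}
close();                                  // step out of <tilesets>

expect("map");

int width       = Integer.parseInt(attrib("width"));
int height      = Integer.parseInt(attrib("height"));
int tilesize    = Integer.parseInt(attrib("tilesize"));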
jaz303
A: 

In my opinion, using XPath for parsing XML may be your easiest coding approach. You can embody the logic for pulling out nodes from an XML document in a single expression, rather than having to write the code to traverse the document's object graph.
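
To show what "a single expression" means in practice, here is a minimal sketch using the javax.xml.xpath classes from standard Java SE. The catalog/book/title structure, the price attribute and the class name are invented for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathExample {
    // Returns the titles of all books priced under 10 (hypothetical schema).
    static String[] cheapTitles(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The whole selection logic lives in this one expression.
        NodeList nodes = (NodeList) xpath.evaluate(
                "/catalog/book[@price < 10]/title",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
        String[] titles = new String[nodes.getLength()];
        for (int i = 0; i < titles.length; i++) {
            titles[i] = nodes.item(i).getTextContent();
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog>"
                   + "<book price=\"8\"><title>Cheap</title></book>"
                   + "<book price=\"12\"><title>Pricey</title></book>"
                   + "</catalog>";
        for (String title : cheapTitles(xml)) {
            System.out.println(title);
        }
    }
}
```

Doing the same filtering by hand with DOM would mean walking the children, checking tag names and parsing the price attribute yourself.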

I note that another answer to this question already suggested XPath, but it is not yet an option for your Android project. As of right now, the XPath parsing class is not supported in any Android release (even though the javax.xml namespace is defined in the Dalvik VM, which could fool you, as it did me at first).

Inclusion of the XPath class in Android is a current work item in a late phase. (It is being tested and debugged by Google as I write this.) You can track the status of adding XPath to Dalvik here: http://code.google.com/p/android/issues/detail?id=515

(It's an annoyance that you cannot assume that things supported in most Java VMs are included yet in Android's Dalvik VM.)

Another option, while waiting for official Google support, is JDOM, which presently claims Dalvik VM compatibility and also XPath support (in beta). (I have not checked this out; I'm just repeating current claims from their web site.)

M.Bearden
A: 

I've created a really simple API to solve precisely this problem. It's just a single class that you can include in your code base, and it makes parsing any XML clean and easy. You can find it here:

http://argonrain.wordpress.com/2009/10/27/000/

Chris