tutorial about xml handling in java 6

There are several different XML technologies in Java, all of them fundamentally doing the same thing (providing some manner of access to the XML tree), but all of them in mostly different ways.

The original technique, SAX, is a streaming XML processor. Simply put, you fire up the parser and it makes callbacks into your code as it encounters various XML elements. The key here is that it is processing XML elements, not YOUR XML elements. That is, it will tell you when it sees the start of a new XML element (any XML element) or a block of text, rather than the ORDER element or the ITEM element. SAX sees XML at just above the token level; it's up to your program to actually build an in-memory representation from that stream of events.
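As a rough sketch of what those callbacks look like, the following counts ITEM elements with the JDK's built-in SAX parser. The class name `SaxItemCounter` and the ORDER/ITEM sample document are made up for illustration; only the `javax.xml.parsers` and `org.xml.sax` calls come from the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not part of any library.
class SaxItemCounter {

    // Counts elements with the given name by reacting to SAX callbacks.
    static int count(String xml, final String elementName) {
        final int[] seen = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // SAX reports every element; picking out the ones we care about is our job.
                if (qName.equals(elementName)) {
                    seen[0]++;
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        // Assumed sample document.
        String xml = "<ORDER><ITEM>pen</ITEM><ITEM>ink</ITEM></ORDER>";
        System.out.println(count(xml, "ITEM"));
    }
}
```

Note that the handler has to keep its own state (here, a counter). Anything more structured, such as a list of order objects, would have to be assembled by hand in the same way.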

Next up is DOM, the Document Object Model. This is a technique familiar to web developers. It begins by consuming the entire XML document into memory and, unlike SAX, it returns a reference to the in-memory model: a combination of nodes and pointers to other nodes. You can readily walk the DOM tree to find your information, but it is a generic model. The advantage of the DOM model is that you do get the entire document in a single form. The disadvantage is that you, well, get the entire document in a single form. It's nice for smaller documents, not so good for enormous ones.
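To make the "walk the tree" part concrete, here is a minimal sketch that loads a document with the JDK's DOM parser and walks the children of the root. The class name `DomItemReader` and the ORDER/ITEM document are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of any library.
class DomItemReader {

    // Loads the whole document into memory, then walks the root's children.
    static List<String> itemTexts(String xml) {
        List<String> texts = new ArrayList<String>();
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // The model is generic: nodes are elements, text, comments, and so on.
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && "ITEM".equals(child.getNodeName())) {
                    texts.add(child.getTextContent());
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return texts;
    }

    public static void main(String[] args) {
        // Assumed sample document.
        String xml = "<ORDER><ITEM>pen</ITEM><ITEM>ink</ITEM></ORDER>";
        System.out.println(itemTexts(xml));
    }
}
```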

Of course, it all depends on what you want from the document. If you truly need all of the information from the document, then you may as well load the entire thing into memory. If you only need a subset, then a streaming processor may be a better solution, particularly for large documents.

Sitting between the two is a technology called StAX, the Streaming API for XML. Like SAX, it streams the document rather than loading it all into memory, but it reverses the control flow: instead of the parser calling back into your code, your code pulls the next event from the parser when it is ready for it. What this means is that you can write the parsing logic as ordinary loops, read only as far into the document as you need, and keep the overall memory impact low. It does not build a tree, though, so you don't get DOM-style random access to the document.
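As a rough sketch of reading a streamed document with StAX, the following uses the cursor-style `XMLStreamReader` from `javax.xml.stream`, which ships with Java 6. The class name `StaxItemReader` and the ORDER/ITEM document are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of any library.
class StaxItemReader {

    // Pulls events from the parser one at a time and collects ITEM text.
    static List<String> itemTexts(String xml) {
        List<String> texts = new ArrayList<String>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                // Our code asks for the next event; nothing is called back.
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "ITEM".equals(reader.getLocalName())) {
                    // Reads the text content and advances to the matching end tag.
                    texts.add(reader.getElementText());
                }
            }
            reader.close();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return texts;
    }

    public static void main(String[] args) {
        // Assumed sample document.
        String xml = "<ORDER><ITEM>pen</ITEM><ITEM>ink</ITEM></ORDER>";
        System.out.println(itemTexts(xml));
    }
}
```

Because the loop is under your control, you could also stop reading as soon as you have what you need, which is where the memory and time savings on large documents would come from.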

For DOM, there is a technique available to access the data called "XPath". (The standard Java XPath API evaluates expressions against an in-memory tree, so it does not apply directly to a StAX stream.) XPath is a query language that gives you access to the individual elements, but in a declarative way. You can consider XPath the same way you would consider a file name path on your hard disk. Without XPath you would need to start at the root of the XML document and "crawl" the tree to get to and extract your information. XPath abstracts this process for you, and it can help avoid much of the noise of an XML document, especially when you're after a subset of a larger document.
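To illustrate the file-path analogy, here is a minimal sketch that evaluates an XPath expression against a DOM document using `javax.xml.xpath` from the JDK. The class name `XPathLookup`, the ORDER/ITEM document, and the expression `/ORDER/ITEM[2]` are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of any library.
class XPathLookup {

    // Returns the string value of the first node matching the expression.
    static String lookup(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The path reads much like a file path: root, then child, then index.
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample document.
        String xml = "<ORDER><ITEM>pen</ITEM><ITEM>ink</ITEM></ORDER>";
        System.out.println(lookup(xml, "/ORDER/ITEM[2]"));
    }
}
```

Compared with walking child nodes by hand, the expression replaces the loop and the node-type checks. XPath positions are 1-based, so `ITEM[2]` selects the second ITEM.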

Finally, with Java, there is a technology called "JAXB", which is an XML binding technology. This is used to map Java classes to and from XML. You point the system at an XML document, and you get Java objects back. Or, you hand the system a Java object, and you get XML back. For basic cases, I find JAXB quite easy to use, especially when I have control over both the Java classes and the resulting XML. There are also other, third-party XML binding systems similar to JAXB, but JAXB comes with Java 6.

As to which one you need, that's really up to you.

Most folks avoid SAX today, as it's a bit of a pain to actually use. It's fast and cheap at run time, but it can be time consuming for the programmer.

DOM and XPath are likely the most common technique today, particularly because it is similar to modern web page programming and how browsers work with XML. If you have smaller documents (less than 50-100K), and particularly if you only need a subset of the data, DOM and XPath are quite capable and straightforward to use.

If you plan on working at a higher level, and need to work with the entire document as well as read and write XML, then JAXB may be of interest. My only concern is that mapping a legacy XML document to classes is a bit more complicated, and you may not get early success trying to do that.

All of these have good guides available, and I won't provide links. Rather you can use this guide to focus on the particular XML technology that you think would suit your application best, and go from there.
