Which is the best class in Java to work with XML documents?

+1  A: 

I think it's JDOM for ease of use.

duffymo
A: 

There are lots of libraries that let you handle XML in different ways, and no one way is "best". As always, it depends on what you are trying to do and what your requirements are.

When I need a DOM-like parser, or for building XML documents, I personally like XOM as it guarantees that the XML documents are well formed and "correct". Its number-one priority is correctness, which is important when interoperating with other systems, something which XML does very well. Its API is also very well designed and intuitive, making common operations very easy.

Adam Batkin
+3  A: 

I find dom4j to come out on top of anything else I've used (especially JDOM, which I find to have a particularly poor API). dom4j also lets you plug in Jaxen for XPath support.

Examples:

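   import java.util.List;
   import org.dom4j.Document;
   import org.dom4j.Element;
   import org.dom4j.Node;
   import org.dom4j.io.SAXReader;

   SAXReader reader = new SAXReader(); // dom4j SAXReader
   Document document = reader.read(xmlInputStream); // dom4j Document

   // select all link nodes with href "http://example.com"
   List<Node> linkNodes = document.selectNodes("//link[@href='http://example.com']");

   // read an attribute value (selectNodes returns generic nodes, so cast to Element)
   String val = ((Element) linkNodes.get(0)).attributeValue("href");

   // read a child element's text and trim it (elementTextTrim is defined on Element, not Document)
   String value = document.getRootElement().elementTextTrim("childNode");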
Matthias
+1  A: 

I've had luck with JAXB. It's included in Java SE 6.

Joonas Pulakka
JAXB is an OXM binding solution, not a general XML parser. It uses an XML parser, but it isn't one itself.
duffymo
Yes, but the question was not about a general XML parser. It was "Which is the best class in Java to work with XML documents?". And JAXB definitely is one of the alternatives.
Joonas Pulakka
A: 

I prefer using a classic combination of DOM and SAX.

folone
A: 

You have to decide between two different approaches for processing XML: There is DOM and SAX, both with advantages and disadvantages. It all depends on your needs and the size of the XML document you want to process. The already mentioned JAXB builds an API above both and is shipped with Java 6.
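To make the difference concrete, here is a rough sketch that counts elements with a given tag name both ways, using only the parsers bundled with the JDK. The class name `DomVsSax` and its two methods are made up for this illustration, and checked parser exceptions are wrapped in an unchecked one just to keep it short.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of any library.
final class DomVsSax {

    private static InputStream stream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    // DOM: the whole document is parsed into an in-memory tree, then queried.
    static int countWithDom(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(stream(xml));
            return doc.getElementsByTagName(tag).getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // SAX: the parser pushes events to a handler; no tree is built.
    static int countWithSax(String xml, String tag) {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if (qName.equals(tag)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(stream(xml), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return count[0];
    }
}
```

Both methods give the same answer for the same input. The DOM version keeps the whole tree around for further queries, while the SAX version only ever holds the counter.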

Once you understand the above, you may want to make your question more concrete and ask for the best DOM or the best SAX implementation. Besides this, it would be good if you could tell us what your requirements are. Do you want to write or read XML? How big will the files be? And so on.

EDIT:

As Nat pointed out, there is also StAX as a third alternative concept.

Tim Büthe
There is also StAX, which is low-level like SAX but lets the client code pull XML events from a stream rather than handle events pushed to it from the parser. This makes it easy to write recursive descent parsers to process XML content.
Nat
@Nat: Never used that, thanks for the tip! Just edited my answer and included a link.
Tim Büthe
A: 

If you're only doing reading, then XPath is a good bet. Otherwise, the DOM (in the org.w3c.dom package) is your best bet.
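For the read-only case, something like the following should work with nothing but the JDK: `DocumentBuilderFactory` supplies the implementation behind the `org.w3c.dom` interfaces, and `javax.xml.xpath` evaluates the query. This is only a sketch; the class name `XPathRead` and the method `select` are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
final class XPathRead {

    // Returns the text content of every node matched by the XPath expression,
    // in document order.
    static List<String> select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate " + expression, e);
        }
    }
}
```

For example, `XPathRead.select(xml, "//link/@href")` collects every `href` attribute value on `link` elements, and `"//link[@href='http://example.com']"` selects the text of the matching elements.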

Imagist
"best" isn't defined as easiest to understand and code, then. And I see interfaces in org.w3c.dom, so you still need an implementation of some kind.
duffymo
A: 

Java has good support for XML. The problem in one sense is that there are so many options. So, there's no one solution that is "the" way to handle XML in Java. You have to pick your tools based on the problem at hand.

Say you have complex validated documents that you want to load into an object tree that you can then query and manipulate. You'll want a DOM parser for this, and there are a number to choose from. This converts the whole document to objects, which can be costly in terms of CPU and memory.
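As a rough illustration of the load-then-manipulate case, here is a sketch using the DOM parser that ships with the JDK. The class `DomEdit`, the `item` element, and the `status` attribute are all assumptions made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
final class DomEdit {

    // Loads the whole document into an in-memory tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Removes every <item> whose status attribute equals the given value
    // and returns how many elements were removed.
    static int removeItemsWithStatus(Document doc, String status) {
        NodeList items = doc.getElementsByTagName("item");
        int removed = 0;
        // The NodeList is live, so walk it backwards while removing nodes.
        for (int i = items.getLength() - 1; i >= 0; i--) {
            Element item = (Element) items.item(i);
            if (status.equals(item.getAttribute("status"))) {
                item.getParentNode().removeChild(item);
                removed++;
            }
        }
        return removed;
    }
}
```

After the call, the same `Document` can be queried again or serialized, which is the main thing a tree model buys you over a streaming parser.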

Say you have a document from which you want to select certain elements, and performance is an issue. Try a SAX parser, a pull parser, or XPath.

Perhaps you need to marshal/unmarshal objects on the wire. JAXB is a candidate for this, as are other options.

So, there's no one right answer to your question. As with any [programming] problem, you have to look at the problem, evaluate the options, and pick the best tool for the job.

Don Branson
+16  A: 

It really depends on what you want to do with the XML document and how big the documents are.

Roughly, you can categorise XML APIs as:

  • DOM APIs - load the entire document into memory, which limits the size of document you can process, but can then create optimised structures for navigation and transformation
  • Streaming APIs - your application must interpret low-level parse events (e.g. start of element, end of element, etc.) but you are not limited by memory. There are two kinds of streaming API: push and pull. Push parsers fire parse events at an object you define, and that object must keep track of the current parse state (with a state machine or stack, for example). Pull parsers let your app pull parse events from the parser. This makes it easy to write a recursive descent parser to process the XML content, but then stack size becomes a limit on how deeply nested a document you can process.
  • XML Mappers - map XML content to Java objects. There are two main approaches for XML mapping: code-gen or reflection. Code-gen mappers generate Java classes from an XML schema, which means you don't have to duplicate the schema structure in Java code, but it has the disadvantage that your Java code exactly mirrors the schema structure. Also, most code generators create NOJO classes that are awkward to work with and have no behaviour of their own. Reflective mappers let you write Java classes with rich behaviour and then define how they are mapped to/from XML. If you need to conform to a predefined schema, you'll have to make sure your classes and mapping configuration are correct w.r.t. that schema.
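To show what a recursive descent parser over a pull API can look like, here is a small sketch using StAX from the JDK. It evaluates a made-up expression format with `sum`, `product` and `num` elements; that vocabulary and the class name `StaxEval` are assumptions for the example only.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical evaluator for a made-up XML expression format.
final class StaxEval {

    static long evaluate(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            reader.nextTag(); // advance to the root start tag
            return expr(reader);
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("bad expression document", e);
        }
    }

    // On entry the reader sits on a START_ELEMENT; on exit it sits on the
    // matching END_ELEMENT. One recursive call per nested element.
    private static long expr(XMLStreamReader reader) throws XMLStreamException {
        String name = reader.getLocalName();
        if (name.equals("num")) {
            // getElementText() reads the text and stops on the end tag.
            return Long.parseLong(reader.getElementText().trim());
        }
        boolean isSum = name.equals("sum");
        if (!isSum && !name.equals("product")) {
            throw new XMLStreamException("unexpected element: " + name);
        }
        long acc = isSum ? 0 : 1;
        // Pull the next tag: a start tag is a child operand, an end tag closes us.
        while (reader.nextTag() == XMLStreamConstants.START_ELEMENT) {
            long value = expr(reader);
            acc = isSum ? acc + value : acc * value;
        }
        return acc;
    }
}
```

The parse state lives in the Java call stack rather than in a hand-written state machine, which is what makes the pull style convenient. It is also why recursion depth follows the nesting depth of the document.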

Some options available are:

  • DOM APIs: The DOM APIs in the standard library are standard (obviously!) and so interoperate with other libraries but they are awful. There are several more convenient DOM-like APIs, such as XOM (my favourite for the same reasons that Adam Batkin gives above) or JDOM. Have a look at a few and decide which API you prefer.
  • Streaming APIs: the standard library contains an implementation of the SAX push parser. The standard pull parser for Java is StAX.
  • Mapping APIs: JAXB is a JSR standard but I prefer XStream because I can more easily separate the mapping configuration from the mapped classes (no need for annotations or XML configuration) and it maps objects to/from other data formats.
Nat
+1 for "it depends" -- and what it depends on.
David Moles
A: 

Whenever I have needed to work with XML documents, I have always thought of dom4j/SAX as a first resort, and it has never let me down. ;)

You should look into the SAXReader.

Rifk
A: 

@Epaga, if you don't put "best" in context, you'll fail miserably.

For example, attempting to load a huge XML document into a DOM-like structure would be very stupid. You have to select the tool wisely.

A: 

XOM (http://www.xom.nu) is a simple, flexible XML toolkit which I have found simpler and easier to use than many other parsers. Since switching from the standard W3C-based tools my productivity has increased considerably. In his web pages the author Elliotte Rusty Harold explains why XOM's design is the appropriate model for an XML DOM.

peter.murray.rust