views: 959
answers: 5

I haven't found many ways to increase the performance of a Java application that does intensive XML processing other than to leverage hardware such as Tarari or Datapower. Does anyone know of any open source ways to accelerate XML parsing?

+6  A: 

Take a look at StAX (Streaming API for XML) parsers. See the Sun reference manual. One implementation is the Woodstox project.
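
A minimal cursor-style StAX sketch, for illustration (the file name "orders.xml" and the "order"/"id" names are placeholders, not from the question); Woodstox is picked up automatically when its jar is on the classpath, otherwise the platform's default StAX implementation is used:

    import java.io.FileInputStream;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    public class StaxExample {
        public static void main(String[] args) throws Exception {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new FileInputStream("orders.xml"));
            try {
                while (reader.hasNext()) {
                    // Pull events forward one at a time; no tree is built in memory.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "order".equals(reader.getLocalName())) {
                        System.out.println("order id = "
                                + reader.getAttributeValue(null, "id"));
                    }
                }
            } finally {
                reader.close();
            }
        }
    }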

Kees de Kooter
http://www.xml.com/pub/a/2007/05/09/xml-parser-benchmarks-part-1.html has a good overview of XML parser speeds. Woodstox looks pretty good.
Sam Barnum
StAX is the way to go, and Woodstox is super fast.
casey
StAX is way slower than VTD-XML
vtd-xml-author
Wrt VTD vs. StAX, one really should try both. StAX is an API, so different implementations have different performance. And VTD-XML's trade-offs are a bit different -- it's faster to parse but slower to access (some work, like handling of character entities, is only done on access).
StaxMan
A: 

Piccolo claims to be pretty fast. Can't say I've used it myself though. You might also try JDOM. As ever, benchmark with data representative of your real load.

It partly depends on what you're trying to do. Do you need to pull the whole document into memory, or can you operate in a streaming manner? Different approaches have different trade-offs and are better for different situations.
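
A rough way to take the "benchmark with representative data" advice literally, assuming nothing beyond the JDK (no JMH); it times SAX vs. DOM parsing of the same in-memory bytes, with a warm-up pass, and the iteration counts are arbitrary:

    import java.io.ByteArrayInputStream;
    import java.io.FileInputStream;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.helpers.DefaultHandler;

    public class ParserBench {
        public static void main(String[] args) throws Exception {
            byte[] xml = readAll(new FileInputStream(args[0]));

            // Warm up the JIT first, then time each approach over the same
            // in-memory bytes so disk I/O stays out of the measurement.
            for (int i = 0; i < 50; i++) { parseSax(xml); parseDom(xml); }

            long t0 = System.nanoTime();
            for (int i = 0; i < 200; i++) parseSax(xml);
            long t1 = System.nanoTime();
            for (int i = 0; i < 200; i++) parseDom(xml);
            long t2 = System.nanoTime();

            System.out.printf("SAX: %.2f ms/parse, DOM: %.2f ms/parse%n",
                    (t1 - t0) / 200 / 1e6, (t2 - t1) / 200 / 1e6);
        }

        static void parseSax(byte[] xml) throws Exception {
            // Includes parser construction cost, which matches typical usage
            // unless parsers are pooled and reused.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml), new DefaultHandler());
        }

        static void parseDom(byte[] xml) throws Exception {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
        }

        static byte[] readAll(java.io.InputStream in) throws Exception {
            java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) out.write(buf, 0, n);
            return out.toByteArray();
        }
    }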

Jon Skeet
Piccolo seems to trade correctness for speed, which may or may not be what you want. (http://cafeconleche.org/SAXTest/paper.html#S4.2.4)
Peter Štibraný
In all fairness, deviations are rather unlikely to affect cases where performance matters (which tend to be the simpler use cases) -- SAXTest focuses on complicated cases of DTD usage and correctness. On the other hand, while Piccolo may have been faster in 2004, it hasn't been developed much since; others have caught up, and some have surpassed it (Xerces is as fast, Woodstox and especially Aalto are faster).
StaxMan
A: 

Depending on the complexity of your XML messages, you might find that a custom parser can be 10x faster (though it's more work to write). If performance is critical, I wouldn't suggest using a generic parser. (I also wouldn't suggest using XML at all, as it's not designed for performance, but that's another story... ;)
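
Purely for illustration, this is the kind of narrow, format-specific "parser" being talked about (the <order>/<price> message shape is invented); it gets its speed by skipping everything a real XML parser has to handle -- encodings, entities, namespaces, validation -- and breaks on anything outside that rigid shape:

    // Handles exactly one trusted message shape, e.g.
    //   <order id="123"><price>42.50</price></order>
    public final class OrderScanner {
        public static double readPrice(String message) {
            int start = message.indexOf("<price>");
            int end = message.indexOf("</price>", start);
            if (start < 0 || end < 0) {
                throw new IllegalArgumentException("no <price> element: " + message);
            }
            // No entity decoding, no encoding handling -- acceptable only
            // because the format is fixed and the input is trusted.
            return Double.parseDouble(
                    message.substring(start + "<price>".length(), end).trim());
        }
    }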

Peter Lawrey
Writing a custom XML parser is a time-consuming and error-prone process. Getting XML right isn't easy, especially if you want to parse XML documents from the wild. (http://cafeconleche.org/SAXTest/)
Peter Štibraný
This is all true, which is why it's not a good idea most of the time. However, if speed is critical, you can get a 10x improvement.
Peter Lawrey
Huh? Have you ever actually tried doing this? Writing a custom parser that is ANY faster is non-trivial. The fastest existing parsers run at 30-60 MB/s, not much slower than you can decode plain UTF-8 text. 10x? No way, absolutely not. Feel free to try and get some numbers. :-)
StaxMan
I would strongly expect that any gain in speed comes from knowing that _some_ work does not need doing.
Thorbjørn Ravn Andersen
@Thorbjørn Custom parsers gain by being tuned to a specific XML format. This is not appropriate in most cases, but you can see a significant improvement. By improvement, I mean in terms of latency rather than throughput. Throughput is improved by maybe 2x, again by doing less work.
Peter Lawrey
Yes, I agree that schema-specific processing can improve performance, and maybe a factor of 2x is doable. Still, it's a good idea to start with existing general-purpose parsers and see how far they go. My bet would be on Aalto being the fastest one you can find currently.
StaxMan
A: 

Check out Javolution as well.

ykaganovich
I disagree. Javolution's XML "parser" doesn't check for ANY problems with the XML (e.g. duplicate attributes), doesn't handle namespaces, doesn't implement any standard API, and isn't even faster.
StaxMan
+2  A: 

VTD-XML is very fast.

It has a DOM-like API and even XPath queries.
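
A small sketch of VTD-XML's AutoPilot/VTDNav style, for illustration (the file name and the element/attribute names are placeholders); the int returned by evalXPath() is an index into VTD's internal token records rather than a node object:

    import com.ximpleware.AutoPilot;
    import com.ximpleware.VTDGen;
    import com.ximpleware.VTDNav;

    public class VtdExample {
        public static void main(String[] args) throws Exception {
            VTDGen gen = new VTDGen();
            // parseFile returns false rather than throwing on failure.
            if (!gen.parseFile("orders.xml", true)) {   // true = namespace aware
                throw new IllegalStateException("parse failed");
            }
            VTDNav nav = gen.getNav();
            AutoPilot ap = new AutoPilot(nav);
            ap.selectXPath("/orders/order");
            while (ap.evalXPath() != -1) {
                int id = nav.getAttrVal("id");   // token index of the value, or -1
                if (id != -1) {
                    System.out.println("order id = " + nav.toString(id));
                }
            }
        }
    }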

Vincent Robert