We work on an internal corporate system that has a web front-end as one of its interfaces.

The front-end (Java + Tomcat + Apache) communicates with the back-end (a proprietary system written in a COBOL-like language) through SOAP web services.

As a result, we pass large XML files back and forth.

We believe that this architecture has a significant impact on performance due to the large overhead of XML transportation and parsing. Unfortunately, we are stuck with this architecture.

How can we make this XML set-up more efficient?

Any tips or techniques are greatly appreciated.

+2  A: 

You can compress the transfer if both ends support it, and you can try different parsers, but since you say SOAP there aren't many choices. SOAP is bloated anyway.

fuzzy lollipop
+2  A: 

I'm going to go out on a limb here and suggest GZIP compression, if you think the problem is due to bandwidth (you mentioned XML transportation). Yes, this would increase your CPU time, but it might speed things up in transport.

Here's the first Google hit on GZIP compression as a starting point. It describes how it works on Apache.
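To get a feel for how much there is to gain, here is a minimal sketch (class and element names are made up for illustration) that compresses a repetitive XML payload with the JDK's built-in `GZIPOutputStream` and prints the size before and after. SOAP-style XML, with its repeated tag names, typically shrinks dramatically:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipDemo {

    // Compress a byte array with GZIP and return the compressed bytes.
    static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(input);
        } // closing the stream flushes and finishes the GZIP trailer
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Build a repetitive XML document, typical of large SOAP payloads
        StringBuilder sb = new StringBuilder("<records>");
        for (int i = 0; i < 1000; i++) {
            sb.append("<record id=\"").append(i)
              .append("\"><name>example</name></record>");
        }
        sb.append("</records>");

        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] zipped = gzip(raw);
        System.out.println("raw=" + raw.length + " bytes, gzipped=" + zipped.length + " bytes");
    }
}
```

On Apache itself the same effect is usually achieved with mod_deflate, so the Java side only needs to send `Accept-Encoding: gzip` and decompress the response.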

Pretzel
+7  A: 

Profiling!

Do some proper profiling of your system under load - there isn't really enough information to go on here.

You need to work out where the time is going and what the bottlenecks are (network bandwidth, CPU, memory, etc.). Only then will you know what to do about it: many optimisations are really just trade-offs (for example, caching sacrifices memory to improve performance elsewhere).

The only thing I can think of off-hand is making sure that you are using HTTP compression with your web services. XML can usually be compacted down to a fraction of its normal size, but again this will only help if you have CPU cycles to spare.

Kragen
+1 for the sensible approach. There is not much sense in optimizing when you have no idea where the processing time goes.
Tomalak
Usually you are surprised by where the time actually goes.
Thorbjørn Ravn Andersen
+1  A: 

First make sure that your parsing methods are efficient for large documents. StAX is a good one for parsing large documents.

Additionally, you can take a look at binary XML approaches. These provide more efficient transport but also attempt to aid in parsing.
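For instance, a minimal StAX sketch (the document and method names here are made up for illustration): the parser pulls events one at a time, so a large document is never held in memory all at once. This example just counts element start tags:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxCount {

    // Stream through the document, counting START_ELEMENT events.
    // Memory use stays flat no matter how large the input is.
    static int countElements(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        int count = 0;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                count++;
            }
        }
        reader.close();
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders><order id=\"1\"/><order id=\"2\"/></orders>";
        System.out.println(countElements(xml)); // prints 3
    }
}
```

In a real application you would read from the SOAP response stream instead of a `StringReader`, and react only to the elements you care about.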

Russell Leggett
None of your suggestions are compatible with SOAP.
fuzzy lollipop
SOAP is just XML. To say it's not compatible is ridiculous. Axis2 uses StAX: http://ws.apache.org/axis2/. Just do a Google search and you'll find other examples.
Russell Leggett
And here is an article on using StAX with Spring-WS: http://blog.redstream.nl/2008/06/14/using-stax-with-spring-ws/
Russell Leggett
Serialization format actually has little to do with performance: http://soa.sys-con.com/node/250512
vtd-xml-author
+1  A: 

Try StAX. It performs well and has a nice, concise syntax.

Drew Johnson
A: 

Look into binary XML or any related projects. They're still very young, but they could be helpful for you, and it's good to at least know of their existence.

Robert de W
+1  A: 

Check whether your application reads in whole XML documents as DOM trees. Those may get VERY big, and frequently you can make do with a simple SAX event inspection or a SAX-based XSLT program (which can be compiled for fast processing).

This is very visible in a profiler like VisualVM in the Sun Java 6 JDK.
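As a sketch of the SAX event approach (the handler and element names here are illustrative, not from the original system): instead of materialising a tree, you receive callbacks per element, so heap use stays flat even for huge documents:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxScan {

    // Walk the document as a stream of events, counting element start tags.
    // Unlike DOM, nothing is kept in memory beyond the current event.
    static int countElements(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        final int[] count = {0};
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attrs) {
                        count[0]++;
                    }
                });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<invoice><line/><line/><line/></invoice>";
        System.out.println(countElements(xml)); // prints 4
    }
}
```

A real handler would pick out only the fields the front-end needs, which is where the memory and CPU savings over a full DOM build come from.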

Thorbjørn Ravn Andersen
A: 

I know this is an MS article, but it has some interesting info that may be of use.

Inside MSXML Performance

f00