I have a large Java app that makes a lot of modifications, including merging XML files. This is done using DOM classes and seems to work fine. At the end I want to format the resulting XML so that it is easier to read, and store it as a String. I started out doing this with DOM as well, but that puts a limit on the size of the files I can format.
Current code is:

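// Needs: java.io.*, javax.xml.transform.*, javax.xml.transform.dom.DOMSource,
//        javax.xml.transform.stream.StreamResult, org.w3c.dom.Node
public String parseToString(Node node) {
  Transformer transformer = null;
  StringBuffer buffer = null;
  try {
    // The factory method is newTransformer(); with no stylesheet it copies input to output.
    transformer = TransformerFactory.newInstance().newTransformer();
    // --- set some OutputProperties ---
    StringWriter stringWriter = new StringWriter(512);
    transformer.transform(new DOMSource(node), new StreamResult(stringWriter));
    buffer = stringWriter.getBuffer();
    stringWriter.close();
  } catch (TransformerException | IOException e) {
    // --- catch phrases ---
  }
  return buffer.toString();
}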

My understanding is that to use SAX I need to replace "new DOMSource()" with "new StreamSource()", but to do this I need to convert the node (actually the complete document) to a string. What is the easiest way to do that without eating up more memory?

A: 

What you are actually doing when you format the XML this way is running it (a DOM Node) through a so-called "identity" transformation (that is what you get from newTransformer() when no stylesheet is supplied). Which source type you specify (DOMSource or StreamSource) does not really matter, because XSLT generally needs your XML in memory anyway, which means you end up building a tree either way. In general it is not possible to apply XSLT to XML data as it streams through, because an XPath expression can wander in any direction over the source tree. With SAX input you cannot reach what you have not seen yet, and what you have already seen is not retained in memory; if you do retain it, you have effectively rebuilt a DOM.
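To make the "identity" transformation concrete, here is a minimal sketch using only the JAXP classes that ship with the JDK. The class name IdentityFormatSketch and the helpers parse and format are hypothetical names chosen for illustration; the output properties shown are just one plausible choice, not the settings elided in the question. The result is written to a Writer, so the caller can pass a StringWriter or a file-backed writer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class IdentityFormatSketch {

    // Hypothetical helper: parse a small XML string into a DOM Document for the demo.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Identity transformation: newTransformer() without a stylesheet
    // copies the source to the result unchanged, apart from output formatting.
    static void format(Node node, Writer out) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Assumed output properties; the question does not say which ones it sets.
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.transform(new DOMSource(node), new StreamResult(out));
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize node", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b>text</b><c/></a>");
        StringWriter writer = new StringWriter();
        format(doc, writer);
        System.out.println(writer);
    }
}
```

The exact indentation and line breaks depend on the transformer implementation in use, so no expected output is shown.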

You already have your XML in memory as a DOM Node. An identity transformation is one way to get an output stream out of it, and there is not much you can do about how much memory it will consume (maybe try a different transformer implementation?). I am also not sure which parser implementation you have underneath, but check whether it offers something like this: http://xerces.apache.org/xerces-j/apiDocs/org/apache/xml/serialize/XMLSerializer.html. It simply walks down the element tree and prints the nodes out. It should add no memory overhead, because what it does is pretty brute-force. If you had SAX input, it would print it out the same way (that is, without building an in-memory representation ready for XSLT transformation). The only caveat of going this way is that it is an implementation-specific API, not part of JAXP.
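To illustrate what "walk down the tree and print" means, here is a hand-rolled sketch that uses only org.w3c.dom. It is not the Xerces XMLSerializer API and not a replacement for it: the class TreeWalkSketch and its methods are hypothetical, it adds no indentation, and it silently skips comments, processing instructions and doctype nodes. It only shows that such a serializer can write each node straight to a Writer without building a second copy of the document.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TreeWalkSketch {

    // Hypothetical helper: parse a small XML string into a DOM Document for the demo.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Entry point: write the node and its descendants to the given Writer.
    static void write(Node node, Writer out) {
        try {
            walk(node, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Depth-first walk; each node is written as soon as it is visited.
    private static void walk(Node node, Writer out) throws IOException {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                walkChildren(node, out);
                break;
            case Node.ELEMENT_NODE:
                out.write("<");
                out.write(node.getNodeName());
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node attr = attrs.item(i);
                    out.write(" " + attr.getNodeName() + "=\"" + escape(attr.getNodeValue()) + "\"");
                }
                if (node.hasChildNodes()) {
                    out.write(">");
                    walkChildren(node, out);
                    out.write("</" + node.getNodeName() + ">");
                } else {
                    out.write("/>");
                }
                break;
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                out.write(escape(node.getNodeValue()));
                break;
            default:
                // Comments, processing instructions, etc. are ignored in this sketch.
                break;
        }
    }

    private static void walkChildren(Node parent, Writer out) throws IOException {
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, out);
        }
    }

    // Escape the characters that are special in XML text and attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        Document doc = parse("<a x=\"1\"><b>t &amp; u</b><c/></a>");
        StringWriter writer = new StringWriter();
        write(doc, writer);
        System.out.println(writer); // prints <a x="1"><b>t &amp; u</b><c/></a>
    }
}
```

Passing a file-backed Writer instead of a StringWriter would avoid holding the serialized text in memory as well, which may matter for the large documents described in the question.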

Pavel Veller