We are using XSLT to generate reports from our data. The data is currently stored in Oracle as XML documents (in normal CLOB columns, not using the XML type). We select the relevant XML documents and combine them into a single document:

<DATABASE>
   <XMLDOCUMENT> ... </XMLDOCUMENT>
   <XMLDOCUMENT> ... </XMLDOCUMENT>
   ...
</DATABASE>

In some cases, the combined XML document contains more than 100,000 documents. This means that a huge XML document is first loaded into memory, which causes all kinds of memory issues.

How can we prevent this from happening? We are using the XslCompiledTransform class in .NET 2.0.

I know that there are two ways of parsing XML documents: DOM and SAX. As I understand it, SAX cannot be combined with XSLT, and DOM forces us to load the entire document into memory.

What are our options? Does it help to write the complete document to disk first? Does Oracle do a better job with large XSLT transformations?

A: 

As far as I know, a CLOB can be streamed, so streaming it to the local file system is one option. That alone will not solve the problem, though, because most XSLT engines operate on a DOM. I would suggest splitting the file into smaller chunks (the XMLDOCUMENT elements in your case). This does not need XSLT; a simple regular expression is enough. Then run your XSLT transformation on each chunk individually. This will, of course, be slower than doing everything in memory, but it avoids the memory problems when the document is too large.
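
The chunking idea could look roughly like the sketch below. It is written in Python only to keep it short and runnable, and it swaps the regular expression for an incremental parser (`xml.etree.ElementTree.iterparse`), which is an assumption on my part rather than what the answer prescribes. The `ID` child element and the `transform_chunk` function are hypothetical placeholders; in a real setup `transform_chunk` would hand the chunk to the XSLT engine. The sketch also assumes every XMLDOCUMENT is a direct child of DATABASE.

```python
import io
import xml.etree.ElementTree as ET


def iter_documents(source, tag="XMLDOCUMENT"):
    """Yield each <XMLDOCUMENT> element in turn without keeping the full tree."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of <DATABASE>
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            # Runs when the caller asks for the next chunk: detach the
            # processed chunk from the root so it can be garbage collected.
            root.clear()


def transform_chunk(elem):
    """Hypothetical stand-in for the per-chunk XSLT transformation."""
    return elem.findtext("ID")


# Tiny in-memory sample; the <ID> element is invented for illustration.
sample = io.BytesIO(
    b"<DATABASE>"
    b"<XMLDOCUMENT><ID>1</ID></XMLDOCUMENT>"
    b"<XMLDOCUMENT><ID>2</ID></XMLDOCUMENT>"
    b"</DATABASE>"
)
results = [transform_chunk(doc) for doc in iter_documents(sample)]
```

Only one XMLDOCUMENT is held in memory at a time, so memory use depends on the largest single document rather than on the total number of documents.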

Superfilin
+2  A: 

There is a third XML processing model, VTD-XML, that overcomes most of DOM's memory issues and natively supports XPath, so it is worth a look. XSLT support for it is on the way.

vtd-xml-author
+3  A: 

Depending on what kinds of transformations you want to do, STX might be an alternative to XSLT:

Streaming Transformations for XML (STX) is a one-pass transformation language for XML documents. STX is intended as a high-speed, low memory consumption alternative to XSLT, using the W3C XQuery 1.0 and XPath 2.0 Data Model. Since STX does not require the construction of an in-memory tree, it is suitable for use in resource constrained scenarios.
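
To make the one-pass model concrete, here is a minimal sketch. It is not STX; it uses Python's standard `xml.sax` module purely to illustrate how output can be written while parse events stream past, with no in-memory tree. The `ID` element and the `ReportHandler` class are hypothetical names chosen for the example.

```python
import io
import xml.sax


class ReportHandler(xml.sax.ContentHandler):
    """Writes one report line per <ID> element as the events arrive."""

    def __init__(self, out):
        super().__init__()
        self.out = out
        self.in_id = False
        self.buffer = []

    def startElement(self, name, attrs):
        if name == "ID":
            self.in_id = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect it until </ID>.
        if self.in_id:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "ID":
            self.in_id = False
            self.out.write("document " + "".join(self.buffer) + "\n")


# Tiny in-memory sample; the <ID> element is invented for illustration.
sample = io.BytesIO(
    b"<DATABASE>"
    b"<XMLDOCUMENT><ID>1</ID></XMLDOCUMENT>"
    b"<XMLDOCUMENT><ID>2</ID></XMLDOCUMENT>"
    b"</DATABASE>"
)
out = io.StringIO()
xml.sax.parse(sample, ReportHandler(out))
report = out.getvalue()
```

The trade-off is the one the quote implies: the handler only ever sees the current event, so anything that needs to look backward or forward in the document has to be buffered by hand.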

Jukka Matilainen
+1  A: 

This may help: the XMLMax XML editor can apply an XSL stylesheet to each fragment matching an XPath expression and write all the matching outputs to a single file, wrapped in a user-specified root element. It has no file size limitations. Search for "xmlmax editor".

bill seacham