tags:

views: 96

answers: 3

I have some very large XML files (800 MB to 1.5 GB) on which I need to apply an XSLT transformation. I can read them with XmlTextReader, but when I apply the XSLT transformation I get a System.OutOfMemoryException.

My code looks like this:

    static void Main(string[] args)
    {
        XDocument newTree = new XDocument();
        XmlTextReader oReader = new XmlTextReader(@"C:\Projects\myxml.xml");


        using (XmlWriter writer = newTree.CreateWriter())
        {
            XslCompiledTransform oTransform = new XslCompiledTransform();
            oTransform.Load(@"C:\Projects\myXSLT.xsl");
            oTransform.Transform(oReader, writer);
        }
        Console.WriteLine(newTree);
    }

Can anyone help me here?

Thanks in advance. It is very urgent; if I don't find a solution, I will need to split the XML into smaller files and transform each one.

A: 

We are facing a similar problem. The solution we came up with was to not use XSLT for this case, and instead use LINQ to XML transformations while streaming the data. You can leverage the C# yield keyword to iterate through an XML stream and tackle the file piecemeal this way. See streaming with LINQ to XML.

The nature of XSLT requires the XML to be loaded into memory. What needs to happen is that you break the large file down into more manageable pieces. If you use the XML streaming technique, you can break the document up into sub-elements and then apply the XSLT to each one individually. You may have to rewrite the XSLT to accommodate this behavior.
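A minimal sketch of that idea: stream sub-elements with XmlReader plus yield, then run the stylesheet over each fragment. The file names, the `item` element name, and the trivial stylesheet are all placeholders standing in for the real 1.5 GB file and rewritten XSLT.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

static class StreamingTransform
{
    // Lazily yields one sub-element at a time, so the whole file is never in memory.
    static IEnumerable<XElement> StreamElements(string path, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(path))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // XNode.ReadFrom materializes only this fragment and
                    // advances the reader past it.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    static void Main()
    {
        // Tiny sample input and a trivial per-item stylesheet (placeholders).
        File.WriteAllText("huge.xml", "<root><item>a</item><item>b</item><item>c</item></root>");
        File.WriteAllText("fragment.xsl",
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
            "<xsl:template match='item'><out><xsl:value-of select='.'/></out></xsl:template>" +
            "</xsl:stylesheet>");

        XslCompiledTransform xslt = new XslCompiledTransform();
        xslt.Load("fragment.xsl");

        // Fragment conformance, because we emit one result per item rather
        // than a single rooted document.
        var settings = new XmlWriterSettings { ConformanceLevel = ConformanceLevel.Fragment };
        using (XmlWriter output = XmlWriter.Create(Console.Out, settings))
        {
            foreach (XElement item in StreamElements("huge.xml", "item"))
            {
                // Apply the transform to each small fragment individually.
                xslt.Transform(item.CreateReader(), output);
            }
        }
    }
}
```

Note the stylesheet has to be rewritten so that it matches one `item` at a time; templates that reach across the whole document won't work with this approach.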

Aside from this, the only other option is to throw more hardware at it, but this might even require an operating system upgrade depending on RAM limitations...

E Rolnicki
Not possible in my case; I need to apply a big XSLT. Is there any XML file splitter tool available?
jatin
A: 

Don't know if it helps much, but here is some code I use to transform large files:

    XPathDocument myXPathDoc = new XPathDocument("xmfile.xml");
    XslCompiledTransform myXslTrans = new XslCompiledTransform();
    XsltSettings st = new XsltSettings(true, true);
    myXslTrans.Load("StyleSheet.xslt", st, null);

    using (StreamWriter s = new StreamWriter("output-file.xslt"))
    {
        XsltArgumentList ln = new XsltArgumentList();
        // some xslt argument processing stuff
        myXslTrans.Transform(myXPathDoc, ln, s);
    }

It can take a while but it does seem to get the job done.

glenatron
It fails at the very first statement: XPathDocument myXPathDoc = new XPathDocument("xmfile.xml");
jatin
Are you using System.Xml.XPath and System.Xml.Xsl?
glenatron
+1  A: 

XSLT uses XPath, and XPath requires that the whole XML document be kept in memory. The insufficient-memory problem is therefore inherent to the approach.

There are simple rules of thumb for approximating how much memory is needed, and one of them says 5 × text-size.

So, for a "typical 1.5 GB XML file", 8 GB of RAM may be sufficient.

Either split the document into smaller parts, or wait for an implementation of XSLT 2.1, which defines special streaming instructions. In the meantime one may use the latest (commercial) version of Saxon, which implements extensions for streaming; successful processing of a 64 GB document has been reported on Twitter.
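For reference, the streaming instruction in the XSLT 2.1 working draft looks roughly like the sketch below (it was later reworked as xsl:source-document in XSLT 3.0). This is draft syntax, not something the built-in .NET processor supports, and `huge.xml`, `root`, and `item` are placeholder names:

```xml
<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template name="main">
    <!-- xsl:stream processes the document as it is read, so it is never
         wholly in memory; only "streamable" expressions are allowed inside. -->
    <xsl:stream href="huge.xml">
      <xsl:for-each select="/root/item">
        <out><xsl:value-of select="."/></out>
      </xsl:for-each>
    </xsl:stream>
  </xsl:template>
</xsl:stylesheet>
```

The trade-off is that the streamability rules restrict which expressions you may write inside the streamed section, which is why an existing "big XSLT" usually needs rewriting before it can stream.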

Dimitre Novatchev
+1, however, XSLT 2.x in the context of .NET is probably something we can dream about forever.
0xA3
@0xA3: Why not? There is Saxon.NET.
Dimitre Novatchev
Have you ever tried it? Saxon is great with Java, but terribly slow on .NET
0xA3
@0xA3: Yes, it works pretty well. Not terribly slow -- maybe 1.8 times slower than the Java version. One could make it even faster by NGEN-ing the Saxon.NET binaries.
Dimitre Novatchev
@Dimitre: I see. So that means you can basically access any tag of the XML document from within any template in the XSLT? One consequence of this would be that you cannot split up the original XML document either, unless you know the semantics exactly and know where it is safe to split, because otherwise parts of the XML could be missing when a template tries to access a certain XPath.
chiccodoro