All,

I have the code below for transforming an XML document with an XSLT stylesheet. The problem is that when the XML document is around 12 MB, the C# code runs out of memory. Is there a different way to do the transform without consuming that much memory?

public string Transform(XPathDocument myXPathDoc, XslCompiledTransform myXslTrans)
{
    try
    {
        var stm = new MemoryStream();
        myXslTrans.Transform(myXPathDoc, null, stm);
        var sr = new StreamReader(stm);
        return sr.ReadToEnd();
    }
    catch (Exception e)
    {
        //Log the Exception
    }
}

Here is the stack trace:

at System.String.GetStringForStringBuilder(String value, Int32 startIndex, Int32 length, Int32 capacity)
at System.Text.StringBuilder.GetNewString(String currentString, Int32 requiredLength)
at System.Text.StringBuilder.Append(Char[] value, Int32 startIndex, Int32 charCount)
at System.IO.StreamReader.ReadToEnd()
at Transform(XPathDocument myXPathDoc, XslCompiledTransform myXslTrans)
+2  A: 

The MemoryStream plus ReadToEnd() means you need two copies in memory at that point. You can cut that to one copy by using a StringWriter as the target (replacing the MemoryStream and StreamReader) and calling writer.ToString() when you're done.

But even two copies only add up to about 24 MB, which is far too little to exhaust memory. Something else must be going on.
It's impossible to say what; maybe your XSLT is too complicated or inefficient.


var writer = new StringWriter();
//var stm = new MemoryStream();
myXslTrans.Transform(myXPathDoc, null, writer);
//var sr = new StreamReader(stm);
//return sr.ReadToEnd();
return writer.ToString();
Henk Holterman
I assume that the exception already happens earlier, i.e. in `myXslTrans.Transform`. But without a stack trace we can only guess.
0xA3
Added stack trace in the original post
koumides
@Henk Holterman: Could you please provide an example of how to replace it?
koumides
A: 

The ReadToEnd() function loads the entire stream into memory. You are better off using an XmlReader to stream the document in chunks and then running the XSLT against smaller fragments. You may also want to consider parsing the document entirely with XmlReader and not using XSLT at all, since XSLT is less suited to streaming data and scales less well to large files.
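
As a rough sketch of the chunked approach, assuming the document is a flat list of repeated record elements and the stylesheet is written to handle one record at a time: the class, method, and parameter names here are hypothetical, and a stylesheet written for the whole document would need its templates adapted first.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class ChunkedTransform
{
    // Hypothetical helper: streams the input and transforms each element
    // named recordName on its own, so only one record is held in memory.
    public static void TransformRecords(
        TextReader input, TextWriter output,
        XslCompiledTransform xslt, string recordName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element || reader.Name != recordName)
                {
                    continue;
                }

                // ReadSubtree exposes just this element as its own document.
                // Disposing it leaves the outer reader at the end of the record.
                using (XmlReader subtree = reader.ReadSubtree())
                using (var fragment = new StringWriter())
                {
                    xslt.Transform(subtree, null, fragment);
                    output.Write(fragment.ToString());
                }
            }
        }
    }
}
```

Each record's result is written to the output as soon as it is produced. If the stylesheet emits an XML declaration, it will be repeated for every record, so setting omit-xml-declaration="yes" in xsl:output is probably what you want here.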

TheCodeKing
+2  A: 

You need

stm.Position = 0;

to reset the memory stream to the beginning before reading the contents with the StreamReader. Otherwise you are trying to read content from past the end of the stream.
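
A small self-contained sketch of that effect, with no XSLT involved (the class and method names are made up for illustration): without the rewind the reader starts at the end of the stream and gets an empty string back.

```csharp
using System;
using System.IO;
using System.Text;

public static class PositionDemo
{
    // Writes text into a MemoryStream, optionally rewinds it,
    // then reads whatever lies between the current position and the end.
    public static string WriteThenRead(string text, bool rewind)
    {
        using (var stm = new MemoryStream())
        {
            byte[] bytes = Encoding.UTF8.GetBytes(text);
            stm.Write(bytes, 0, bytes.Length);

            if (rewind)
            {
                stm.Position = 0; // back to the start before reading
            }

            using (var sr = new StreamReader(stm, Encoding.UTF8))
            {
                return sr.ReadToEnd();
            }
        }
    }
}
```

Calling WriteThenRead("<a/>", false) returns an empty string, while WriteThenRead("<a/>", true) returns the text that was written.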

Nick Jones
I actually had this already, but it didn't make any difference.
koumides
A: 

It may or may not be related, but you need to make sure you dispose of your stream and reader objects. I have also added the stm.Position = 0 that Nick Jones pointed out.

public string Transform(XPathDocument myXPathDoc, XslCompiledTransform myXslTrans)
{
    try
    {
        using (var stm = new MemoryStream())
        {
            myXslTrans.Transform(myXPathDoc, null, stm);
            stm.Position = 0; // rewind before reading the output back
            using (var sr = new StreamReader(stm))
            {
                return sr.ReadToEnd();
            }
        }
    }
    catch (Exception e)
    {
        //Log the Exception
        throw; // rethrow so every code path returns a value or throws
    }
}
Bronumski
While it is a good habit, a MemoryStream doesn't actually need Disposing.
Henk Holterman
Yes and no. As far as I understand it, if any of the async methods (BeginRead, BeginWrite) have been called and have not finished, you could leak event handles, although that is unlikely. As you said, it is good practice.
Bronumski
@Henk The point of implementing IDisposable is so that callers know to always dispose objects as soon as possible to release resources. IMO there's never an argument for not doing this, or a reason not to. If you look at the implementation of MemoryStream.Dispose in Reflector, there are consequences to not doing this, however small. I would always consider not disposing a disposable object a bug.
TheCodeKing
+1 Good point, but I think Henk's comment was more in the context of the question, pointing out that disposing of the MemoryStream would have little impact in this instance.
Bronumski
+1  A: 

The first thing I would do is to isolate the problem. Take the whole MemoryStream business out of play and stream the output to a file, e.g.:

// Passing file names lets the reader and writer open and close the files themselves.
using (XmlReader xr = XmlReader.Create("input.xml"))
using (XmlWriter xw = XmlWriter.Create("output.xml"))
{
   xslt.Transform(xr, xw);
}

If you still get an out-of-memory exception (I'd bet folding money that you will), that's a pretty fair indication that the problem's not with the size of the output but rather with something in the transform itself, e.g. something that recurses infinitely like:

<xsl:template match="foo">
   <bar>
      <xsl:apply-templates select="."/>
   </bar>
</xsl:template>
Robert Rossney