I'm using Delphi to create an XML document from data in a relational database. It tests fine with small datasets, but when I try to expand the size of the data set to production levels it eventually bombs out with an EOutOfMemory exception during node creation.

I'm using a TXMLDocument dropped on a form (MSXML as the Vendor), and my code generally looks like this:

  DS := GetDS(Conn, 'SELECT Fields. . . FROM Table WHERE InsuredID = ' + IntToStr(AInsuredID));
  try
    while not DS.Eof do
      with ANode.AddChild('LE') do
      begin
        AddChild('LEProvider').Text := DS.FieldByName('LEProvider').AsString;
        // Need to handle "other" here
        AddChild('Date').Text       := DateToXMLDate(DS.FieldByName('LEDate').AsDateTime);
        AddChild('Pct50').Text      := DS.FieldByName('50Percent').AsString;
        AddChild('Pct80').Text      := DS.FieldByName('80Percent').AsString;
        AddChild('Actuarial').Text  := DS.FieldByName('CompleteActuarial').AsString;
        AddChild('Multiplier').Text := DS.FieldByName('MortalityMultiplier').AsString;
        DS.Next;
      end;
  finally
    DS.Free;
  end;

with this section, as well as numerous other similarly constructed sections applying to different database tables, executed many times. In this example ANode is an IXMLNode passed in to the function for use as a container.

I do not expect the resulting XML file on disk to be more than 10 megabytes. I assume that somehow I'm leaking memory in my creation and disposal of XMLNodes, but I'm not familiar enough with Interfaces to know how to track down my problem.

+2  A: 

TXMLDocument is a DOM-style interface and keeps the whole document in memory. Memory gets used up rather quickly that way, even when the resulting file is not that big. You don't really need TXMLDocument to write out simple XML. Why not write directly to a file in XML format?

That being said, it could also be an error due to heap fragmentation, or a real memory leak. You might want to try one of the tools mentioned here: http://stackoverflow.com/questions/291631/profiler-and-memory-analysis-tools-for-delphi
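A minimal sketch of the "write directly to a file" idea, assuming Delphi XE2 or later (for `TStreamWriter` and the `System.` unit scope names). The `XmlEscape` and `WriteElement` helpers, the `LEs` root element, the `le.xml` file name, and the sample values are all made up for illustration; the `for` loop stands in for the `while not DS.Eof` loop from the question.

```pascal
program StreamXmlSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Hypothetical helper: escape the five characters XML reserves.
// '&' is replaced first so the entities added later are not escaped again.
function XmlEscape(const S: string): string;
begin
  Result := StringReplace(S, '&', '&amp;', [rfReplaceAll]);
  Result := StringReplace(Result, '<', '&lt;', [rfReplaceAll]);
  Result := StringReplace(Result, '>', '&gt;', [rfReplaceAll]);
  Result := StringReplace(Result, '"', '&quot;', [rfReplaceAll]);
  Result := StringReplace(Result, '''', '&apos;', [rfReplaceAll]);
end;

// Hypothetical helper: write <Name>escaped value</Name> on its own line.
procedure WriteElement(W: TStreamWriter; const Name, Value: string);
begin
  W.WriteLine(Format('    <%s>%s</%s>', [Name, XmlEscape(Value), Name]));
end;

var
  W: TStreamWriter;
  I: Integer;
begin
  // Each line goes to the file as it is produced, so memory use does not
  // grow with the number of records.
  W := TStreamWriter.Create('le.xml', False, TEncoding.UTF8);
  try
    W.WriteLine('<?xml version="1.0" encoding="UTF-8"?>');
    W.WriteLine('<LEs>');
    for I := 1 to 3 do  // stand-in for: while not DS.Eof do ... DS.Next
    begin
      W.WriteLine('  <LE>');
      WriteElement(W, 'LEProvider', 'Smith & Sons <' + IntToStr(I) + '>');
      WriteElement(W, 'Pct50', IntToStr(100 + I));
      W.WriteLine('  </LE>');
    end;
    W.WriteLine('</LEs>');
  finally
    W.Free;
  end;
end.
```

The trade-off is that nothing checks well-formedness for you: every opening tag needs its matching close written by hand, and every value has to pass through the escaping helper.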

Lars Truijens
I agree with Lars - just use writeln() or write directly to a stream. It's much faster and uses less memory.
Lars D
Thx. Is that in-memory limitation in TXMLDocument or in MSXML? It doesn't seem like a reasonable limitation in the modern world. If at all possible I prefer to use something that understands the XML format. Also, I'll be validating the XML against an XSD as the final stage of the creation process.
Larry Lustig
Hmm. Two answers from Larses, and that was my nickname in college.
Larry Lustig
TXMLDocument is a wrapper around MSXML. All DOM APIs need lots of RAM. It is hard to say whether your problem is caused by that or by something else.
Lars Truijens
Lars: I don't mean to be (too) dense, but are you saying that all DOM-oriented XML libraries are limited to XML documents that will fit into memory, including overhead? Looks like I'm definitely headed towards your suggestion of straight text writes (with the added complexities). Unfortunately, I'm also writing the receiving side of this XML transaction. . .
Larry Lustig
Yes, that is correct. DOM APIs do offer a random-access model and validation. Alternatives for reading are event-driven SAX APIs (also supported by MSXML) or sequential XML readers like .NET's XmlReader.
Lars Truijens
A: 

Each of those AddChild calls has its result stored into a temporary IXmlNode variable declared implicitly by the compiler. They should get cleaned up automatically when the current subroutine returns (whether normally or by an exception). You can make their lifetime more explicit by declaring your own variables.

var
  le, child: IXmlNode;
begin
  DS := GetDS(Conn, Format(Query, [AInsuredID]));
  try
    while not DS.Eof do begin
      le := ANode.AddChild('LE');
      child := le.AddChild('LEProvider');
      child.Text := DS.FieldByName('LEProvider').AsString;
      // Need to handle "other" here
      child := le.AddChild('Date');
      child.Text := DateToXMLDate(DS.FieldByName('LEDate').AsDateTime);
      child := le.AddChild('Pct50');
      child.Text := DS.FieldByName('50Percent').AsString;
      child := le.AddChild('Pct80');
      child.Text := DS.FieldByName('80Percent').AsString;
      child := le.AddChild('Actuarial');
      child.Text := DS.FieldByName('CompleteActuarial').AsString;
      child := le.AddChild('Multiplier');
      child.Text := DS.FieldByName('MortalityMultiplier').AsString;
      DS.Next;
    end;
  finally
    DS.Free;
  end;
end;

In the above code, there are no implicit interface variables. The compiler would have declared a new implicit variable for each AddChild call, but the code above demonstrates that only two were necessary because child can be reused for each new child node.

That code alone shouldn't cause an extreme amount of memory use, though. It seems more likely that you're keeping references to objects that you don't really need anymore, or you're creating circular references for some interface objects. The MSXML library shouldn't create any circular references of its own, but you haven't shown all the code that might be running here.
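To see the lifetime of an implicit interface variable in isolation, here is a small console sketch. `INode`, `TNode`, and `MakeNode` are invented for the demonstration and have nothing to do with MSXML. One would expect the "node destroyed" line to appear after the last statement in the implicit case and before it in the explicit case, though the exact behavior is up to the compiler version.

```pascal
program TempLifetimeSketch;
{$APPTYPE CONSOLE}

type
  // Hypothetical interface, standing in for IXMLNode.
  INode = interface
    procedure Touch;
  end;

  TNode = class(TInterfacedObject, INode)
  public
    destructor Destroy; override;
    procedure Touch;
  end;

destructor TNode.Destroy;
begin
  Writeln('  node destroyed');
  inherited;
end;

procedure TNode.Touch;
begin
  Writeln('  node touched');
end;

function MakeNode: INode;
begin
  Result := TNode.Create;
end;

procedure UseImplicitTemp;
begin
  MakeNode.Touch;  // the result is held in a hidden, compiler-declared variable
  Writeln('  last statement of UseImplicitTemp');
end;               // the hidden variable is released in the routine's cleanup

procedure UseExplicitVar;
var
  Node: INode;
begin
  Node := MakeNode;
  Node.Touch;
  Node := nil;     // drops the only reference right away
  Writeln('  last statement of UseExplicitVar');
end;

begin
  Writeln('implicit:');
  UseImplicitTemp;
  Writeln('explicit:');
  UseExplicitVar;
end.
```

Note that with a DOM, releasing your own reference to a node does not remove that node from the document: the tree itself still holds every node that was added to it.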

Rob Kennedy
Assuming that the memory attached to those implicit variables will be freed when they go out of scope (and it's not clear to me why that's at the end of the function rather than after the statement is finished processing), I'd rather avoid the extra code. I did notice, in researching this issue before posting my question, someone suggesting that IXMLNodes may not, in fact, get freed once the last reference is given up, but rather at some indeterminate time in the future. Is it possible that MSXML has some internal garbage collection that isn't keeping up with my use of nodes?
Larry Lustig
The compiler's implicit variables are the same as ordinary declared variables, except that they don't have names. They go out of scope at the end of the function, just like all other variables. That way, the compiler only has to insert one implicit "finally" block to clean up everything. Maybe you're conflating them with C++ temporaries, which live to the end of the current statement. But anyway, yes, MS XML may be holding things longer than you need them. Consider destroying the entire `TXmlDocument` object when you're finished with it, and then create a new one later if you need another.
Rob Kennedy
Thx Rob, unfortunately the EOutOfMemory occurs before I've constructed and saved my first complete document. Not conflating with C++, just clueless as to how these anonymous objects are freed.
Larry Lustig
A: 

Try using a SAX parser rather than DOM. DOM keeps a representation of the whole XML file in memory.

try here

Steve