Hi,

I'm dealing with large XML files (several megabytes) on which I have to run various kinds of checks. However, I have a problem with memory and time usage, which grow very quickly. I've tested it like this:

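// $string holds the whole XML document as a string
$xml = new SimpleXMLElement($string);
$sum_of_elements = 0.0;

// Sum the numeric value of every <Amt> element in the document
foreach ($xml->xpath('//Amt') as $amt) {
    $sum_of_elements += (float) $amt;
}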

Measuring with the microtime() and memory_get_usage() functions, I get the following results when running this code:

  • 5 MB file (7,480 Amt elements):
    • execution time: 0.69 s
    • memory usage grows from 10.25 MB to 29.75 MB

That's still acceptable. But with a slightly bigger file, memory and time usage grow dramatically:

  • 6 MB file (8,976 Amt elements):
    • execution time: 8.53 s
    • memory usage grows from 10.25 MB to 99.25 MB

The problem seems to be in looping over the result set. I've also tried a for loop instead of foreach, but it made no difference. Without the loop, memory usage does not grow nearly as much.
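For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of a harness built on microtime() and memory_get_usage(), plus memory_get_peak_usage() for the high-water mark. It is not the asker's actual setup: the synthetic document layout (`<Doc><Tx><Amt>…</Amt></Tx>…</Doc>`) and the helper names buildSampleXml() and measureSimpleXmlSum() are hypothetical stand-ins for the real file.

```php
<?php
// Hypothetical helper: builds a synthetic document with $count <Amt> elements
// holding the values 1.00, 2.00, ..., $count.00.
function buildSampleXml(int $count): string
{
    $parts = ['<Doc>'];
    for ($i = 1; $i <= $count; $i++) {
        $parts[] = "<Tx><Amt>$i.00</Amt></Tx>";
    }
    $parts[] = '</Doc>';
    return implode('', $parts);
}

// Hypothetical helper: runs the SimpleXML + XPath summing loop and records
// elapsed time plus memory usage before, after and at peak.
function measureSimpleXmlSum(string $string): array
{
    $memBefore = memory_get_usage();
    $start = microtime(true); // current time as float seconds

    $xml = new SimpleXMLElement($string);
    $sum = 0.0;
    foreach ($xml->xpath('//Amt') as $amt) {
        $sum += (float) $amt;
    }

    // Memory is read here, while $xml and the XPath result are still alive
    return [
        'sum'        => $sum,
        'seconds'    => microtime(true) - $start,
        'mem_before' => $memBefore,
        'mem_after'  => memory_get_usage(),
        'mem_peak'   => memory_get_peak_usage(),
    ];
}

$result = measureSimpleXmlSum(buildSampleXml(1000));
printf(
    "sum=%.2f time=%.3fs memory %.2f MB -> %.2f MB (peak %.2f MB)\n",
    $result['sum'],
    $result['seconds'],
    $result['mem_before'] / 1048576,
    $result['mem_after'] / 1048576,
    $result['mem_peak'] / 1048576
);
```

Running it with increasing element counts is one way to see whether time and memory grow linearly or jump at some size, as they appear to between the two files above.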

Any idea where the problem could be?

A:

SimpleXML is tree-based and loads the entire document into memory. Using unset() inside the loop to mark resources that are no longer needed, so PHP's garbage collector can reclaim them, might reduce memory usage. If that doesn't solve the issue, consider XMLReader, which uses a pull-based approach. You won't be able to use XPath with it, but memory consumption should be significantly lower.
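A minimal sketch of the XMLReader approach, assuming the goal is still just to sum every Amt element. XMLReader moves a cursor through the document node by node, so the full tree is never built in memory. The function name sumAmtWithXmlReader() and the sample document are hypothetical; for a file on disk you would call open() with a path instead of XML() with a string.

```php
<?php
// Hypothetical helper: sums all <Amt> elements using the XMLReader pull parser.
// Assumption: each <Amt> contains a plain numeric value.
function sumAmtWithXmlReader(string $xml): float
{
    $reader = new XMLReader();
    $reader->XML($xml); // for a large file, use $reader->open($path) instead

    $sum = 0.0;
    // read() advances the cursor one node at a time; only the current node is held
    while ($reader->read()) {
        // localName ignores any namespace prefix on the element
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'Amt') {
            // readString() returns the text content of the current element
            $sum += (float) $reader->readString();
        }
    }
    $reader->close();

    return $sum;
}

$sample = '<Doc><Tx><Amt>1.50</Amt></Tx><Tx><Amt>2.25</Amt></Tx></Doc>';
echo sumAmtWithXmlReader($sample), PHP_EOL;
```

The trade-off is that any check which needs context (parent elements, sibling values) has to be tracked manually while reading, since there is no XPath over a tree.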

Gordon
I tried unsetting $amt in the loop, but it doesn't help. What's really strange is the big jump between the 5 MB and 6 MB files: the 5 MB file takes only about 10 MB more memory than a 1 MB file, but as you can see, the 6 MB file already takes nearly 70 MB more than the 5 MB file (99.25 MB vs. 29.75 MB).
JPH
And like I said, the looping seems to be the problem. If I only store the results without going through them ($result = $xml->xpath('//Amt')), both files seem to take roughly the same amount of memory.
JPH
@JPH Well, you could use Xdebug or Zend Server's Memory Tracing facility to see where the memory is going. You could also try DOM instead of SimpleXML to rule out a memory leak within SimpleXML. Are you sure no errors are thrown during the loop? Do you have error_reporting enabled?
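To follow up on the DOM and error-reporting suggestions, here is a sketch that computes the same sum with DOMDocument and DOMXPath while making errors visible. It is an illustration under stated assumptions, not a diagnosis: the helper sumAmtWithDom() and the sample document are hypothetical, and the second variant swaps in XPath's own sum() function (evaluated via DOMXPath::evaluate()) so that the per-node loop runs inside libxml rather than in PHP.

```php
<?php
// Make sure warnings and notices raised during parsing or the loop are shown
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Hypothetical helper: sums <Amt> elements with DOM, two ways.
// Assumption: <Amt> elements are not in a namespace.
function sumAmtWithDom(string $string): array
{
    // Collect libxml parse errors instead of emitting them as PHP warnings
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    if (!$doc->loadXML($string)) {
        $messages = array_map(fn($e) => trim($e->message), libxml_get_errors());
        libxml_clear_errors();
        throw new RuntimeException('XML could not be parsed: ' . implode('; ', $messages));
    }

    $xpath = new DOMXPath($doc);

    // Variant 1: iterate the node list in PHP, as the SimpleXML version does
    $loopSum = 0.0;
    foreach ($xpath->query('//Amt') as $node) {
        $loopSum += (float) $node->textContent;
    }

    // Variant 2: XPath 1.0 sum(); evaluate() returns a float for this expression
    $xpathSum = $xpath->evaluate('sum(//Amt)');

    return [$loopSum, $xpathSum];
}

$sample = '<Doc><Tx><Amt>1.50</Amt></Tx><Tx><Amt>2.25</Amt></Tx></Doc>';
[$loopSum, $xpathSum] = sumAmtWithDom($sample);
printf("loop: %.2f, sum(): %.2f\n", $loopSum, $xpathSum);
```

DOM still builds the whole tree, so this will not cut the baseline memory the way XMLReader can; it only helps separate a SimpleXML-specific problem from the cost of the tree itself.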
Gordon