I have a do/while loop that goes over database rows. Because it runs for many days at a time, processing hundreds of thousands of rows, memory consumption has to be kept in check or the script will crash. Right now every iteration adds about 4kb to the script's memory usage. I'm using memory_get_usage() to monitor it.

I unset every variable used in the loop as the first thing in each iteration, so I really don't know what else I could do. My guess is that do/while gathers some data with each iteration and this is what consumes the 4kb of memory. I know 4kb doesn't sound like much, but it soon starts to add up when you have hundreds of thousands of iterations.

Can somebody suggest another way of going through a large number of database rows, or how to somehow eliminate this "memory leak"?

Edit: Here's the updated loop code. Above it there are just a few require_once() calls.

$URLs = new URLs_url(db());
$c = new Curl;
$c->headers = 1;
$c->timeout = 60;
$c->getinfo = true;
$c->follow = 0;
$c->save_cookies = false;

do {
    // Get url that hasn't been checked for a week
    $urls = null;

    // Check week old
    $urls = $URLs->all($where)->limit(10);

    foreach($urls as $url) {
        #echo date("d/m/Y h:i").' | Checking '.$url->url.' | db http_code: '.$url->http_code;

        // Get http code    
        $c->url = $url->url;
        $data = $c->get();

        #echo ' - new http_code: '.$data['http_code'];

        // Save info
        $url->http_code = $data['http_code'];
        $url->lastchecked = time();
        $URLs->save($url);
        $url = null;
        #unset($c);
        $data = null;
        #echo "\n".memory_get_usage().' | ';
        echo "\nInner loop memory usage: ".memory_get_usage();
    }
    echo "\nOuter loop memory usage: ".memory_get_usage();

} while($urls);

Some logs showing how memory consumption behaves in both loops:

Inner loop memory usage: 611080
Inner loop memory usage: 612452
Inner loop memory usage: 613788
Inner loop memory usage: 615124
Inner loop memory usage: 616460
Inner loop memory usage: 617796
Inner loop memory usage: 619132
Inner loop memory usage: 620500
Inner loop memory usage: 621836
Inner loop memory usage: 623172
Outer loop memory usage: 545240
Inner loop memory usage: 630680
Inner loop memory usage: 632016
Inner loop memory usage: 633352
Inner loop memory usage: 634688
Inner loop memory usage: 636088
Inner loop memory usage: 637424
Inner loop memory usage: 638760
Inner loop memory usage: 640096
Inner loop memory usage: 641432
Inner loop memory usage: 642768
Outer loop memory usage: 556392
Inner loop memory usage: 640416
Inner loop memory usage: 641752
Inner loop memory usage: 643088
Inner loop memory usage: 644424
Inner loop memory usage: 645760
Inner loop memory usage: 647096
Inner loop memory usage: 648432
Inner loop memory usage: 649768
Inner loop memory usage: 651104
Inner loop memory usage: 652568
Outer loop memory usage: 567608
Inner loop memory usage: 645924
Inner loop memory usage: 647260
Inner loop memory usage: 648596
Inner loop memory usage: 649932
Inner loop memory usage: 651268
Inner loop memory usage: 652604
Inner loop memory usage: 653940
Inner loop memory usage: 655276
Inner loop memory usage: 656624
Inner loop memory usage: 657960
Outer loop memory usage: 578732
A: 

I think your core problem is that you're only clearing things in the outer loop.

`$c = new Curl`, for instance, is going to allocate memory on the heap for each iteration of the inner loop, but you're only unsetting the last instance. I'd unset anything you can ($c, $data) at the end of the inner loop.
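
Roughly what I mean, as a sketch against the original version of the loop (variable names taken from the question, not tested):

foreach ($urls as $url) {
    $c = new Curl;
    $c->url = $url->url;
    $data = $c->get();

    $url->http_code   = $data['http_code'];
    $url->lastchecked = time();
    $URLs->save($url);

    // Free everything this iteration created before the next one starts
    unset($c, $data, $url);
}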

Mark E
By putting the unsets at the end, I have managed to cut the memory leak down from 4kb to something like ~1.5kb. I wonder what's still leaking :/
James D52
@James can you update your question to match your new code?
Mark E
@James, move the call to `memory_get_usage()` to be after `unset($URLs)`, I have a feeling you'll find that the 1.5kb growth is related to the `$URLs->save()` call.
Mark E
Didn't do anything. Memory consumption still grows by ~1.5kb with every iteration.
James D52
@James, what does `db()` do? And when you say iteration, do you mean an inner loop or an outer loop iteration?
Mark E
db() makes a connection to MySQL. I'm using the phpDataMapper class. Awesome class, by the way :) `$URLs = new URLs_url(db());` has now been moved out of both loops so it only runs once.
James D52
@James, can you show us what the memory usage looks like over each iteration of the *outer* loop?
Mark E
Updated the post with some logs.
James D52
@James, how big are the pages you're bringing down?
Mark E
Not big, normal everyday webpages. I checked whether the Curl class stores something like headers somewhere, but nope. I had already written unsets and nulls into that class.
James D52
+2  A: 

This bit should probably happen only once, before the loop:

$c = new Curl;
$c->headers = 1;
$c->timeout = 60;
...
$c->getinfo = true;
$c->follow = 0;
$c->save_cookies = false;

Edit: Oh, the entire thing is wrapped in a do/while loop. /facepalm

Edit 2: There's also this important bit:

unset($class_object) does not release resources allocated by the object. If used in loops, which create and destroy objects, that might easily lead to a resource problem. Explicitly call the destructor to circumvent the problem.

http://www.php.net/manual/en/function.unset.php#98692
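
A minimal sketch of what that manual note suggests, assuming your Curl class defines a __destruct() that frees its resources:

foreach ($urls as $url) {
    $c = new Curl;
    // ... perform the request ...

    if (method_exists($c, '__destruct')) {
        $c->__destruct();   // release the object's resources immediately
    }
    unset($c);              // then drop the reference itself
    // Note: PHP will call __destruct() again when the object is finally
    // destroyed, so the destructor should tolerate running twice.
}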

Edit 3:

What is this? Can't this be moved outside of the loop somehow?

$URLs = new URLs_url(db());

Edit 4:

Try removing these lines, for now.

    $url->http_code = $data['http_code'];
    $url->lastchecked = time();
    $URLs->save($url);
George Marian
It is a do/while loop. The foreach loop is there just to batch things. Also, I can't really do the Curl setup before the loop because every iteration of the foreach checks 10 different URLs.
James D52
@James D52 But you can configure curl beforehand and not create a new instance on every iteration. Do these values change between iterations? No, it doesn't seem like they do. The URL varies, as will the response from that URL, but that doesn't necessarily require a new curl instance for each iteration.
George Marian
@James D52 How are you determining that you're "leaking" that memory? Edit: also, if you care about memory leaks (and presumably performance), why are you using double quoted strings when you don't need variable interpolation?
George Marian
Well "leaking" was just a word I used. Somewhere it's reserving ~1.5kb for something that's unnecessary. So you could say that my code is "leaking" :)I'll write some __destruct methods to see if that helps.
James D52
@James D52 Try single quotes for your strings, instead of double quotes.
George Marian
Changed everything to single quotes. Didn't do anything.
James D52
`$URLs = new URLs_url(db());` was originally outside the loop, but I wanted to see if it made any difference to unset/null it in every iteration. It doesn't seem to matter whether it's inside or outside the loop.
James D52
@James D52 Are you expecting no increase in memory usage, since you're unsetting variables?
George Marian
Yes. The goal would be to have a script that resets itself in every iteration so it could run indefinitely if needed. There's nothing the script should remember from the previous iteration, so there's no reason for it to use more memory.
James D52
@James D52 This is a pure shot in the dark, but what about that call to date()? Try dropping it altogether. (I assume you're only using it for debugging purposes.)
George Marian
Check out the logs I put above. There's no date() there and it still keeps growing. I wish it could have been the date() function :/
James D52
@James D52 LOL, yeah, me too. Update the code again, please.
George Marian
Code updated to match current.
James D52
@James D52 At this point, I would try to simplify it to the very bare minimum, even if that means you're not saving the info you need to save. I'll post the three lines that I would remove for this test in my answer.
George Marian
That actually worked, kinda. Now the memory consumption is static in the foreach loop but increases by about 1.5kb in every do loop iteration. This means that phpDataMapper is what's causing the memory consumption. I'll look into it and see if I can find a way to make it work.
James D52
A: 

The problem is probably

$c = new Curl

Is it possible to instantiate Curl once outside the loop, and then keep reusing the same instance inside? You could reset all its fields to null in the loop if you wanted.

I had a similar problem. unset() didn't work; it turned out the garbage collection was rubbish. When I reused objects, it was fine (well, it broke for different reasons, so I ended up reimplementing it in Java).
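
Something along these lines is what I have in mind; a sketch only, assuming the wrapper lets you swap ->url between requests (the field names in your posted code suggest it does):

// Configure the wrapper once...
$c = new Curl;
$c->headers      = 1;
$c->timeout      = 60;
$c->getinfo      = true;
$c->follow       = 0;
$c->save_cookies = false;

// ...then reuse the same instance for every URL
foreach ($urls as $url) {
    $c->url = $url->url;   // only the target changes per request
    $data   = $c->get();

    // ... save $data ...

    $data = null;          // drop the response before the next request
}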

Michael Jones
@Michael Jones See this comment about unset() in the PHP manual: http://www.php.net/manual/en/function.unset.php#98692
George Marian
@George Marian: Thanks. I didn't know that.
Michael Jones
A: 

This may or may not help you, but way back in 2000 I had a client with really slow internet who wanted to do all his website CMS updates locally and push them to live when done. Back then, on IIS on Windows XP, I could not find a way of increasing the script timeout beyond 60 seconds, and the update would generally need a good 2 minutes, so it would obviously time out.

To solve this, I had the script update a set number of rows that were guaranteed to execute safely in under a minute, then call itself with a parameter telling it where to continue from, and so on until all the rows were updated. Maybe you could try something similar for your situation?

Maybe run it for a set amount of time before calling itself, or in your case, maybe check memory and redirect when usage gets too high? (There's a sketch of that below.)

I used something like this:

Top of script:

$started = microtime(true);

Then this in your loop:

if((microtime(true)-$started) > ($seconds_to_redirect)) {
    //call script with parameter
}
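
And here's a rough sketch of the memory-based variant; the 50 MB cap and the way the script gets relaunched are placeholders to adapt to your setup:

$memory_cap = 50 * 1024 * 1024;   // placeholder threshold: 50 MB

if (memory_get_usage() > $memory_cap) {
    // Relaunch the script with a "continue from" parameter, then stop this run,
    // e.g. header('Location: checker.php?from=' . $last_id); for a web script
    exit;
}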

This is all I can think of.

Bjorn