Hello all,

I have the function below which I call very frequently in a loop.

I waited 5 minutes as the memory climbed from 1MB to 156MB. Shouldn't PHP's garbage collector kick in and reduce this at some point?

Is it because I have set memory limit at 256MB?

At echo points 2, 3 and 4 the memory usage is pretty constant. It goes down by half a meg at point 4. But point 1 is where the main memory increase happens, probably because file_get_html loads the HTML file into memory.

I thought the clear and unset of the variable $html would take care of this?

function get_stuff($link, $category ){

    $html = file_get_html(trim("$link"));

    $article = $html->find('div[class=searchresultsWidget]', 0);

    echo '1 - > '.convert(memory_get_usage(true)).'<br />';  

    foreach($article->find('h4 a') as $link){

        $next_url = 'http://new.mysite.com'.$link->href;

        $font_name = trim($link->plaintext);        

        $html = file_get_html(trim("$next_url"));

        $article = $html->find('form[class=addtags]', 0);

        $font_tags = '';

        foreach($article->find('ul[class=everyone_tags] li a span') as $link){

            $font_tags .= trim($link->innertext).',';   

        }

        echo '2 - > '.convert(memory_get_usage(true)).'<br />'; 

        $font_name = mysql_real_escape_string($font_name);
        $category =  mysql_real_escape_string($category);  
        $font_tags = mysql_real_escape_string($font_tags);  

        $sql = "INSERT INTO tag_data (font_name, category, tags) VALUES ('$font_name', '$category', '$font_tags')";

        unset($font_tags);
        unset($font_name);
        unset($category); 

        $html->clear();   

        mysql_query($sql); 

        unset($sql);   

        echo '3 - > '.convert(memory_get_usage(true)).'<br />';    

    }

    unset($next_url);
    unset($link);
    $html->clear(); 
    unset($html);   
    unset($article);

    echo '4 - > '.convert(memory_get_usage(true)).'<br />';

}

As you can see, I made a feeble attempt at using unset. It's no good, though, since as I understand it unset won't free the memory as soon as I call it.

Thanks all for any help on how I can reduce this upward rise of memory.

+3  A: 

The purpose of the garbage collector is solely to catch circular references.

If there are none, the variables are immediately eliminated once their reference count hits 0.

I don't recommend that you use unset, except in exceptional cases. Use functions instead and rely on the variables to go out of scope to have the memory reclaimed.

Other than that, we can't possibly tell you exactly what's happening, because we'd have to know exactly what the Simple HTML DOM parser is doing. Possibly there are circular references or global resources holding a reference, but it would be difficult to know.

See reference counting basics and collecting cycles.
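
To make the distinction concrete, here is a minimal sketch of the two cases. The `Node` class and its destructor counter are hypothetical, made up only to observe when objects die; the collector is switched off so that only reference counting is at work.

```php
<?php
// Hypothetical class used only to observe when objects are destroyed.
class Node
{
    public static $destroyed = 0; // counts destructor calls
    public $other = null;

    public function __destruct()
    {
        self::$destroyed++;
    }
}

gc_disable(); // keep the cycle collector out of the picture for this demo

// Case 1: no cycle. The refcount hits 0 on unset(), so the object dies at once.
$a = new Node();
unset($a);
$afterPlain = Node::$destroyed;

// Case 2: a cycle. Each object keeps the other's refcount above 0.
$b = new Node();
$c = new Node();
$b->other = $c;
$c->other = $b;
unset($b, $c);
$afterCycle = Node::$destroyed;

echo "destroyed after plain unset: $afterPlain\n";
echo "destroyed after cyclic unset: $afterCycle\n";
```

The count should go up for the plain object only; the two objects in the cycle stay in memory even though no variable points at them any more.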

Artefacto
Huh? Garbage collection is an alternative to reference-counting, and it has the noted advantage of not being fooled by circular references.
Steven Sudit
@Steven That doesn't mean both things aren't used in PHP.
Artefacto
It's certainly possible to use both, particularly when one system is interfacing with another. For example, a .NET app calling a COM object has GC for the former but reference counting for the latter and has to make the two cooperate. So what I'm asking is whether PHP uses one or the other or both (and if so, when)?
Steven Sudit
@Steven See the comment I made above in response to seand.
Artefacto
@Steven @Artefacto In PHP you only have reference counting.
mathk
+6  A: 

There's a known memory leak with file_get_html(): http://simplehtmldom.sourceforge.net/manual_faq.htm#memory_leak

The solution is to use

$html->clear();

Which you are doing, BUT: You're using $html both inside and outside of the loop. Inside the loop you are calling $html->clear(), and then near the end of your function $html->clear() again (I assume to catch your initial file_get_html() object reference). That last call doesn't do anything. You're leaking memory with the initial $html = file_get_html() call.

Try using a different variable ($html1, maybe?) inside your loop and see what happens.
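
Here is a rough sketch of why the overwritten variable matters. `FakeDom` is a hypothetical stand-in, not the Simple HTML DOM API: I'm only assuming the parser keeps back-references between the document and its nodes that `clear()` has to break. The `leaky()` and `fixed()` functions are likewise made up for illustration, and the collector is disabled to mimic refcount-only behaviour.

```php
<?php
// FakeDom is a hypothetical stand-in for a parsed document, NOT the
// Simple HTML DOM API. Its node points back at the document, forming a
// cycle that only clear() breaks.
class FakeDom
{
    public static $live = 0; // number of FakeDom objects still in memory
    public $root;

    public function __construct()
    {
        self::$live++;
        $this->root = new stdClass();
        $this->root->dom = $this; // back-reference: document <-> node cycle
    }

    public function clear()
    {
        $this->root = null; // break the cycle so refcounting can free us
    }

    public function __destruct()
    {
        self::$live--;
    }
}

gc_disable(); // assumption: mimic refcount-only behaviour

// Leaky pattern: the inner assignment overwrites the outer document
// before clear() was ever called on it.
function leaky($pages)
{
    $html = new FakeDom();          // outer document
    for ($i = 0; $i < $pages; $i++) {
        $html = new FakeDom();      // outer document is now unreachable
        $html->clear();
    }
    $html->clear();                 // only clears the last inner document again
}

// Fixed pattern: a separate variable keeps the outer document reachable.
function fixed($pages)
{
    $html = new FakeDom();
    for ($i = 0; $i < $pages; $i++) {
        $page = new FakeDom();
        $page->clear();
        unset($page);
    }
    $html->clear();                 // this time it clears the outer document
}

leaky(3);
$leakedByLeaky = FakeDom::$live;
fixed(3);
$leakedByFixed = FakeDom::$live - $leakedByLeaky;

echo "leaky() left $leakedByLeaky document(s) behind\n";
echo "fixed() left $leakedByFixed document(s) behind\n";
```

In this sketch each call to `leaky()` strands one outer document, which adds up when the function is called in a loop.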

jasonbar
Good suggestion, I am trying that when this test I have running ends. Also do you think running this PHP script from the command line will make any difference? This is just out of interest.
Abs
"You're losing the memory associate with the initial $html = file_get_html()." Hum? What does this mean?
Artefacto
It was a good suggestion. It managed to decrease the rate of memory increase significantly! It has not gone past 10MB yet at this point. Before, it would have been at 120MB or so! I think this is working nicely, thanks Jasonbar! :)
Abs
@Artefacto: It means that his initial `$html = file_get_html()` is what is leaking memory. I've updated the answer to fix that statement.
jasonbar
@jason Thanks, I got it now.
Artefacto
@Abs, glad I could help. Please consider marking this as the accepted solution.
jasonbar
It is the accepted solution now. I was too busy watching the echos of the memory usage! :)
Abs
+1  A: 

PHP didn't have a proper garbage collector until 5.3. It basically used only reference counting, which would leave circular references in place until the script terminated (e.g. $a =& $a is circular). As well, the cleanup code it DID have would only run if memory pressure required it to. e.g. no point in doing an expensive cleanup cycle if the newly freed memory wasn't needed.

As of 5.3, there's a proper garbage collector. You can switch it on with gc_enable() and force it to run with gc_collect_cycles().
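
A small sketch of those two calls, assuming PHP 5.3 or later. The two `stdClass` objects and the `peer` property are just illustrative garbage to give the collector something to find.

```php
<?php
gc_enable();                      // turn the cycle collector on

// Build a cycle that plain reference counting cannot free.
$a = new stdClass();
$b = new stdClass();
$a->peer = $b;
$b->peer = $a;
unset($a, $b);                    // both objects are now unreachable garbage

$collected = gc_collect_cycles(); // force a collection run right now

echo 'collector enabled: ' . (gc_enabled() ? 'yes' : 'no') . "\n";
echo "collected: $collected\n";
```

In a long-running loop you could call gc_collect_cycles() every so many iterations instead of waiting for the collector to trigger on its own.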

Marc B