I'm trying to track the memory usage of a script that processes URLs. The basic idea is to check that there's a reasonable buffer before adding another URL to a cURL multi handle. I'm using a 'rolling cURL' concept that processes a URL's data while the multi handle is running. This means I can keep N connections active by adding a new URL from a pool each time an existing URL finishes processing and is removed.

I've used memory_get_usage() with some positive results. Adding the real_usage flag helped (I'm not really clear on the difference between 'system' memory and 'emalloc' memory, but 'system' shows larger numbers). memory_get_usage() does ramp up as URLs are added, then down as the URL set is depleted. However, I just exceeded the 32M limit with my last memory check being ~18M.

I poll the memory usage each time cURL multi signals that a request has returned. Since multiple requests may return at the same time, there's a chance a bunch of URLs returned data at once and actually jumped the memory usage by that 14M. If memory_get_usage() is accurate, I guess that's what's happening.
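Because memory is only sampled when cURL reports a finished request, a spike between two samples can go unseen. As a minimal sketch (assuming PHP 7.1+ for the syntax used), memory_get_peak_usage() records the highest value reached so far, so reading it alongside the current value shows whether a burst happened between polls. The helper name report_memory() and the 8 MB "burst" are hypothetical, for illustration only.

```php
<?php
// Hypothetical helper: print current and peak usage in MB and return both in bytes.
// Passing true reports memory the engine has taken from the OS ("real" usage).
function report_memory(string $label): array
{
    $current = memory_get_usage(true);
    $peak    = memory_get_peak_usage(true);
    printf("%s: current %.2f MB, peak %.2f MB\n", $label, $current / 1048576, $peak / 1048576);
    return [$current, $peak];
}

report_memory('before');
$burst = str_repeat('x', 8 * 1024 * 1024); // simulate several responses arriving at once
unset($burst);                              // the memory is released again...
[$current, $peak] = report_memory('after'); // ...but the peak still reflects the burst
```

In the rolling loop, checking the peak (or resetting expectations around it) is one way to size the "safe" buffer from observed bursts rather than from the last sample.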

[Update: I should have run more tests before asking. I increased PHP's memory limit (but left the 'safe' amount in the script the same), and the reported memory usage did jump from below my self-imposed limit of 25M to over 32M. Then, as expected, it slowly ramped down as URLs were no longer added. But I'll leave the question up: is this the right way to do this?]

Can I trust memory_get_usage() in this way? Are there better alternative methods for getting memory usage (I've seen some scripts parse the output of shell commands)?

A: 

Well, I have never really had a memory problem with my PHP scripts, so I don't think I can be of much help finding the cause, but what I can recommend is a PHP accelerator: you will notice a serious performance increase, and memory usage will decline. Here is a list of accelerators and an article comparing a few of them (roughly 3x better performance with any of them):

Wikipedia List

Benchmark

The benchmarks are two years old, but you get the idea of the performance increases.

If you are still having problems even with the accelerator, you can also increase PHP's memory limit. Open up your php.ini and find:

memory_limit = 32M

and just increase it a little.
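memory_limit can also be changed from inside a script (it is a PHP_INI_ALL directive), so if editing php.ini isn't an option, a minimal sketch for raising it for a single script at runtime looks like this; 64M is only an example value.

```php
<?php
// Raise the limit for this script only; php.ini stays unchanged.
// ini_set() returns the previous value as a string, or false on failure.
$previous = ini_set('memory_limit', '64M');
if ($previous === false) {
    fwrite(STDERR, "Could not change memory_limit\n");
}
echo 'memory_limit was ', $previous, ', now ', ini_get('memory_limit'), "\n";
```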

Dr Hydralisk
Yeah, that's all stuff for the future on this project; right now I'm just trying to keep the script in line as much as possible. The problem is that I want it to use the memory it's given.
Tim Lytle
A:

I also assume memory_get_usage() is safe, but you can compare both methods and decide for yourself. Here is a function that parses the output of system commands:

function Memory_Usage($decimals = 2)
{
    $result = 0;

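    // memory_get_usage() always exists as of PHP 5.2.1, so the exec() fallback
    // below only runs on older builds; call that branch directly to compare.
    if (function_exists('memory_get_usage'))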
    {
        $result = memory_get_usage() / 1024;
    }

    else
    {
        if (function_exists('exec'))
        {
            $output = array();

            if (substr(strtoupper(PHP_OS), 0, 3) == 'WIN')
            {
                exec('tasklist /FI "PID eq ' . getmypid() . '" /FO LIST', $output);

                $result = preg_replace('/[\D]/', '', $output[5]);
            }

            else
            {
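                // rss= prints the resident set size in KB without a header line;
                // -p limits ps to this PID (grep could also match other processes).
                exec('ps -o rss= -p ' . getmypid(), $output);

                $result = isset($output[0]) ? trim($output[0]) : 0;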
            }
        }
    }

    return number_format(intval($result) / 1024, $decimals, '.', '');
}
Alix Axel
A:

real_usage works this way:

Zend's memory manager does not use system malloc for every block it needs. Instead, it allocates big blocks of system memory (in increments of 256K, which can be changed by setting the environment variable ZEND_MM_SEG_SIZE) and manages them internally. So there are two kinds of memory usage:

  1. How much memory the engine took from the OS ("real usage")
  2. How much of this memory was actually used by the application ("internal usage")
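A quick way to see both kinds side by side is to allocate something and read the two counters. This is a minimal sketch; the 1 MB string is an arbitrary allocation, and the exact numbers depend on the PHP version and its allocator's segment (or chunk) size, so no particular output is claimed beyond real usage being at least the internal usage.

```php
<?php
// Internal usage: bytes the memory manager has handed out to the script.
// Real usage: bytes the memory manager has reserved from the OS.
$data     = str_repeat('x', 1024 * 1024);   // arbitrary 1 MB allocation
$internal = memory_get_usage();             // same as memory_get_usage(false)
$real     = memory_get_usage(true);
printf("internal: %d bytes\nreal:     %d bytes\n", $internal, $real);
```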

Which of these memory_get_usage() returns is controlled by its real_usage argument: false (the default) gives the internal usage, true gives the real usage. Which one is more useful depends on what you are looking into. If you're optimizing specific parts of your code, "internal" might be more useful; if you're tracking usage globally, "real" would be of more use. memory_limit limits the "real" number, so as soon as all blocks permitted by the limit have been taken from the system and the memory manager can't allocate a requested block from them, the allocation fails. Note that the "internal" usage in this case might be less than the limit, but the allocation could still fail because of fragmentation.
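Since memory_limit is enforced against the real figure, a buffer check like the one in the question would compare memory_get_usage(true) with the configured limit. A minimal sketch, assuming the limit uses php.ini shorthand (a plain byte count, a number with a K, M, or G suffix, or -1 for no limit). The function names ini_bytes() and has_headroom() and the 8 MB default reserve are hypothetical choices; given the fragmentation caveat, the reserve should be generous.

```php
<?php
// Hypothetical helper: convert php.ini shorthand ("32M", "512K", "1G", "-1") to bytes.
function ini_bytes(string $value): int
{
    $value = trim($value);
    if ($value === '-1') {
        return -1;                      // -1 means "no limit"
    }
    $number = (int) $value;             // leading digits only
    switch (strtoupper(substr($value, -1))) {
        case 'G': $number *= 1024;      // fall through
        case 'M': $number *= 1024;      // fall through
        case 'K': $number *= 1024;
    }
    return $number;
}

// Hypothetical helper: are at least $reserve bytes left between real usage and the limit?
function has_headroom(int $reserve = 8 * 1024 * 1024): bool
{
    $limit = ini_bytes((string) ini_get('memory_limit'));
    if ($limit < 0) {
        return true;                    // unlimited
    }
    return $limit - memory_get_usage(true) >= $reserve;
}
```

In a rolling-cURL loop, has_headroom() would be checked just before adding the next URL to the multi handle.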

Also, if you are using an external memory-tracking tool, you can set the environment variable USE_ZEND_ALLOC=0, which disables the above mechanism and makes the engine always use malloc(). This has much worse performance but lets you use malloc-tracking tools.

See also an article about this memory manager; it has some code examples too.

StasM