So I have a PHP page that allows users to download a CSV of what could be a whole bunch of records. The problem is that the more rows the MySQL query returns, the more memory the script uses. That's not really surprising, but it does pose a problem.

I tried using mysql_unbuffered_query() but that didn't make any difference, so I need some other way to free the memory used by what I assume are the previously processed rows. Is there a standard way to do this?

Here's a commented log that illustrates what I'm talking about:

// Method first called
2009-10-07 17:44:33 -04:00 --- info: used 3555064 bytes of memory

// Right before the query is executed
2009-10-07 17:44:33 -04:00 --- info: used 3556224 bytes of memory

// Immediately after query execution
2009-10-07 17:44:34 -04:00 --- info: used 3557336 bytes of memory

// Now we're processing the result set
2009-10-07 17:44:34 -04:00 --- info: Downloaded 1000 rows and used 3695664 bytes of memory
2009-10-07 17:44:35 -04:00 --- info: Downloaded 2000 rows and used 3870696 bytes of memory
2009-10-07 17:44:36 -04:00 --- info: Downloaded 3000 rows and used 4055784 bytes of memory
2009-10-07 17:44:37 -04:00 --- info: Downloaded 4000 rows and used 4251232 bytes of memory
2009-10-07 17:44:38 -04:00 --- info: Downloaded 5000 rows and used 4436544 bytes of memory
2009-10-07 17:44:39 -04:00 --- info: Downloaded 6000 rows and used 4621776 bytes of memory
2009-10-07 17:44:39 -04:00 --- info: Downloaded 7000 rows and used 4817192 bytes of memory
2009-10-07 17:44:40 -04:00 --- info: Downloaded 8000 rows and used 5012568 bytes of memory
2009-10-07 17:44:41 -04:00 --- info: Downloaded 9000 rows and used 5197872 bytes of memory
2009-10-07 17:44:42 -04:00 --- info: Downloaded 10000 rows and used 5393344 bytes of memory
2009-10-07 17:44:43 -04:00 --- info: Downloaded 11000 rows and used 5588736 bytes of memory
2009-10-07 17:44:43 -04:00 --- info: Downloaded 12000 rows and used 5753560 bytes of memory
2009-10-07 17:44:44 -04:00 --- info: Downloaded 13000 rows and used 5918304 bytes of memory
2009-10-07 17:44:45 -04:00 --- info: Downloaded 14000 rows and used 6103488 bytes of memory
2009-10-07 17:44:46 -04:00 --- info: Downloaded 15000 rows and used 6268256 bytes of memory
2009-10-07 17:44:46 -04:00 --- info: Downloaded 16000 rows and used 6443152 bytes of memory
2009-10-07 17:44:47 -04:00 --- info: used 6597552 bytes of memory

// This is after unsetting the variable. Didn't make a difference because garbage
// collection had not run
2009-10-07 17:44:47 -04:00 --- info: used 6598152 bytes of memory

I am hoping there is some sort of standard technique for dealing with large result sets like this (or even much larger), but my research hasn't turned up anything.

Ideas?

Here's some code, by request:

    $results = mysql_query($query);

    Kohana::log('info', "used " . memory_get_usage() . " bytes of memory");

    $first = TRUE;
    $row_count = 0;

    while ($row = mysql_fetch_assoc($results)) {
        $row_count++;
        $new_row = $row;

        // Strip the internal user_id column from the export
        if (array_key_exists('user_id', $new_row)) {
            unset($new_row['user_id']);
        }

        // On the first row, emit the CSV header line
        if ($first) {
            $columns = array_keys($new_row);
            $columns = array_map(array('columns', "title"), $columns);
            echo implode(",", array_map(array('Reports_Controller', "_quotify"), $columns));
            echo "\n";
            $first = FALSE;
        }

        // Log memory usage every 1000 rows
        if (($row_count % 1000) == 0) {
            Kohana::log('info', "Downloaded $row_count rows and used " . memory_get_usage() . " bytes of memory");
        }

        echo implode(",", array_map(array('Reports_Controller', "_quotify"), $new_row));
        echo "\n";
    }
A: 

Is this a "live" download? By that I mean, are you pushing this to the client as you're generating the CSV? If so, then there are some things you can do:

  1. Don't use output buffering. It holds everything in memory until you flush it explicitly or implicitly (when the script ends), which only increases memory use;
  2. As you read rows from the database, write them straight to the client (see the sketch below).
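
For example, here's a minimal sketch of point 2 (assuming $query holds the SQL from the question; fputcsv() stands in for the _quotify helper and handles the quoting):

    header('Content-Type: text/csv');
    header('Content-Disposition: attachment; filename="report.csv"');

    while (ob_get_level() > 0) {
        ob_end_flush();                          // make sure nothing accumulates in PHP's buffers
    }

    $out = fopen('php://output', 'w');           // write straight into the response body
    $results = mysql_unbuffered_query($query);   // and don't buffer rows on the client side

    while ($row = mysql_fetch_assoc($results)) {
        fputcsv($out, $row);                     // one row out, nothing retained
        flush();                                 // push it through to the web server
    }

    fclose($out);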

Other than that, we probably need to see some skeletal code.

cletus
I agree. flushflushflushflushflush
Justin Johnson
I have tried letting PHP handle the buffering automatically and also managing the buffer manually. (As far as I know, PHP does not buffer output by default.) Managing the buffer manually actually increased memory usage.
Rafe
A: 

Are you actually flushing data periodically? PHP's normal buffering can be pretty vicious for long-running code, since there are multiple copies of the data between the MySQL client, your variables, and the output system. It's been a few years, but I last recall using something like this in skeleton code:

    ob_end_flush();                            // drop PHP's output buffer
    $result = mysql_unbuffered_query($query);  // don't buffer the result set client-side
    while ($row = mysql_fetch_assoc($result)) {
        // ... do something ...

        flush();                               // push to Apache
        unset($row);                           // ... and all other temporary variables ...
    }
Chris Adams
Flushing the buffer and unsetting my temporary variable after each iteration doesn't affect memory usage.
Rafe
One other thing to try: have you tried putting the row processor into a function? Some searching suggested that PHP, at least historically, didn't always clear memory until a function returned.
Chris Adams
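Here's a sketch of that suggestion, refactoring the loop from the question (output_row() is a hypothetical helper, not part of the original code):

    // Locals inside the function go out of scope when it returns, which
    // historically gave PHP a chance to reclaim their memory sooner.
    function output_row(array $row) {
        echo implode(",", array_map(array('Reports_Controller', "_quotify"), $row));
        echo "\n";
    }

    while ($row = mysql_fetch_assoc($results)) {
        output_row($row);
        unset($row);
    }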
+1  A: 

Some further profiling reveals that the issue is a memory leak somewhere. I stripped the code down to its simplest form (see the sketch below) and memory usage does not grow with each iteration. I suspect it's Kohana (the framework I'm using).
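
Roughly the minimal loop I tested with (reconstructed for illustration): no framework calls, no CSV work, just fetch, count, and log. Memory stays flat here, which points at something outside the loop.

    $results = mysql_query($query);
    $row_count = 0;

    while ($row = mysql_fetch_assoc($results)) {
        $row_count++;
        if (($row_count % 1000) == 0) {
            error_log($row_count . " rows, " . memory_get_usage() . " bytes");
        }
    }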

Rafe
A: 

Thanks for your question. Using mysql_unbuffered_query() solved my problem of running out of RAM when working with a large dataset in PHP and MySQL. I had been getting:

PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 32 bytes) in /content/apps/application_price.php on line 25
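
The fix was essentially a one-line change (reconstructed here; $query stands in for the actual query in application_price.php):

    // Before: mysql_query() buffers the whole result set in PHP's memory,
    // which blew through the 128 MB limit above.
    // $result = mysql_query($query);

    // After: mysql_unbuffered_query() streams rows from the server as you
    // fetch them, so memory use stays flat regardless of result-set size.
    $result = mysql_unbuffered_query($query);
    while ($row = mysql_fetch_assoc($result)) {
        // process each row immediately instead of accumulating them
    }

One caveat: with an unbuffered query you can't send another query on the same connection until all rows have been fetched (or the result freed with mysql_free_result()).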

jono2010