views: 44
answers: 1

Hi everybody.

I have written a tool for database replication in PHP. It's working fine but there's one issue:

I'm using PDO to connect to the different databases to keep it independent of any specific RDBMS, which is crucial to this application.

The tool does some analysis on the tables to decide how to convert certain types and some other stuff. Then it pretty much does a "SELECT * FROM <tablename>" to get the rows that need to be copied. The result sets are fairly large (about 50k rows in some tables).

After that it iterates over the result set in a while loop with PDOStatement::fetch(), does some type conversion and escaping, builds an INSERT statement, and feeds that to the target database.
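The loop described above might look roughly like this. This is a minimal, self-contained sketch using two in-memory SQLite databases in place of the real source and target connections; the table and column names are made up for illustration:

```php
<?php
// Sketch of the copy loop: fetch one row at a time from the source,
// (optionally convert/escape it), and insert it into the target.
$source = new PDO('sqlite::memory:');
$target = new PDO('sqlite::memory:');
$source->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$target->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$source->exec('CREATE TABLE example (id INTEGER, name TEXT)');
$source->exec("INSERT INTO example VALUES (1, 'a'), (2, 'b'), (3, 'c')");
$target->exec('CREATE TABLE example (id INTEGER, name TEXT)');

$stmt = $source->query('SELECT * FROM example');
$insert = null;
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // ... type conversion and escaping would happen here ...
    if ($insert === null) {
        // Build the INSERT once, from the columns of the first row.
        $columns      = implode(', ', array_keys($row));
        $placeholders = implode(', ', array_fill(0, count($row), '?'));
        $insert = $target->prepare(
            "INSERT INTO example ($columns) VALUES ($placeholders)"
        );
    }
    $insert->execute(array_values($row));
}

echo $target->query('SELECT COUNT(*) FROM example')->fetchColumn(), "\n";
```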

All this is working nicely with one exception. While fetching the rows, one at a time, from the result set, the PHP process keeps eating up more and more memory. My assumption is that PDO keeps the already processed rows in memory until the whole result set is processed.

I also observed that, when my tool is finished with one table and proceeds to the next, memory consumption drops instantly, which supports my theory.

I'm NOT keeping the data in PHP variables! I hold just one single row at any given moment for processing, so that's not the problem.

Now to the question: Is there a way to force PDO not to keep all the data in memory? I only process one row at a time, so there's absolutely no need to keep all that garbage. I'd really like to use less memory on this thing.
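(For context: MySQL's PDO driver buffers the whole result set client-side by default, which matches the behavior described above, and that buffering can be switched off. This is driver-specific, so it may not fit a DB-independent tool; the sketch below assumes an existing MySQL connection with placeholder credentials and is a config fragment, not a tested program.)

```php
<?php
// Driver-specific sketch (MySQL only): disable client-side buffering so
// fetch() streams rows instead of holding the full result set in memory.
// DSN, user, and password are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=source_db', 'user', 'password');
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$stmt = $pdo->query('SELECT * FROM some_table');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // process one row at a time
}
```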

+1  A: 

I believe the problem comes from PHP's garbage collector, as it does not garbage collect soon enough.
I would try to fetch the results in chunks of row_count rows, like "SELECT ... LIMIT offset, row_count" in MySQL, or "SELECT * FROM (SELECT ...) WHERE ROWNUM BETWEEN offset AND (offset + row_count)" in Oracle.
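A hedged sketch of that chunk-wise fetching, shown against an in-memory SQLite database so it runs standalone (the exact pagination syntax differs per RDBMS as noted above; the table name and chunk size are made up):

```php
<?php
// Fetch a large table in fixed-size chunks via LIMIT/OFFSET, processing
// one row at a time, so only one chunk is held in memory at once.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE big (id INTEGER)');
for ($i = 1; $i <= 10; $i++) {
    $db->exec("INSERT INTO big VALUES ($i)");
}

$chunkSize = 4;
$offset    = 0;
$processed = 0;
do {
    $stmt = $db->prepare('SELECT * FROM big LIMIT :limit OFFSET :offset');
    $stmt->bindValue(':limit', $chunkSize, PDO::PARAM_INT);
    $stmt->bindValue(':offset', $offset, PDO::PARAM_INT);
    $stmt->execute();
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    foreach ($rows as $row) {
        $processed++; // one row at a time, as in the original loop
    }
    $offset += $chunkSize;
} while (count($rows) === $chunkSize); // a short chunk means we're done

echo $processed, "\n";
```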
Using Zend_Db_Select one can generate DB-independent queries:

$row_count = 10;
$offset    = 20;
$select = $db->select()
    ->from(array('t' => 'table_name'),
        array('column_1', 'column_2'))
    ->limit($row_count, $offset);
echo $select->__toString();
# on MySQL renders roughly: SELECT column_1, column_2 FROM table_name AS t LIMIT 20, 10
clyfe
Maybe that could help, but it's just too DBMS-specific. The tool has to work with multiple database systems, and any non-generic solution is just too prone to errors. I also think that dividing the query into multiple requests will lower overall performance.
Techpriester
maybe use [Zend_Db_Select](http://framework.zend.com/manual/en/zend.db.select.html) to generate the DB-independent queries
clyfe