I have a script that transfers about 1.5 million rows (~400 MB of data) from one table to another (during the transfer, some data is converted, modified, and placed in the correct field). It's a simple script: it recursively loads data, then places it in the new tables under the correct fields and formats. As an example, it pulls all of the users from the old table and loops through them, inserting each user into the new table; it then pulls all of that user's posts, loops through and inserts them into the correct table; then it pulls all of the comments on each post and inserts those; then it jumps back up and pulls all of the contacts for that user; finally it moves on to the next user and repeats the whole process.

I'm just having a problem with the immense amount of data being transferred. Because it is so large, and because PHP has no memory management beyond garbage collection (that I know of), I'm unable to complete the script: it gets through about 15,000 connections and transferred items before it maxes out at 200 MB of memory.

This is a one-time thing, so I'm doing it on my local computer, not an actual server.

Since unset() does not actually free up memory, is there any other way to free the data held in a variable? One thing I tried was overwriting the variable with NULL, but that didn't seem to help.

Any advice would be awesome, because man, this stinks.

+4  A: 

If you're actually doing this recursively then that's your problem - you should be doing it iteratively. Each recursive call leaves its own stack frame (plus any garbage it created) alive until it returns, so the overhead accumulates with depth until you hit the limit. An iterative approach doesn't have that problem: nothing from one pass of the loop needs to survive into the next, so it can be garbage-collected as you go.
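For illustration, here's a minimal sketch of the iterative shape (the fetch*/insert* helpers are hypothetical stand-ins for your actual queries):

    <?php
    // Plain nested loops instead of recursive calls - nothing from one
    // pass needs to survive into the next, so memory stays flat.
    foreach (fetchUsers() as $user) {
        insertUser($user);

        foreach (fetchPosts($user['id']) as $post) {
            insertPost($post);
            foreach (fetchComments($post['id']) as $comment) {
                insertComment($comment);
            }
        }

        foreach (fetchContacts($user['id']) as $contact) {
            insertContact($contact);
        }
    }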

You're also talking about a mind-numbing number of connections - why are there so many? I guess I don't completely understand your process, or why this approach is needed rather than one retrieve connection and one store connection. Even if you were - say - reconnecting for each row, you should look at persistent connections, which let a second connection to the same DB reuse the last one. Persistent connections aren't a great idea for a multi-user web app (for scalability reasons), but in your very targeted case they should be fine.
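If reconnecting really is unavoidable, a minimal sketch (host, credentials and database name are placeholders):

    <?php
    // mysqli reuses an open connection when the host is prefixed
    // with "p:" (PHP 5.3+).
    $db = new mysqli('p:localhost', 'user', 'pass', 'mydb');

    // The PDO equivalent is the ATTR_PERSISTENT option.
    $pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass', [
        PDO::ATTR_PERSISTENT => true,
    ]);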

Rudu
Sounds about right. But he shouldn't need persistent connections; he should just open one connection per data source and reuse them throughout his script.
timdev
You just blew my mind. I completely forgot about the ability to use a single store query with MySQL and to recycle connections.
Dan
Wow, it's not every day I blow people's minds. *puts on superhero cape*
Rudu
@timdev - agreed, but it's still a valid alternative if it's unavoidable for some strange reason.
Rudu
A: 

unset() does free up memory, but only if the object you're unsetting has no other references pointing to it. Since PHP uses reference counting rather than 'real' GC, this can bite you if you have circular references somewhere - a typical culprit is inside an ORM, where you often have a Database object that holds references to some Table objects, and each Table object has a reference back to the Database. Even if no outside reference exists to either object, they both still reference each other, preventing the reference count from hitting zero.
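Here's a contrived sketch of such a cycle (class names are made up):

    <?php
    class Database {
        public $tables = [];
    }

    class Table {
        public $db;
        public function __construct(Database $db) {
            $this->db = $db;        // Table points back at the Database...
            $db->tables[] = $this;  // ...and the Database points at the Table.
        }
    }

    $db = new Database();
    new Table($db);

    // Reference counting alone can't reclaim the pair: after this unset,
    // each object is still referenced by the other.
    unset($db);

    // Since PHP 5.3, the cycle collector can break such islands:
    gc_collect_cycles();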

Also, are both tables in the same database? If so, all you might need is a simple INSERT ... SELECT query, mapping columns and doing a bit of conversion on the fly (although the processing you need to perform might not be possible or feasible in SQL).
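Something along these lines, say (table and column names are invented, and the conversion is just a placeholder):

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    // One statement moves every row; the data never passes through PHP,
    // so PHP's memory limit doesn't enter into it at all.
    $pdo->exec("
        INSERT INTO new_users (id, name, joined_at)
        SELECT id, CONCAT(first_name, ' ', last_name), FROM_UNIXTIME(created)
        FROM old_users
    ");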

Other than that, you don't need that many connections. Just open one for the reader, one for the writer; prepare a statement on the writer, execute the reader query, fetch one row at a time (this is important: do not fetch them all at once) from the reader query, do the processing, stuff it in the prepared writer statement, rinse and repeat. PHP's memory usage should remain roughly constant after the first few rows.
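A minimal sketch of that loop with PDO (names are placeholders; the unbuffered-query attribute is what keeps the reader from pulling the whole result set into PHP memory at once):

    <?php
    $reader = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass', [
        // Stream rows from the server one at a time instead of
        // buffering the entire result set in PHP memory.
        PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => false,
    ]);
    $writer = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    $insert = $writer->prepare(
        'INSERT INTO new_posts (user_id, body) VALUES (:user_id, :body)'
    );

    $rows = $reader->query('SELECT user_id, body FROM old_posts');
    while ($row = $rows->fetch(PDO::FETCH_ASSOC)) {
        // ...convert/modify $row here...
        $insert->execute([
            ':user_id' => $row['user_id'],
            ':body'    => $row['body'],
        ]);
    }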

tdammers