views:

157

answers:

1

Hi-- I've written a PHP script that runs via SSH and nohup, meant to process records from a database and do stuff with them (eg. process some images, update some rows).

It works fine with small loads, up to maybe 10k records. I have some larger datasets that process around 40k records (not a lot, I realize, but it adds up to a lot of work when each record requires the download and processing of up to 50 images).

The larger datasets can take days to process. Sometimes I'll see in my debug logs memory errors, which are clear enough-- but sometimes the script just appears to "die" or go zombie on me. My tail of the debug log just stops, with no error messages, the tail of the nohup log ends with no error, and the process is still showing in a ps list, looking like this--
26075 pts/0 S 745:01 /usr/bin/php ./import.php
but no work is getting done.

Can anyone give me some ideas on why a process would just quit? The obvious things (like a php script timeout and memory issues) are not a factor, as far as I can tell.

Thanks for any tips

PS-- this is hosted on a godaddy VDS (not my choice). I am sort of suspecting that godaddy has some kind of limits that might kick in on me despite what overrides I put in the code (such as set_time_limit(0);).

+2  A: 

Very likely the OOM killer. If you really , really really want to stay out of its reach, as root, have your process write -17 to /proc/self/oom_adj. Caution: The kernel usually knows better. Evading the OOM killer can actually cripple the same RDBMS that you are trying to query. What a vicious cycle that would be :)

You probably (instead) want to stagger queries based on what you read from /proc/loadavg and /proc/meminfo. If you increase loads or swap exponentially, you need to back off, especially as a background process :)

Additionally, monitor IOWAIT while you run. This can be averaged from /proc/stat when compared with the time the system booted. Note it when you start and as you progress.

Unfortunately, the serial killer known as the OOM killer does not maintain a body count that is accessible beyond parsing kernel messages.

Or, your cron job keeps hitting its ulimited amount of allocated heap. Either way, your job needs to back off when appropriate, or prevent its own demise (as noted above) prior to doing any work.

As a side note, you probably should not be doing what you are doing on shared hosting. If its that big, its time to get a VPS (at least) where you have some control over what process gets to do what.

Tim Post
There's also 'strace' to see if it's hanging on a syscall or something internal to php
Marc B
@Marc - Possibly, but not likely. 'Internal to PHP' would result in disk sleep or some kid of error that was passed along. Either is noticeable.
Tim Post
@Marc - or just a zombie process
Tim Post
thanks guys!Tim, I'm afraid this is a bit beyond me as a simple PHP hacker-- this is running on a virtual server, with only mysql and this script running. It's definitely not my choice, I developed it on a pair of Amazon EC2 machines (one functioning as server, one as mysql server) with no issues whatsoever!If possible, could you supply a bit more info about the methodology for using IOWAIT and loadavg to throttle a PHP script?Thanks
julio