I'm using a 'rolling' cURL multi implementation (like this SO post, based on this cURL code). It works fine to process thousands of URLs using up to 100 requests at the same time, with 5 instances of the script running as daemons (yeah, I know, this should be written in C or something).
Here's the problem: after processing ~200,000 urls (across the 5 instances) curl_multi_exec()
seems to break for all instances of the script. I've tried shutting down the scripts, then restarting, and the same thing happens (not after 200,000 urls, but right on restart), the script hangs calling curl_multi_exec()
.
I put the script into 'single' mode, processing one regular cURL handle at time, and that works fine (but it's not quite the speed I need). My logging leads me to suspect that it may have hit a patch of slow/problematic connections (since every so often it seems to process on URL then hang again), but that would mean my CURLOPT_TIMEOUT
is being ignored for the individual handles. Or maybe it's just something with running that many requests through cURL.
Anyone heard of anything like this?
Sample code (again based on this):
//some logging shows it hangs right here, only looping a time or two
//so the hang seems to be in the curl call
while(($execrun =
curl_multi_exec($master, $running)) == CURLM_CALL_MULTI_PERFORM);
//code to check for error or process whatever returned
I have CURLOPT_TIMEOUT
set to 120
, but in the cases where curl_multi_exec()
finally returns some data, it's after 10 minutes of waiting.
I have a bunch of testing/checking yet to do, but thought maybe this might ring a bell with someone.