Here is a brief overview of what I am doing; it is quite simple really:
- Go out and fetch records from a database table.
- Walk through all those records and, for each column that contains a URL, go out (using cURL) and make sure the URL is still valid.
- For each record, a column is updated with the current timestamp indicating when it was last checked, and some other db processing takes place (a simplified sketch of this update is shown after this list).
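To give a rough idea of that update step, it boils down to something like the following (the table and column names here are simplified placeholders rather than the real schema):

// simplified placeholder for the per-record update; the real table/column
// names and the additional processing are omitted
pg_query_params(
    $dbconn,
    'UPDATE links SET last_checked = now(), http_status = $1 WHERE id = $2',
    array($ihttp_code, $dbo->id)
);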
Anyhow, all of this works well and does exactly what it is supposed to. The problem is that I think performance could be greatly improved in terms of how I am validating the URLs with cURL.
Here is a brief (over-simplified) excerpt from my code that demonstrates how cURL is being used:
$ch = curl_init();
while($dbo = pg_fetch_object($dbres))
{
    // for each iteration set url to db record url
    curl_setopt($ch, CURLOPT_URL, $dbo->url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch); // perform a cURL session
    $ihttp_code = intval(curl_getinfo($ch, CURLINFO_HTTP_CODE));
    // do checks on $ihttp_code and update db
}
// do other stuff here
curl_close($ch);
As you can see, I am just reusing the same cURL handle the entire time, but even if I strip out all of the other processing (database or otherwise) the script still takes incredibly long to run. Would changing any of the cURL options help improve performance? Tuning timeout values, etc.? Any input would be appreciated.
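To be more concrete about what I mean by tuning, I have been considering settings along these lines, but the values are just guesses on my part and I am not sure whether they target the real bottleneck:

// rough idea of the kind of tuning I mean -- the specific values are guesses
curl_setopt($ch, CURLOPT_NOBODY, true);          // issue a HEAD request and skip the body
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects before judging the status code
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);     // give up connecting after 5 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 10);           // cap the whole request at 10 seconds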
Thank you,
- Nicholas