I want to fetch several pages with curl_exec. The first page comes back normally, but all the others return a 302 header. What is the reason?

$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, ROOT_URL);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$content = curl_exec($curl); // here good content
curl_close($curl);

preg_match_all('/href="(\/users\/[^"]+)"[^>]+>\s*/i', $content, $p);

for ($j=0; $j<count($p[1]); $j++){
    $new_curl = curl_init();
    curl_setopt($new_curl, CURLOPT_URL, NEW_URL.$p[1][$j]);
    curl_setopt($new_curl, CURLOPT_RETURNTRANSFER, 0);
    $content = curl_exec($new_curl); // here 302    
    curl_close($new_curl);

    preg_match('/[^@]+@[^"]+/i', $content, $p2);

}

Something like this.

+1  A: 

You probably want to provide a sample of your code so we can see if you're omitting something.

A 302 response code typically indicates that the server is redirecting you to a different location (given in the Location response header). Depending on which options you set, cURL can either follow that redirect automatically, or you can watch for the 302 response and fetch the new location yourself.

Here is how you would get CURL to follow the redirects (where $ch is the handle to your curl connection):

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);// allow redirects 
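
If you would rather watch for the 302 yourself, a rough sketch could look like the following. The helper names fetchOnce and extractLocation are made up for illustration, and the sketch assumes the server sends a plain Location header; a relative Location would still need to be resolved against the requested URL before you fetch it.

```php
<?php
// Hypothetical helper: pull the Location header value out of a raw header block.
function extractLocation(string $rawHeaders): ?string
{
    if (preg_match('/^Location:\s*(\S+)/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: fetch one URL WITHOUT following redirects,
// returning the status code, the redirect target (if any) and the body.
function fetchOnce(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);           // prepend response headers to the output
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);  // handle redirects ourselves
    $raw        = curl_exec($ch);
    $code       = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    curl_close($ch);

    if ($raw === false) {
        return ['code' => 0, 'location' => null, 'body' => ''];
    }
    $headers = substr($raw, 0, $headerSize);
    return [
        'code'     => $code,
        'location' => extractLocation($headers),
        'body'     => substr($raw, $headerSize),
    ];
}
```

A caller would check whether 'code' is 301, 302 or 303 and, if so, request 'location' next. Looking at where the server sends you is often the quickest way to see why it redirects at all.
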
MightyE
You might also be interested in CURLOPT_MAXREDIRS and CURLOPT_AUTOREFERER
VolkerK
I can get the page contents (without redirects) if I make only one curl_exec call in my script, so the problem is not the redirection. Perhaps it is a restriction of the site (which I'd like to parse)...
hippout
@hippout: "so the problem is not the redirection" - 302/303 is quite unambiguous. Yes, maybe it is a restriction of the remote site. Maybe you're not allowed to fetch more than one document at a time. Just let curl follow the redirection and see whether _this_ document then contains an error message.
VolkerK
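
To try the suggestion from the comments, something along these lines might help. It is only a sketch: fetchFollowing and wasRedirected are hypothetical names, and the limit of 5 redirects is an arbitrary assumption. It follows the redirect using the options mentioned above and reports where the request ended up, so you can inspect the page the site actually sends.

```php
<?php
// Hypothetical helper: fetch a URL, following redirects up to a limit,
// and report the final URL and how many redirects were taken.
function fetchFollowing(string $url, int $maxRedirects = 5): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,          // follow Location headers
        CURLOPT_MAXREDIRS      => $maxRedirects, // assumed limit, adjust as needed
        CURLOPT_AUTOREFERER    => true,          // set Referer when following a redirect
    ]);
    $body = curl_exec($ch);
    $info = [
        'body'      => $body === false ? '' : $body,
        'code'      => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'redirects' => curl_getinfo($ch, CURLINFO_REDIRECT_COUNT),
    ];
    curl_close($ch);
    return $info;
}

// True if curl had to follow at least one redirect to get the response.
function wasRedirected(array $info): bool
{
    return $info['redirects'] > 0;
}
```

If wasRedirected() returns true, 'final_url' and 'body' show the document the site redirected to, which may well be the error or login page.
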
A: 

You can use curl multi, which is faster and can fetch data from all the URLs in parallel. You can use it like this:

//Initialize
$curlOptions = array(CURLOPT_RETURNTRANSFER => 1); // Add whatever options you additionally want.
$curlHandle1 = curl_init($url1);
curl_setopt_array($curlHandle1, $curlOptions);

$curlHandle2 = curl_init($url2);
curl_setopt_array($curlHandle2, $curlOptions);

$multi = curl_multi_init();
curl_multi_add_handle($multi, $curlHandle1);
curl_multi_add_handle($multi, $curlHandle2);

//Run Handles
$running = null;
do {
  $status = curl_multi_exec($multi, $running);
  if ($running) {
    // Wait for activity on any handle instead of busy-looping.
    curl_multi_select($multi);
  }
} while ($running && $status == CURLM_OK);

//Retrieve Results
$response1 = curl_multi_getcontent($curlHandle1);
$status1 = curl_getinfo($curlHandle1);

$response2 = curl_multi_getcontent($curlHandle2);
$status2 = curl_getinfo($curlHandle2);

//Clean Up
curl_multi_remove_handle($multi, $curlHandle1);
curl_multi_remove_handle($multi, $curlHandle2);
curl_multi_close($multi);

You can find more information here: http://www.php.net/manual/en/function.curl-multi-exec.php (check out Example 1).
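
For a list of URLs whose length is not known in advance, such as the links matched in the question, the same pattern could be wrapped in a loop. This is a sketch only; fetchAll is a hypothetical name and it sets just CURLOPT_RETURNTRANSFER, so add whatever other options you need.

```php
<?php
// Hypothetical helper: fetch every URL in $urls in parallel with curl multi.
// Returns an array with the same keys, each holding the body and HTTP code.
function fetchAll(array $urls): array
{
    if (count($urls) === 0) {
        return [];
    }

    $multi   = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $handles[$key] = curl_init($url);
        curl_setopt($handles[$key], CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($multi, $handles[$key]);
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            curl_multi_select($multi); // block until there is activity
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = [
            'body' => (string) curl_multi_getcontent($ch),
            'code' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        ];
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
    }
    curl_multi_close($multi);

    return $results;
}
```

Keep in mind that if the remote site really does refuse several requests at once, fetching in parallel may produce the same 302 responses, so check the 'code' of each result.
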

Jithin