I want to use PHP to get the URL of the page to which the following address redirects:

http://peacecorpsjournals.com/journal/6731

The script should return the following URL to which the URL above redirects:

http://ghanakimsuri.blogspot.com/

A: 

One way (of many) to do this is to open the URL with fopen, then use stream_get_meta_data to grab the headers. This is a quick snippet I grabbed from something I wrote a while back:

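  $fh = fopen($uri, 'r');
  if ($fh === false) {
    exit("Could not open $uri\n");
  }
  $details = stream_get_meta_data($fh);
  fclose($fh);

  // The HTTP wrapper follows redirects; each hop adds its own header lines
  foreach ($details['wrapper_data'] as $line) {
    if (preg_match('/^Location: (.*?)$/i', $line, $m)) {
      // There was a redirect to $m[1]
    }
  }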

Note you can have multiple redirections, and they can be relative as well as absolute.
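
As a rough sketch of how the separate Location lines could be collected, the helper below gathers each redirect target into its own array element and treats the last one as the final URL. The name extract_redirects is hypothetical and not part of the snippet above. The sample header lines are illustrative stand-ins for real `$details['wrapper_data']` output, and the 302 status lines are assumed. Relative targets would still need to be resolved against the URL that issued them.

```php
<?php
// Hypothetical helper: collect every redirect target from a list of raw
// header lines, such as $details['wrapper_data'] from the snippet above.
function extract_redirects(array $headerLines): array
{
    $targets = [];
    foreach ($headerLines as $line) {
        // One Location header per redirect hop; trim surrounding whitespace
        if (preg_match('/^Location:\s*(.*?)\s*$/i', $line, $m)) {
            $targets[] = $m[1];
        }
    }
    return $targets;
}

// Illustrative header lines only (status codes are assumed, not captured).
$sample = [
    'HTTP/1.1 302 Found',
    'Location: http://peacecorpsjournals.com/gotojournal.php?6731',
    'HTTP/1.1 302 Found',
    'Location: http://ghanakimsuri.blogspot.com',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
];

$redirects = extract_redirects($sample);
// The last Location seen is the final destination; null if nothing redirected
$finalUrl = $redirects ? end($redirects) : null;
echo $finalUrl, "\n"; // prints http://ghanakimsuri.blogspot.com
```

With real data, passing `$details['wrapper_data']` in place of `$sample` should give the same shape of result: one array element per hop.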

Chris Smith
Chris, I like this! I know it's a lazy response, but my only problem is that it returns two URLs crammed into a single array element: (http://peacecorpsjournals.com/gotojournal.php?6731http://ghanakimsuri.blogspot.com). This is because this address happens to redirect twice. Can the loop be changed to pull these two URLs apart?
Keyslinger
@Keyslinger Odd - the two URLs should be on separate lines in the wrapper data, so the if condition should execute twice, with one URL each time. I have test cases with multiple redirects which worked with this code last time I checked.
Chris Smith
@Keyslinger I just ran the code above with your URL and `echo $m[1], "\n";` in the if statement, and it prints both URLs separately. Are you sure you're not just accidentally cramming them together yourself? :)
Chris Smith
That must be it! Many thanks!
Keyslinger
A: 

You can do this using cURL.

<?php

function get_web_page( $url ) 
{ 
    $options = array( 
        CURLOPT_RETURNTRANSFER => true,     // return web page 
        CURLOPT_HEADER         => true,    // return headers 
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects 
        CURLOPT_ENCODING       => "",       // handle all encodings 
        CURLOPT_USERAGENT      => "spider", // who am i 
        CURLOPT_AUTOREFERER    => true,     // set referer on redirect 
        CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect 
        CURLOPT_TIMEOUT        => 120,      // timeout on response 
        CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects 
    ); 

    $ch      = curl_init( $url ); 
    curl_setopt_array( $ch, $options ); 
    $content = curl_exec( $ch ); 
    $err     = curl_errno( $ch ); 
    $errmsg  = curl_error( $ch ); 
    $header  = curl_getinfo( $ch ); 
    curl_close( $ch ); 

    //$header['errno']   = $err;
    //$header['errmsg']  = $errmsg;
    //$header['content'] = $content;
    return $header; // associative array; the final URL is in $header['url']
}  
$thisurl = "http://www.example.com/redirectfrom";
$myUrlInfo = get_web_page( $thisurl ); 
echo $myUrlInfo["url"];

?>

Code found here: http://forums.devshed.com/php-development-5/curl-get-final-url-after-inital-url-redirects-544144.html

Oren
Wrikken
A: 

I've found this resource to be the most complete, thought-out approach and explanation. The code isn't the shortest snippet, but you'll end up being able to track multiple redirects with a couple of lines like this:

$result = get_all_redirects('http://bit.ly/abc123');
print_r($result);
editor
This is a nice script, but it isn't useful in the particular situation described above: it returns an intermediate URL, not the final one I asked for in the question.
Keyslinger
It returns all intermediate URLs as well as the final.
editor