ansaurus

Question

How to use PHP to get a webpage into a variable

Answer 1

+8 A:

Use cURL.

<?php
// create curl resource
$ch = curl_init();

// set url
curl_setopt($ch, CURLOPT_URL, "example.com");

// return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

// change the UA to spoof IE7
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)');

// $output contains the output string
$output = curl_exec($ch);

// close curl resource to free up system resources
curl_close($ch);
?>

(From http://uk.php.net/manual/en/curl.examples-basic.php)
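The snippet above does not check whether the transfer succeeded; curl_exec() returns false on failure. Here is a minimal sketch of a wrapper that checks for that. The helper name fetch_page() and the 10-second default timeout are assumptions for illustration, not part of cURL or of the manual example:

```php
<?php
// Hypothetical helper (the name fetch_page is an assumption, not a cURL API).
// Returns the response body as a string, or throws if the transfer fails.
function fetch_page(string $url, int $timeoutSeconds = 10): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);

    $output = curl_exec($ch);
    if ($output === false) {
        // curl_error() describes why the transfer failed
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }

    // the handle is released when $ch goes out of scope
    return $output;
}
```

Usage would look like $html = fetch_page('http://example.com/'); wrapped in a try/catch if a failed request should not stop the script.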

Rich Bradshaw 2009-03-28 15:37:36

Good! It still doesn't work, though. I need the script to tell the server that I'm using a browser.

Omar Abid 2009-03-28 15:42:53

Oh, sorry - just add a curl_setopt for the UA - I've added it into my answer.

Rich Bradshaw 2009-03-28 18:06:24

Answer 2

A:

This answer takes your comment on Rich's answer into account.

The site is probably checking whether you are a real user by looking at the HTTP referer or the User-Agent string. Try setting these on your cURL handle:

// pretend you came from their site already
curl_setopt($ch, CURLOPT_REFERER, 'http://domainofthesite.com');
// pretend you are Firefox 3.0.6 running on Windows Vista
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6');
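The two lines above assume $ch already exists. As a rough sketch, the same options can be collected in one array and applied with curl_setopt_array(). The helper name browser_like_options() is made up for illustration; the referer and user agent values are the ones from this answer:

```php
<?php
// Hypothetical helper: builds the cURL option array for a browser-like request.
function browser_like_options(string $url, string $referer, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,      // return the page as a string
        CURLOPT_REFERER        => $referer,  // sent as the Referer header
        CURLOPT_USERAGENT      => $userAgent,
    ];
}

$options = browser_like_options(
    'http://domainofthesite.com/',
    'http://domainofthesite.com',
    'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6'
);

$ch = curl_init();
// true when every option was accepted; no request has been sent yet
$optionsApplied = curl_setopt_array($ch, $options);
```

Calling $output = curl_exec($ch); afterwards would perform the request with both headers set.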

Pim Jager 2009-03-28 15:59:38

Answer 3

A:

Another way to do it (though others have pointed out a better way) is to use PHP's fopen() function, like so:

$handle = fopen("http://www.example.com/", "r"); // open the specified URL for reading

It's especially useful if cURL isn't available.
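Note that fopen() only returns a handle; the page still has to be read into a variable, for example with stream_get_contents(). Here is a minimal sketch, assuming allow_url_fopen is enabled for http:// URLs. The helper name read_url() and its default user agent are hypothetical:

```php
<?php
// Hypothetical helper; for http:// URLs it needs allow_url_fopen=On in php.ini.
function read_url(string $url, string $userAgent = 'Mozilla/5.0'): string
{
    // The 'http' context options only apply when $url uses the http(s) wrapper.
    $context = stream_context_create([
        'http' => ['user_agent' => $userAgent, 'timeout' => 10],
    ]);

    $handle = @fopen($url, 'r', false, $context);
    if ($handle === false) {
        throw new RuntimeException("Could not open $url");
    }

    $contents = stream_get_contents($handle); // read everything up to end of stream
    fclose($handle);

    if ($contents === false) {
        throw new RuntimeException("Could not read $url");
    }
    return $contents;
}
```

For a one-off read, file_get_contents($url) does the open, read, and close in a single call.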

karim79 2009-03-28 16:17:08

Answer 4

+1 A:

Yeah, cURL is pretty good at getting page content. I use it with classes like DOMDocument and DOMXPath to grind the content into a usable form.

function __construct($useragent, $url)
{
    $this->useragent = 'Firefox (WindowsXP) - Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.' . $useragent;
    $this->url = $url;

    //fetch the page as a string
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $html = curl_exec($ch);

    //parse it; @ suppresses warnings about malformed HTML
    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    $this->xpath = new DOMXPath($dom);
}
...
public function displayResults($site)
{
    $data = $this->path[0]->length;
    for ($i = 0; $i < $data; $i++)
    {
        $delData = $this->path[0]->item($i);

        //setting the href and title properties
        $urlSite = $delData->getElementsByTagName('a')->item(0)->getAttribute('href');
        $titleSite = $delData->getElementsByTagName('a')->item(0)->nodeValue;

        //setting the saves and additional
        $saves = $delData->getElementsByTagName('span')->item(0)->nodeValue;
        if ($saves == NULL)
        {
            $saves = 0;
        }

        //build the array
        $this->newSiteBookmark[$i]['source'] = 'delicious.com';
        $this->newSiteBookmark[$i]['url'] = $urlSite;
        $this->newSiteBookmark[$i]['title'] = $titleSite;
        $this->newSiteBookmark[$i]['saves'] = $saves;
    }
}

The latter is part of a class that scrapes data from delicious.com. Not very legal, though.
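The class above is only shown in part, so here is a small self-contained sketch of the same DOMDocument and DOMXPath idea, run on an inline HTML string instead of a fetched page. The helper name extract_links() and the sample markup are assumptions for illustration, not the delicious.com structure:

```php
<?php
// Hypothetical helper: collects the href and link text of every <a href> element.
function extract_links(string $html): array
{
    $dom = new DOMDocument();
    // @ suppresses warnings that real-world, malformed HTML tends to trigger
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $links = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = [
            'url'   => $a->getAttribute('href'),
            'title' => trim($a->nodeValue),
        ];
    }
    return $links;
}

// Stand-in for the string returned by curl_exec()
$sample = '<html><body><ul><li><a href="http://example.com/">Example</a></li></ul></body></html>';
$links = extract_links($sample);
```

Here $links holds one entry whose 'url' is 'http://example.com/' and whose 'title' is 'Example'. With a real page, the XPath expression would need to match that site's markup.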

chosta 2009-03-28 16:28:01

It's perfectly legal; the data is already available. It's just an inefficient way of doing it (HTML isn't the best format for data). I've been wishing recently that delicious provided more data (namely search results) in XML.

Ross 2009-03-28 18:29:34

Well, I wish delicious provided an API method that could actually access bookmarks that don't come from your own profile, like the ma.gnolia.org "bookmark_find" method. That would have saved me some sleepless nights doing my bachelor thesis :=)

chosta 2009-03-28 18:39:59
