I want to retrieve the HTML code of a web page in PHP. Let's say the link is http://stackoverflow.com/questions/ask and I want the HTML code of the page that is served. I want to store this HTML code in a PHP variable.

How can we do this?

+13  A: 

If your PHP server allows URL fopen wrappers (allow_url_fopen), the simplest way is:

$html = file_get_contents('http://stackoverflow.com/questions/ask');
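file_get_contents() returns false on failure, so it is worth checking the result before treating it as HTML. The sketch below assumes a hypothetical helper name, fetch_with_fopen(), and relies on the default behaviour of PHP's HTTP wrapper, which fails (returns false) on 4xx/5xx responses such as 404 unless the ignore_errors context option is set.

```php
<?php
// Minimal sketch: wrap file_get_contents() so that a failure is reported as
// null instead of being mistaken for page content.
// fetch_with_fopen() is a hypothetical helper name, not a PHP built-in.
function fetch_with_fopen(string $url): ?string
{
    // http:// and https:// URLs only work when allow_url_fopen is enabled.
    $isRemote = preg_match('#^https?://#i', $url) === 1;
    if ($isRemote && !ini_get('allow_url_fopen')) {
        return null;
    }
    // By default the HTTP wrapper fails on 4xx/5xx responses (such as 404),
    // so those also come back as false here; @ hides the warning PHP emits.
    $html = @file_get_contents($url);
    return $html === false ? null : $html;
}
```

For example, calling fetch_with_fopen('http://stackoverflow.com/questions/ask') and checking for null lets you print an error message instead of working with false.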

If you need more control then you should look at the cURL functions:

$c = curl_init('http://stackoverflow.com/questions/ask');
curl_setopt($c, CURLOPT_RETURNTRANSFER, true); // return the page instead of printing it
// curl_setopt($c, ...); add whatever other options you want here

$html = curl_exec($c);

if (curl_error($c))
    die(curl_error($c));

// Get the status code
$status = curl_getinfo($c, CURLINFO_HTTP_CODE);

curl_close($c);
Greg
I am worried about 404. If the link does not exist, I don't want its content; instead I want to display an error message. How can we find out whether the URL is returning a 404 error or not (simply put, whether the URL is working)?
Prashant
@Prashant: I've edited to add a curl_getinfo call which will give you 200 or 404 or whatever
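Building on that comment, here is a sketch that combines curl_exec(), curl_error-style failure detection and curl_getinfo() so the caller gets either the HTML or null (for a network error or a non-2xx status such as 404). The names fetch_html() and is_success_status() are hypothetical, and following redirects plus a 10-second connect timeout are assumed choices, not part of the original answer.

```php
<?php
// Sketch: fetch a page with cURL and return null for transport errors or
// non-2xx status codes (e.g. 404), so the caller can show an error message.
// fetch_html() and is_success_status() are hypothetical names for illustration.

function is_success_status(int $status): bool
{
    // 2xx codes mean the server actually served the page.
    return $status >= 200 && $status < 300;
}

function fetch_html(string $url, ?int &$status = null): ?string
{
    $status = 0;
    if (!function_exists('curl_init')) {
        return null; // ext/curl is not installed
    }
    $c = curl_init($url);
    if ($c === false) {
        return null;
    }
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($c, CURLOPT_FOLLOWLOCATION, true); // follow redirects (assumed choice)
    curl_setopt($c, CURLOPT_CONNECTTIMEOUT, 10);   // seconds (assumed value)

    $html = curl_exec($c);
    if ($html === false) {
        return null; // network or protocol error; curl_error($c) has the details
    }
    $status = (int) curl_getinfo($c, CURLINFO_HTTP_CODE);
    // The handle is freed automatically when $c goes out of scope.
    return is_success_status($status) ? $html : null;
}
```

A caller can then write `$html = fetch_html($url, $status);` and, when $html is null, display an error that includes $status (0 means no HTTP response was received at all).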
Greg
+1  A: 

Look at this function:

http://ru.php.net/manual/en/function.file-get-contents.php

Sergei
+1  A: 

Simple way: Use file_get_contents():

$page = file_get_contents('http://stackoverflow.com/questions/ask');

Please note that allow_url_fopen must be enabled in your php.ini to be able to use URL-aware fopen wrappers.

More advanced way: if allow_url_fopen is disabled and you cannot change your PHP configuration, but ext/curl is installed, use the cURL library to connect to the desired page.
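The decision described in this answer (use file_get_contents() when allow_url_fopen is on, otherwise fall back to cURL) can be sketched as follows. The function names choose_fetch_method() and fetch_page() are hypothetical, and the sketch deliberately omits status-code checks to keep the fallback logic visible.

```php
<?php
// Sketch: decide at runtime which way of fetching a URL is usable on this server.
// choose_fetch_method() and fetch_page() are hypothetical names for illustration.
function choose_fetch_method(): ?string
{
    // ini_get() returns the setting as a string ("1", "0" or ""), so convert it.
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        return 'file_get_contents';
    }
    // Fall back to cURL when the extension is loaded.
    if (extension_loaded('curl')) {
        return 'curl';
    }
    return null; // neither option is available
}

function fetch_page(string $url): ?string
{
    switch (choose_fetch_method()) {
        case 'file_get_contents':
            $html = @file_get_contents($url); // @ hides the failure warning
            return $html === false ? null : $html;
        case 'curl':
            $c = curl_init($url);
            if ($c === false) {
                return null;
            }
            curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
            $html = curl_exec($c);
            return $html === false ? null : $html;
        default:
            return null;
    }
}
```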

Stefan Gehrig
+2  A: 

You may want to check out the YQL libraries from Yahoo: http://developer.yahoo.com/yql

The task at hand is as simple as:

select * from html where url = 'http://stackoverflow.com/questions/ask'

You can try this out in the console at: http://developer.yahoo.com/yql/console (requires login)

Also see Chris Heilmann's screencast for some nice ideas on what more you can do: http://developer.yahoo.net/blogs/theater/archives/2009/04/screencast_collating_distributed_information.html

+2  A: 

Also, if you want to manipulate the retrieved page somehow, you might want to try a PHP DOM parser. I find PHP Simple HTML DOM Parser very easy to use.
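As an alternative that needs no third-party library, PHP's built-in DOM extension can parse the fetched HTML string. This sketch swaps in DOMDocument and DOMXPath (not the Simple HTML DOM Parser named above); extract_title_and_links() is a hypothetical helper that pulls out the page title and link targets.

```php
<?php
// Sketch using PHP's built-in DOM extension (DOMDocument/DOMXPath) rather than
// the third-party Simple HTML DOM Parser.
// extract_title_and_links() is a hypothetical helper name.
function extract_title_and_links(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $titleNode = $xpath->query('//title')->item(0);

    // Only <a> elements that actually carry an href attribute.
    $links = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = $a->getAttribute('href');
    }

    return [
        'title' => $titleNode ? trim($titleNode->textContent) : null,
        'links' => $links,
    ];
}
```

The input would typically be the string returned by file_get_contents() or curl_exec() in the answers above.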

Dmitri
Very interesting, thanks.
Peter Perháč