How can I get the HTML source of http://www.example-webpage.com/file.html without using file_get_contents()? I need this because some web hosts disable allow_url_fopen, so file_get_contents() cannot be used to fetch URLs there. Is it possible to get the HTML file's source with cURL (if cURL support is enabled)? If so, how? Thanks.

A: 
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($curl);
curl_close($curl);

Source: http://www.christianschenk.org/blog/php-curl-allow-url-fopen/

Brad
A: 

Try the following:

$ch = curl_init("http://www.example-webpage.com/file.html");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
$content = curl_exec($ch);
curl_close($ch);

I would only recommend this for small files. Large files are read into memory as a whole and are likely to exceed PHP's memory limit.
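
One way around the memory concern, as a rough sketch: instead of returning the whole response as a string, give cURL a write callback via CURLOPT_WRITEFUNCTION so that each chunk goes straight to a file. The function name stream_to_file() and the timeout values are assumptions made for illustration; they are not part of the answer above.

```php
<?php
// Hypothetical helper: streams $url into $path chunk by chunk,
// so the full response never has to fit into memory at once.
function stream_to_file($url, $path)
{
    $fp = fopen($path, 'w');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);  // assumed value, in seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);        // assumed value, in seconds
    // cURL calls this once per received chunk; the callback must
    // return the number of bytes it handled, or the transfer aborts.
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $chunk) use ($fp) {
        return fwrite($fp, $chunk);
    });

    $ok = curl_exec($ch);
    $message = curl_error($ch); // read the error before closing the handle
    curl_close($ch);
    fclose($fp);

    if ($ok === false) {
        throw new RuntimeException("Download failed: $message");
    }
}
```

Usage would look like stream_to_file("http://www.example-webpage.com/file.html", "file.html");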

Edit: after some discussion in the comments, we found out that the server couldn't resolve the host name and that the page was, in addition, an HTTPS resource. So here is a temporary solution (until your server admin fixes the name resolution).

What I did was ping graph.facebook.com to find its IP address, replace the host name in the URL with that IP address, and send the Host header manually. This, however, makes the SSL certificate invalid for the request, so we have to suppress peer verification.

//$url = "https://graph.facebook.com/19165649929?fields=name";
$url = "https://66.220.146.224/19165649929?fields=name";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Host: graph.facebook.com'));
$output = curl_exec($ch);
curl_close($ch); 
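
A possible alternative, which is not what was used above: if the PHP build provides CURLOPT_RESOLVE (PHP 5.5+ with libcurl 7.21.3+), the host name can be pinned to the IP address while the original URL stays unchanged. The certificate then still matches the host name, so peer verification can remain on. The helper names below are hypothetical, and the IP address is the one found by pinging above.

```php
<?php
// Hypothetical helper: builds one CURLOPT_RESOLVE entry
// in the "host:port:address" form that cURL expects.
function resolve_entry($host, $port, $ip)
{
    return $host . ':' . $port . ':' . $ip;
}

// Hypothetical helper: fetches $url while telling cURL
// which IP address $host should resolve to.
function fetch_with_pinned_ip($url, $host, $port, $ip)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Bypasses DNS for this host only; certificate checks stay enabled.
    curl_setopt($ch, CURLOPT_RESOLVE, array(resolve_entry($host, $port, $ip)));
    $output = curl_exec($ch);
    curl_close($ch);
    return $output; // false on failure
}
```

Usage would look like fetch_with_pinned_ip("https://graph.facebook.com/19165649929?fields=name", "graph.facebook.com", 443, "66.220.146.224");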

Keep in mind that the IP address might change, which makes this a source of errors. You should also do some error handling using curl_error().
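
A minimal sketch of what that error handling could look like. curl_exec() returns false on failure, and curl_error() then holds a message such as the "Couldn't resolve host" one from the comments. The function name fetch_or_fail() and the timeout values are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: returns the response body as a string,
// or throws with cURL's own error code and message.
function fetch_or_fail($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);  // assumed value, in seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);        // assumed value, in seconds

    $output = curl_exec($ch);
    if ($output === false) {
        // Read the error details before closing the handle.
        $message = curl_error($ch);
        $code = curl_errno($ch);
        curl_close($ch);
        throw new RuntimeException("cURL error $code: $message");
    }

    curl_close($ch);
    return $output;
}
```

Wrapping the call in try/catch and logging $e->getMessage() makes a blank result much easier to diagnose than an unchecked curl_exec().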

Joe Hopfgartner
Thanks, but I get a blank file. I'm trying to find the error.
John Paneth
John, if this isn't working then check your URL. Also, don't forget the curl_close($ch) at the end.
Brad
Does it work with a plain text file instead of a html file? I tested it with a plain text file - and I get a blank page.
John Paneth
You're right, closing cURL is not a bad idea. I'll investigate the use case with the text file. Maybe you have a URL for me (because there's practically no difference, but there may be another error...)?
Joe Hopfgartner
Okay, downloading http://www.facebook.com/robots.txt worked fine. Can you give me the URL that doesn't work?
Joe Hopfgartner
Please try this: https://graph.facebook.com/19165649929?fields=name (that does not work for me). Obviously it's also accessible via "http".
John Paneth
It's https, not http. It works here with the example above, but the SSL settings may be version-specific! Please try this: var_dump(curl_error($ch)); before curl_close() and tell me what it outputs!
Joe Hopfgartner
string(42) "Couldn't resolve host 'graph.facebook.com'"
John Paneth
So your server can't resolve the host name to an IP address. You should contact your server administrator, who should set up correct DNS resolution. There's nothing wrong with your code. The only solution I know of for you without fixing this server issue is to get the data directly from the IP address and send the Host header, but you will have to deal with SSL warnings. By the way, it would be nice if you upvoted this :)
Joe Hopfgartner
OK, thank you very much.
John Paneth
Don't give up just yet. I modified the answer and posted a temporary solution for you!
Joe Hopfgartner
A: 

Try http://php.net/manual/en/curl.examples-basic.php :)

<?php

$ch = curl_init("http://www.example.com/");
$fp = fopen("example_homepage.txt", "w");

curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);

$output = curl_exec($ch);
curl_close($ch);
fclose($fp);
?>

As the documentation says:

The basic idea behind the cURL functions is that you initialize a cURL session using the curl_init(), then you can set all your options for the transfer via the curl_setopt(), then you can execute the session with the curl_exec() and then you finish off your session using the curl_close().

phidah