Hello all,

I am making use of simplehtmldom, which has this function:

// get html dom from file
function file_get_html() {
    $dom = new simple_html_dom;
    $args = func_get_args();
    $dom->load(call_user_func_array('file_get_contents', $args), true);
    return $dom;
}

I use it like so:

$html3 = file_get_html(urlencode(trim("$link")));

Sometimes, a URL may just not be valid and I want to handle this. I thought I could use a try and catch but this hasn't worked since it doesn't throw an exception, it just gives a php warning like this:

[06-Aug-2010 19:59:42] PHP Warning:  file_get_contents(http://new.mysite.com/ghs 1/) [<a href='function.file-get-contents'>function.file-get-contents</a>]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found  in /home/example/public_html/other/simple_html_dom.php on line 39

Line 39 is in the above code.

How can I correctly handle this error? Can I just use a plain if condition? It doesn't look like it returns a boolean.

Thanks all for any help

Update

Is this a good solution?

if(fopen(urlencode(trim("$next_url")), 'r')){

    $html3 = file_get_html(urlencode(trim("$next_url")));

}else{
    //do other stuff, error_logging
    return false;

}
A: 
if(file_exists($file_path)) {
    //do work
}

The file_exists() PHP manual page has a tip about URL wrappers:

Tip: As of PHP 5.0.0, this function can also be used with some URL wrappers. Refer to List of Supported Protocols/Wrappers for a listing of which wrappers support stat() family of functionality.

Treffynnon
Does file_exists work with URLs?
Abs
See the tip on the file_exists manual page: "As of PHP 5.0.0, this function can also be used with some URL wrappers. Refer to List of Supported Protocols/Wrappers for a listing of which wrappers support stat() family of functionality."
Treffynnon
So I'll have to do an fopen first or the like and then use file_exists... Maybe I could just use fopen.
Abs
Not sure what the down votes are for as this was a valid suggestion to my mind.
Treffynnon
[HTTP Wrappers do not support `stat()`](http://www.php.net/manual/en/wrappers.http.php)
Gordon
+2  A: 

Here's an idea:

function fget_contents() {
    $args = func_get_args();
    // the @ can be removed if you lower error_reporting level
    $contents = @call_user_func_array('file_get_contents', $args);

    if ($contents === false) {
        // $args[0] is the filename or URL passed to file_get_contents
        throw new Exception('Failed to open ' . $args[0]);
    } else {
        return $contents;
    }
}

This is basically a wrapper around file_get_contents that throws an exception on failure. To avoid having to override file_get_contents itself, you can change the call inside file_get_html():

// change this
$dom->load(call_user_func_array('file_get_contents', $args), true); 
// to
$dom->load(call_user_func_array('fget_contents', $args), true); 

Now you can:

try {
    $html3 = file_get_html(trim("$link")); 
} catch (Exception $e) {
    // handle error here
}

Error suppression (either by using @ or by lowering the error_reporting level) is a valid solution. The wrapper above throws an exception on failure, and you can use that to handle your errors. There are many reasons why file_get_contents might generate warnings, and PHP's manual itself recommends lowering error_reporting: see the manual.
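
For readers who would rather not use @, here is a minimal sketch of a different technique: temporarily installing an error handler that converts the warning into an exception. It assumes PHP 5.5 or later (for closures and `finally`), and `fetch_or_throw` is a hypothetical name, not something simplehtmldom provides.

```php
<?php
// Hypothetical helper (not part of simplehtmldom): read a file or URL and
// turn any warning raised during the call into an ErrorException.
function fetch_or_throw($path)
{
    // Temporary handler: only E_WARNING is converted into an exception.
    set_error_handler(function ($severity, $message, $file, $line) {
        throw new ErrorException($message, 0, $severity, $file, $line);
    }, E_WARNING);

    try {
        $contents = file_get_contents($path);
    } finally {
        // Put the previous handler back, even if an exception was thrown.
        restore_error_handler();
    }

    if ($contents === false) {
        // Covers a failure that did not raise a warning.
        throw new RuntimeException('Failed to open ' . $path);
    }

    return $contents;
}
```

The call in file_get_html() could then point at this helper instead of file_get_contents, and the try/catch shown earlier would work unchanged, since ErrorException and RuntimeException both extend Exception.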

quantumSoup
This isn't good error handling. This is error suppression.
Abs
You can capture the output of @file_get_contents... If it's === FALSE, then you can throw your own exception, set a return code, or whatever.
grossvogel
Explain downvote
quantumSoup
@quantumSoup: Be aware that your first example `if(file_get...` will give an error if an empty file is read successfully, whereas the second one `if($contents === false)` will only return an error if there really is an error.
grossvogel
@gross Yes, I forgot to check for identical to false on the first one. Got rid of the whole first part though.
quantumSoup
@quantumSoup - I have tried the above after editing the simplehtmldom class, view it here: http://pastebin.com/5TrEJqQF - I get the error: Fatal error: Cannot redeclare fget_contents
Abs
@Abs Apparently you have a `fget_contents` declared somewhere. Rename the function to `fgc_with_exception`, `file_get_exception`, or whatever (and rename the call in the library accordingly)
quantumSoup
@Abs Here's the [modified class](http://pastebin.com/TeNXUiTW), and here's a [usage example](http://pastebin.com/cG7EX6GM). It works here.
quantumSoup
@quantumSoup - Thanks, I couldn't find another function with the same name, but I have renamed it to `fget_contents_893`. However, it is always throwing an exception, and my script that used to take hours to complete finishes within 10 seconds. It's not returning any HTML. I am checking what the problem is now, any ideas?
Abs
Strange, it seems file_get_contents doesn't like encoded urls. I have removed that and it seems to be executing fine.
Abs
Well, this has been a rough question, but at least it's working now. Thank you very much, quantumSoup, for your continued help. :)
Abs
@Abs Indeed, you are not supposed to pass encoded URLs to any of PHP's file functions
quantumSoup
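
A short sketch of why the encoded URL failed, using the URL from the warning in the question. Encoding only the path segment with `rawurlencode` is an assumption about what was intended, not something stated in the thread.

```php
<?php
$link = "http://new.mysite.com/ghs 1/";

// urlencode() also escapes the ":" and "/" characters, so the result no
// longer starts with "http://" and is not treated as an HTTP URL.
$encoded = urlencode($link);
// $encoded is "http%3A%2F%2Fnew.mysite.com%2Fghs+1%2F"

// Assumed fix: encode only the path segment that contains the space.
$safe = "http://new.mysite.com/" . rawurlencode("ghs 1") . "/";
// $safe is "http://new.mysite.com/ghs%201/"
```

With the fully encoded string, file_get_contents looks for a local file of that name, which would explain why the wrapper threw an exception on every call.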
+2  A: 

Use CURL to get the URL and handle the error response that way.

Simple example from curl_init():

<?php
// create a new cURL resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);

// grab URL and pass it to the browser
curl_exec($ch);

// close cURL resource, and free up system resources
curl_close($ch);
?>
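
The manual example above prints the response directly. A minimal sketch of returning the body and checking for failure instead might look like this; `fetch_url`, the timeout values, and the choice to treat any status of 400 or above as an error are assumptions for illustration.

```php
<?php
// Hypothetical helper: returns the response body, or null on any failure.
function fetch_url($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);    // seconds allowed for connecting
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // seconds allowed for the whole request

    $body = curl_exec($ch);                          // false on transport errors
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 if no response was received

    if ($body === false || $status >= 400) {
        return null;
    }

    return $body;
}
```

A non-null result could then be handed to `$dom->load()` in the same way file_get_html() does, while a null result is the place to log the error and skip the URL.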
Treffynnon
So the idea is to check the URL first before passing it to file_get_contents. I think that is a good idea, but I think it's better to use fopen. See my update, what do you think?
Abs
No, CURL will return the contents for you so there will be no need for a subsequent file_get_contents.
Treffynnon
You'll be interested in CURLOPT_RETURNTRANSFER in curl_setopt(): http://uk3.php.net/manual/en/function.curl-setopt.php
Treffynnon
I cannot use CURL or fopen to get the contents. I still need to make use of the file_get_contents call provided by the API; I just need a check to see whether file_get_contents has returned an error and do something if it has. Again, I cannot directly fiddle with file_get_contents, as it is within the API of simplehtmldom, as I mentioned in my question.
Abs
I would just override that method in the library in my own class as its behaviour is clearly unsuitable in this instance.
Treffynnon
+1 Use CURL where available for all HTTP requests... It's designed for it... `file_get_contents` will work (in most cases), but it's not designed for HTTP, so it'll be quite hard to detect certain types of errors, etc...
ircmaxell
A: 

If you're fetching from an external URL, the best handling is going to come from introducing an HTTP library like Zend_Http. This isn't much different from using CURL or fopen, except that it abstracts the particulars of these "drivers" into a universal API, and then you can choose which one you want to use. It also has some built-in error trapping to make it easier on you.

If you don't want the overhead of another library, then you can obviously code it yourself, in which case I always prefer CURL.

prodigitalson