I need to parse some HTML files; however, they are not well-formed, and PHP prints out warnings. I want to avoid this debugging/warning behavior programmatically. Please advise. Thank you!

Code:

// create a DOM document and load the HTML data
$xmlDoc = new DomDocument;
// this dumps out the warnings
$xmlDoc->loadHTML($fetchResult);

This:

@$xmlDoc->loadHTML($fetchResult);

can suppress the warnings, but how can I capture those warnings programmatically?

+2  A: 

You can install a temporary error handler with set_error_handler():

class ErrorTrap {
  protected $callback;
  protected $errors = array();
  function __construct($callback) {
    $this->callback = $callback;
  }
  function call() {
    // Route PHP errors to onError() only while the callback runs.
    set_error_handler(array($this, 'onError'));
    try {
      return call_user_func_array($this->callback, func_get_args());
    } finally {
      // Always restore the previous handler, even if the callback throws (needs PHP 5.5+).
      restore_error_handler();
    }
  }
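  function onError($errno, $errstr, $errfile, $errline) {
    // Record the error instead of letting PHP print it.
    $this->errors[] = array($errno, $errstr, $errfile, $errline);
    // Returning true marks the error as handled, so PHP's built-in handler is skipped.
    return true;
  }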
  function ok() {
    return count($this->errors) === 0;
  }
  function errors() {
    return $this->errors;
  }
}

Usage:

// create a DOM document and load the HTML data
$xmlDoc = new DomDocument();
$caller = new ErrorTrap(array($xmlDoc, 'loadHTML'));
// this doesn't dump out any warnings
$caller->call($fetchResult);
if (!$caller->ok()) {
  var_dump($caller->errors());
}
troelskn
Thanks! Such a neat trick! Code is simple and clean.
Viet
Seems like a lot of overkill for the situation. Note PHP's libxml2 functions.
thomasrutter
Good point, Thomas. I didn't know about these functions when I wrote this answer. If I'm not mistaken, it does the same thing internally btw.
troelskn
It has the same effect in this case, yes, though it's done at a different level: with the above solution, PHP errors are generated and then suppressed, whereas with mine they never become PHP errors. I personally feel that if doing something involves suppressing PHP errors, either through @ or set_error_handler(), then it's the wrong way to do it. That's just my opinion though. Note that PHP errors and exceptions are entirely different things: using try {} catch() {} is fine.
thomasrutter
I think I've seen some bug reports that suggest that `libxml_use_internal_errors` hooks into PHP's error handler.
troelskn
libxml_use_internal_errors() controls whether libxml errors hook into PHP's error handler or not. The default is to do so; setting libxml_use_internal_errors(true) should prevent this. Have you seen a bug which contradicts this? Feel free to link it.
thomasrutter
+6  A: 

Call

libxml_use_internal_errors(true);

prior to processing with $xmlDoc->loadHTML().

This tells libxml2 not to send errors and warnings through to PHP. Then, to check for errors and handle them yourself, you can consult libxml_get_last_error() and/or libxml_get_errors() when you're ready.
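
A minimal sketch of how those calls might fit together. The sample markup assigned to $fetchResult is a hypothetical stand-in for the asker's real input, and the printf() output format is only illustrative:

```php
<?php
// Hypothetical malformed input standing in for the real $fetchResult.
$fetchResult = '<html><body><p>Unclosed paragraph<div></span></body></html>';

// Collect libxml errors internally instead of emitting PHP warnings.
// The call returns the previous setting so it can be restored afterwards.
$previous = libxml_use_internal_errors(true);

$xmlDoc = new DOMDocument();
$loaded = $xmlDoc->loadHTML($fetchResult);

// Each entry is a LibXMLError object (level, code, message, line, column).
$errors = libxml_get_errors();
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

// Empty libxml's internal error buffer and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);
```

Calling libxml_clear_errors() after reading the list keeps errors from one document from showing up when the next one is parsed.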

thomasrutter
+1 That's neat. Thanks!
Viet
Fantastic, simpler than accepted solution I believe. Worked instantly for me.
James