I'm using XML::LibXML to parse a document.

The HTML file behind it has some minor errors, and the parser reports them:

http://is.gd/create.php?longurl=http://google.com:15: validity error : ID smallink already defined
nal URL was <a href="http://google.com">http://google.com</a><span id="smallink"
                                                                                ^
http://is.gd/create.php?longurl=http://google.com:15: validity error : ID smallink already defined
and use <a href="http://is.gd/fNqtL-">http://is.gd/fNqtL-</a><span id="smallink"
                                                                                ^

However, I disabled error reporting:

my $parser = XML::LibXML->new();
$parser->set_options({ recover           => 2,
                       validation        => 0,
                       suppress_errors   => 1,
                       suppress_warnings => 1,
                       pedantic_parser   => 0,
                       load_ext_dtd      => 0, });

my $doc = $parser->parse_html_file("http://is.gd/create.php?longurl=$url");

My only option to suppress those errors is to run the script with 2>/dev/null, which I don't want. Could someone please help me get rid of those errors?

+2  A: 

A possible solution is to install a $SIG{__WARN__} handler which filters the messages or just silences all warnings:

local $SIG{__WARN__} = sub { };  # $_[0] is the message
eugene y
Thanks, this helps.
polemon
No need for a BEGIN block for this use, as far as I can see. local might be a good idea, though.
ysth
+4  A: 

I have no idea if you're asking XML::LibXML correctly to not print its warnings. I'll assume you are and that this is a bug in XML::LibXML (which you should also report to the author), and only address how to suppress warnings.

Every time a warning is about to be printed, perl will look up the value of $SIG{__WARN__} and, if that contains a code reference, invoke it instead of printing the warning itself.

You can use that to stop the warnings you want to ignore from being printed to STDERR. However, you should be careful with this. Make sure to only suppress false positives, not all warnings. Warnings are usually useful. Also, make sure to localize your use of $SIG{__WARN__} to the smallest possible scope to avoid odd side effects.

# warnings happen just as always
my $parser = ...;
$parser->set_options(...);

{ # in this scope we filter some warnings
    local $SIG{__WARN__} = sub {
        my ($warning) = @_;
        print STDERR $warning if $warning !~ /validity error/;
    };

    $parser->parse_html_file(...);
}

# more code, now the warnings are back to normal again
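
If the filter is needed in more than one place, the same pattern could be wrapped in a small helper. This is only a sketch: filter_warnings and its arguments are hypothetical names, not part of XML::LibXML, and the two warn calls merely stand in for the parser call so the example runs on its own. Warnings that don't match are handed to whatever handler was active before, or printed to STDERR if there was none.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code while dropping warnings that match $pattern.
# Everything else goes to the previously installed handler, if any.
sub filter_warnings {
    my ($pattern, $code) = @_;
    my $outer = $SIG{__WARN__};    # remember the handler before local replaces it

    local $SIG{__WARN__} = sub {
        my ($warning) = @_;
        return if $warning =~ $pattern;    # suppressed
        if (ref($outer) eq 'CODE') { $outer->($warning) }
        else                       { print STDERR $warning }
    };

    return $code->();
}

# Stand-in for the parser call: one unwanted and one wanted warning.
filter_warnings(qr/validity error/, sub {
    warn "file:15: validity error : ID smallink already defined\n";
    warn "something else went wrong\n";
});
```

Because the handler is installed with local, it is removed again as soon as filter_warnings returns, even if the code reference dies.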

Also note that this is all assuming those warnings come from perl-space. It's quite possible that libxml2, the C library XML::LibXML uses under the hood, writes warnings directly to stderr itself. $SIG{__WARN__} will not be able to prevent it from doing that.
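
If the messages do turn out to come from the C library, one possible workaround is to point file descriptor 2 at the null device just for the duration of the parse, which is roughly what 2>/dev/null does but limited to one call. The following is a sketch under that assumption, not something XML::LibXML provides: with_silenced_stderr is a hypothetical helper name. It relies on Perl reusing descriptor 2 when STDERR is reopened, so writes from C code are redirected as well. Note that it hides everything written to STDERR while the code runs, not only the validity errors.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: run $code with STDERR (fd 2) sent to the null device,
# then restore the original STDERR. The code ref is called in list context.
sub with_silenced_stderr {
    my ($code) = @_;

    # Duplicate the current STDERR so it can be restored afterwards.
    open(my $saved, '>&', \*STDERR) or die "Cannot dup STDERR: $!";

    # Reopening STDERR keeps using fd 2, so C-level writes are redirected too.
    open(STDERR, '>', File::Spec->devnull) or die "Cannot redirect STDERR: $!";

    my @result;
    my $ok    = eval { @result = $code->(); 1 };
    my $error = $@;

    # Restore fd 2 before reporting any failure.
    open(STDERR, '>&', $saved) or die "Cannot restore STDERR: $!";
    close($saved);

    die $error unless $ok;
    return wantarray ? @result : $result[0];
}
```

It would be used as my $doc = with_silenced_stderr(sub { $parser->parse_html_file($location) }); where $location is whatever file name or URL is being parsed.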

rafl