Hi,

I am looking for a PHP regex that can find PHP error messages on a page, so that when I crawl a site I can list the errors it displays.

Currently I have the following code:

preg_match('/<b>.+<\/b>:.+ in <b>\/.+<\/b> on line <b>[0-9]+<\/b><br( \/)?>/msi',$html,$errors);

It can tell me whether errors occurred, but it will not list them! I get the full HTML page in the array ($errors[0]).

Could anybody help?

EDIT: For example, I have a page with the following HTML source, from which I want to extract the PHP errors:

<b>Warning</b>:  session_start() [<a href='function.session-start'>function.session-start</a>]: The session id contains invalid characters, valid characters are only a-z, A-Z and 0-9 in <b>/home/.../public_html/articlescript/init.php</b> on line <b>127</b><br />
<br />
<b>Warning</b>:  session_start() [<a href='function.session-start'>function.session-start</a>]: Cannot send session cache limiter - headers already sent (output started at /home/.../public_html/articlescript/init.php:127) in <b>/home/.../public_html/articlescript/init.php</b> on line <b>127</b><br />
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

<head>
    <title>...
A: 

Put parentheses () around the parts of the regex that you want to be stored in $errors.
You'll also want to use preg_match_all() rather than preg_match().
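
To illustrate the two points above, here is a minimal sketch. The sample string, the simplified pattern, and the variable names ($sample, $first, $all) are made up for this example and are not taken from the question; PREG_SET_ORDER is just one way to arrange the results.

```php
<?php
// Hypothetical sample input: two simplified error lines.
$sample = "<b>Warning</b>: foo in <b>/a.php</b> on line <b>1</b><br />\n"
        . "<b>Notice</b>: bar in <b>/b.php</b> on line <b>2</b><br />\n";

// Each (...) group is stored as its own entry in the result array.
$pattern = '#<b>(\w+)</b>: (.+?) in <b>(.+?)</b> on line <b>(\d+)</b>#';

// preg_match() stops after the first match.
preg_match($pattern, $sample, $first);

// preg_match_all() keeps going; PREG_SET_ORDER groups the result per match.
$count = preg_match_all($pattern, $sample, $all, PREG_SET_ORDER);

foreach ($all as $m) {
    // $m[0] is the whole match, $m[1]..$m[4] are the captured groups.
    echo "{$m[1]}: {$m[2]} ({$m[3]}, line {$m[4]})\n";
}
```

With preg_match() only the first error ends up in $first, while $all holds one entry per error.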

chigley
That doesn't work for me
Kevin
A: 

Remember to escape your \ in strings.

preg_match_all('#<b>(.+?)</b>:(.+?) in <b>(.+?)</b> on line <b>([0-9]+)</b><br(?: /)?>#is',$string,$errors);

This code on ideone
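
The main difference from the pattern in the question is the lazy quantifier .+? in place of the greedy .+. The sketch below compares the two on markup modelled on the question's sample (messages shortened). The greedy variant, the PREG_SET_ORDER flag, and the variable names are my additions for illustration only.

```php
<?php
// Markup modelled on the question's sample; the messages are shortened.
$html = "<b>Warning</b>:  session_start(): The session id contains invalid characters in <b>/home/.../public_html/articlescript/init.php</b> on line <b>127</b><br />\n"
      . "<br />\n"
      . "<b>Warning</b>:  session_start(): Cannot send session cache limiter in <b>/home/.../public_html/articlescript/init.php</b> on line <b>127</b><br />\n"
      . "<html><head><title>...</title></head></html>";

// Lazy quantifiers: each group stops at the first place the rest can match.
$lazy   = '#<b>(.+?)</b>:(.+?) in <b>(.+?)</b> on line <b>([0-9]+)</b><br(?: /)?>#is';
// Greedy quantifiers, for comparison: each group grabs as much as it can.
$greedy = '#<b>(.+)</b>:(.+) in <b>(.+)</b> on line <b>([0-9]+)</b><br(?: /)?>#is';

$lazyCount   = preg_match_all($lazy, $html, $errors, PREG_SET_ORDER);
$greedyCount = preg_match_all($greedy, $html, $swallowed, PREG_SET_ORDER);

foreach ($errors as $e) {
    // $e[1] = level, $e[2] = message, $e[3] = file, $e[4] = line number
    printf("%s: %s in %s on line %d\n", $e[1], trim($e[2]), $e[3], $e[4]);
}
```

The lazy pattern yields one match per error, whereas the greedy one returns a single match that spans both errors, which is why the question's version seemed to return a whole chunk of the page.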

Colin Hebert
http://ideone.com/utL3K WORKS!
Kevin
@Kevin, even so, you should read @Gumbo's answer.
Colin Hebert
A: 

If this is your own website, you can either set the log levels and parse your log files (easier), or check your scripts from the command line with php -l (note that this only reports syntax errors).
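
For the log-file route, a rough sketch of parsing one log line could look like this. It assumes logging is switched on (the log_errors and error_log ini directives) and that lines follow the common "[timestamp] PHP Level:  message in file on line N" layout, which may differ on your server. The function name parseErrorLogLine and the sample line are made up for illustration.

```php
<?php
// Hypothetical helper: parse one line of a PHP error log.
// Assumes the "[timestamp] PHP Level:  message in file on line N" layout.
function parseErrorLogLine($line)
{
    $pattern = '/^\[([^\]]+)\] PHP ([A-Za-z ]+):\s+(.*) in (.+) on line (\d+)$/';
    if (!preg_match($pattern, rtrim($line), $m)) {
        return null; // not a PHP error line, or a different log format
    }
    return array(
        'time'    => $m[1],
        'level'   => $m[2],
        'message' => $m[3],
        'file'    => $m[4],
        'line'    => (int) $m[5],
    );
}

// Made-up sample line for illustration.
$entry = parseErrorLogLine('[10-Sep-2010 12:00:00] PHP Warning:  session_start(): Cannot send session cache limiter in /var/www/init.php on line 127');
print_r($entry);
```

Lines that do not match, such as stack traces or output from other programs, come back as null and can simply be skipped.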

ikanobori
The problem is, it is the site of a client, so I can't use that method.
Kevin
+2  A: 

Forgive my language but it's quite foolish to attempt to parse HTML with regular expressions, especially potentially-malformed HTML. Use an HTML parsing library instead.

For HTML parsing and validation in HTML, I would refer to this answer; also check out the tidy extension.

Ether
Well, in this case the HTML isn't really XML compliant, and moreover you can't really know where this error will show up, so an XML parser (or an HTML one, for what it's worth) won't help.
Colin Hebert
@Colin: there are HTML parsers that will identify errors, which is precisely what the OP wants to do. HTML is not regular, so using a regular expression will not be fruitful.
Ether
That comment must be one of the most-linked ones here.
CanSpice
You're probably right, but how do I do this in this situation?
Kevin
@Kevin: I've edited my answer with the best links I could find.
Ether
+5  A: 

Since – well, you know – you shouldn’t use regular expressions to parse HTML, try this using PHP’s DOM library:

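libxml_use_internal_errors(true); // collect libxml parse errors instead of emitting warnings
$doc = new DOMDocument();
$doc->loadHTML($str); // $str holds the source of the fetched page
$messages = array();
foreach ($doc->getElementsByTagName('b') as $elem) {
    // PHP prints the error level inside a B element, e.g. <b>Warning</b>
    if (in_array($elem->textContent, array('Error', 'Warning', 'Notice'))) {
        $buffer = $elem->textContent;
        // Append the text of each following sibling until the next BR element.
        // nodeName is used because it is never null ('#text' for text nodes).
        $node = $elem->nextSibling;
        while ($node !== null && strtolower($node->nodeName) !== 'br') {
            $buffer .= $node->textContent;
            $node = $node->nextSibling;
        }
        $messages[] = $buffer;
    }
}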

This searches for B elements whose content is one of “Error”, “Warning”, or “Notice” and takes the textual content from there up to the next BR element. The initial call to libxml_use_internal_errors prevents libxml parsing errors from being reported.
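
As a possible follow-up, the same idea can be wrapped in a function that also splits each message into its parts. This is only a sketch: the function name extractPhpMessages, the sample $page, and the splitting regex are my own assumptions, and it needs PHP's DOM extension.

```php
<?php
// Hypothetical wrapper around the loop above; the function name is illustrative.
function extractPhpMessages($html)
{
    libxml_use_internal_errors(true); // keep libxml complaints about bad markup quiet
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $messages = array();
    foreach ($doc->getElementsByTagName('b') as $elem) {
        if (!in_array($elem->textContent, array('Error', 'Warning', 'Notice'))) {
            continue;
        }
        $buffer = $elem->textContent;
        // Walk the following siblings (text nodes and elements) up to the BR.
        for ($node = $elem->nextSibling; $node !== null && $node->nodeName !== 'br'; $node = $node->nextSibling) {
            $buffer .= $node->textContent;
        }
        // Split the flattened text into level, message, file and line number.
        if (preg_match('/^([A-Za-z ]+):\s+(.*) in (.+) on line (\d+)$/s', trim($buffer), $m)) {
            $messages[] = array('level' => $m[1], 'message' => $m[2], 'file' => $m[3], 'line' => (int) $m[4]);
        }
    }
    libxml_clear_errors(); // discard the collected libxml errors
    return $messages;
}

// Made-up page: one PHP warning printed before the actual document.
$page = "<b>Warning</b>:  session_start(): Cannot send session cache limiter in <b>/home/.../init.php</b> on line <b>127</b><br />\n"
      . "<html><head><title>Test</title></head><body><p>Hello</p></body></html>";
$found = extractPhpMessages($page);
print_r($found);
```

Note that PHP also prints levels such as “Fatal error” and “Parse error”; to catch those, they would have to be added to the array of accepted labels.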

Gumbo
This doesn't entirely work. How can I make it work the same as http://ideone.com/utL3K?
Kevin
@Kevin: Ok, I have to admit that this might fail if the document is actually invalid HTML and is fragmented in such a way that parsing fails.
Gumbo
No, it just does not list the errors correctly. The while loop doesn't work. If I delete the while loop, it lists the errors, but not the message texts.
Kevin
@Kevin: I fixed that bug.
Gumbo
Thanks for the help!
Kevin