I have a script that returns the following in a variable called $content:

<body>
<p><span class=\"c-sc\">dgdfgdf</span></p>
</body>

However, I need to capture everything between the body tags in an array called $matches.

I use the following to match the content between the body tags:

preg_match('/<body>(.*)<\/body>/',$content,$matches);

But the $matches array is empty. How can I get it to return everything inside the body tag?

+2  A: 

You should not use regular expressions to parse HTML.

Your particular problem in this case is that the dot does not match newlines by default. You need to add the DOTALL modifier (s) so that it does:

preg_match('/<body>(.*)<\/body>/s', $content, $matches);

But seriously, use an HTML parser instead. There are so many ways that the above regular expression can break.
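To make the effect of the modifier concrete, here is a minimal sketch comparing the pattern with and without `s`. The `$content` literal is an assumption: it mirrors the markup from the question, treating the backslashes there as string escaping, and `$noMatches`, `$withoutS` and `$withS` are names made up for this illustration.

```php
<?php
// Assumed input: the question's markup, with newlines between the tags.
$content = "<body>\n<p><span class=\"c-sc\">dgdfgdf</span></p>\n</body>";

// Without the s modifier, "." stops at newlines, so the pattern cannot match.
$withoutS = preg_match('/<body>(.*)<\/body>/', $content, $noMatches);

// With the s modifier (PCRE_DOTALL), "." also matches newlines.
$withS = preg_match('/<body>(.*)<\/body>/s', $content, $matches);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)

// $matches[0] holds the whole match, $matches[1] the captured group.
echo trim($matches[1]), PHP_EOL;
```

Note that the content you want ends up in `$matches[1]`, not `$matches[0]`, and that it still includes the newlines around the paragraph unless you trim it.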

Mark Byers
+3  A: 

Don't try to process HTML with regular expressions! Use PHP's built-in parser instead:

$dom = new DOMDocument;
$dom->loadHTML($content);
$bodies = $dom->getElementsByTagName('body');
assert($bodies->length === 1);
$body = $bodies->item(0);
// Serialize each child node of <body> and concatenate them
// to get everything between the body tags.
$inner = '';
foreach ($body->childNodes as $child) {
    $inner .= $dom->saveHTML($child);
}
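If the input might be malformed or might not produce a body element at all, the same idea can be wrapped in a small function. This is only a sketch: `bodyInnerHtml` is a hypothetical helper name, the `$content` literal is assumed from the question, and it relies on `libxml_use_internal_errors()` to keep parser warnings quiet.

```php
<?php
// Hypothetical helper (not part of the DOM extension): returns the markup
// inside <body>, or null when parsing fails or no body element exists.
function bodyInnerHtml(string $html): ?string
{
    // Collect libxml parse warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return null;
    }

    // saveHTML($node) serializes a single node; concatenating the
    // children of <body> yields its inner HTML.
    $inner = '';
    foreach ($body->childNodes as $child) {
        $inner .= $dom->saveHTML($child);
    }
    return $inner;
}

// Assumed input: the markup from the question.
$content = "<body>\n<p><span class=\"c-sc\">dgdfgdf</span></p>\n</body>";
$inner = bodyInnerHtml($content);
echo trim($inner), PHP_EOL;
```

Returning null instead of asserting lets the caller decide what to do with input that has no usable body.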
soulmerge
A: 

If for some reason you don't have DOMDocument installed, try this:

Step 1. Download simple_html_dom

Step 2. Read the documentation about how to use its selectors

require_once("simple_html_dom.php");
$doc = new simple_html_dom();
$doc->load($someHtmlString);
// find() returns an array unless an index is given; 0 selects the first <body>.
$body = $doc->find("body", 0)->innertext;
Justin Johnson