I need a regex in PHP that matches the content between an element's tags, e.g. between <body> and </body>, using the Perl-compatible preg_match.

So far I have tried:

// $content is a string with html content

preg_match("/<body(.|\r\n)*\/body>/", $content, $matches);

print_r($matches);

…but the printout is an empty array.

+3  A: 

You simply have to add the s modifier so that the dot matches all characters, including newlines:

preg_match("/<body.*\/body>/s", $content, $matches);

as explained in the documentation: http://nl2.php.net/manual/en/reference.pcre.pattern.modifiers.php
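Here is a minimal sketch of the s modifier in use, assuming a made-up sample string for $content. The capture group, the lazy quantifier and the [^>]* for tag attributes are additions for illustration and are not part of the pattern above; they are there so that only the text between the tags ends up in $matches[1].

```php
<?php
// Hypothetical sample input; $content stands in for the HTML string from the question.
$content = "<html>\n<body class=\"main\">\n<p>first line</p>\n<p>second line</p>\n</body>\n</html>";

// s (PCRE_DOTALL): the dot also matches newline characters.
// [^>]* skips any attributes on the opening tag.
// (.*?) lazily captures everything up to the first closing </body>.
if (preg_match('/<body[^>]*>(.*?)<\/body>/s', $content, $matches)) {
    $inner = $matches[1]; // text between the tags, surrounding newlines included
} else {
    $inner = null;        // no match (or a PCRE error; see preg_last_error())
}

echo $inner;
```

Single quotes around the pattern keep PHP from interpreting escape sequences before PCRE sees them, which makes the pattern a bit easier to reason about.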

Wookai
Thanks, it worked!
Spoike
A: 

By default, a Perl regexp matches within a single line.

You have to specify that you want a multi-line search by adding an s or an m after the last /.

Example:

$> perl -e 'print $1 if "bla\nbla\n<body>\nfirst line\n second line\n</body>\nbla" =~ /^.*<body>(.*)<\/body>.*$/s'

see: http://www.perl.com/pub/a/2003/06/06/regexps.html

chub
Setting the m modifier is not sufficient, as it only changes the behavior of the ^ and $ anchors.
Wookai
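To illustrate the difference between the two modifiers, here is a small PHP sketch with a made-up three-line string (the variable names are only for illustration). With m alone the dot still stops at newlines, so the body pattern does not match; with s it does.

```php
<?php
// Hypothetical three-line input for illustration.
$text = "<body>\nhello\n</body>";

// m (PCRE_MULTILINE) only changes ^ and $; the dot still does not match "\n".
$withM = preg_match('/<body>.*<\/body>/m', $text);   // 0: no match

// s (PCRE_DOTALL) lets the dot match "\n", so the pattern can span lines.
$withS = preg_match('/<body>.*<\/body>/s', $text);   // 1: match

// What m is actually for: ^ and $ match at the start and end of each line.
$anchoredWithM    = preg_match('/^hello$/m', $text); // 1: "hello" is a full line
$anchoredWithoutM = preg_match('/^hello$/', $text);  // 0: ^ and $ only at string ends

var_dump($withM, $withS, $anchoredWithM, $anchoredWithoutM);
```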
A: 

This is almost a duplicate question.

Have a look here:
http://stackoverflow.com/questions/356340/regular-expression-to-extract-html-body-content

It can help you even if it isn't exactly the same problem.

Matthieu
Almost :)... IMO, this question is more about the multiline part of a regex, and less about extraction itself.
Wookai