I have this regex in PHP:

preg_match('/\[summary\](.+)\[\/summary\]/i', $data['text'], $match);

It works fine when the text between the summary tags is on one line. However, when it contains newlines, it doesn't match.

I looked for a suitable modifier here: http://nl2.php.net/manual/en/reference.pcre.pattern.modifiers.php, but the only one that seems related to newlines is "m", and that doesn't do what I want.

How can I make this work?

+2  A: 

The manual page you've linked to describes another option that affects how line breaks are handled:

s (PCRE_DOTALL) If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.
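
As a minimal sketch of the difference, assuming a made-up sample string in place of the question's $data['text'], the same pattern can be run with and without the s modifier:

```php
<?php
// Hypothetical sample input; in the question the text comes from $data['text'].
$text = "Intro\n[summary]First line\nSecond line[/summary]\nRest";

// Without s, the dot does not match newlines, so the pattern fails here.
$withoutS = preg_match('/\[summary\](.+)\[\/summary\]/i', $text, $m1);

// With s (PCRE_DOTALL), the dot also matches newlines.
$withS = preg_match('/\[summary\](.+)\[\/summary\]/is', $text, $m2);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
echo $m2[1];         // the two lines between the tags
```

If the text can contain more than one [summary] block, the greedy (.+) will span from the first opening tag to the last closing tag; using (.+?) instead is one way to stop at the first closing tag.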

VolkerK
Whoops, apparently I didn't read it carefully enough.
Bart van Heukelom
A: 

Regexes are fundamentally bad at parsing HTML (see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why). What you need is an HTML parser. See "Can you provide an example of parsing HTML with your favorite parser?" for examples using a variety of parsers.

You may find this answer that uses SimpleXML helpful.

Chas. Owens