ansaurus

Question

How to regex match text with different endings?

Answer 1

+1 A:

The .* match should be non-greedy (match the minimum number of arbitrary characters instead of the maximum), that is, (.*?) in PHP, I guess.
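A minimal sketch of the difference between greedy and lazy matching, using a made-up one-line input (the question's actual markup is not shown here):

```php
<?php
// Hypothetical input: two paragraphs on one line.
$html = '<p>first</p><p>second</p>';

// Greedy: .* runs as far as it can, then backtracks to the LAST </p>.
preg_match('#<p>(.*)</p>#', $html, $greedy);

// Lazy: .*? expands one character at a time and stops at the FIRST </p>.
preg_match('#<p>(.*?)</p>#', $html, $lazy);

echo $greedy[1], "\n"; // first</p><p>second
echo $lazy[1], "\n";   // first
```

The `#` delimiter is used so that the `/` in closing tags does not need escaping.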

amic 2010-04-02 10:11:17

Answer 2

+1 A:

Try making your match non-greedy by using (.*?) in place of (.*).

codaddict 2010-04-02 10:12:07

Answer 3

+6 A:

Don't use regex to parse HTML. PHP provides DOMDocument that can be used for this purpose.

Having said that, your regular expression has some errors:

You need parentheses around the alternation.
You need lazy modifiers.
You can't type 'header' to match 'Information'.

With these changes it would look like this:

<h2>.*?</h2>\n\t+<p>.*?(<br />|</p>)
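A minimal sketch of applying the corrected pattern with preg_match. A capture group around the paragraph text is added here for illustration, and the sample inputs are hypothetical, since the question's real markup is not shown:

```php
<?php
// The corrected pattern, with (.*?) added to capture the paragraph text.
// In single quotes, \n and \t reach PCRE as escapes for newline and tab.
$pattern = '#<h2>.*?</h2>\n\t+<p>(.*?)(<br />|</p>)#';

// Hypothetical inputs: one paragraph ends at <br />, the other at </p>.
$withBreak = "<h2>Information</h2>\n\t\t<p>Line one<br />Line two</p>";
$withClose = "<h2>Information</h2>\n\t<p>Only line</p>";

foreach ([$withBreak, $withClose] as $html) {
    if (preg_match($pattern, $html, $m)) {
        // $m[1] is the paragraph text, $m[2] is whichever ending came first.
        echo $m[1], ' | ended by ', $m[2], "\n";
    }
}
```

Because the alternation is inside its own group, `<br />|</p>` only chooses between the two endings instead of splitting the whole pattern in two.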

Your regular expression is also very fragile. For example, if the input contains spaces instead of tabs or the line ending is Windows-style, your regular expression will fail. Using a proper HTML parser will give a much more robust solution.
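A minimal DOMDocument sketch of the same task, assuming the goal is the text of the paragraph right after each <h2>, up to the first <br /> (the function name paragraphsAfterHeadings is made up for illustration). Unlike the regex, it does not care whether the gap between the tags is tabs, spaces or a Windows line ending:

```php
<?php
// Hypothetical helper: for each <h2>, return the text of the next <p>
// up to its first <br /> (or the whole paragraph if it has no <br />).
function paragraphsAfterHeadings(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $results = [];
    // following-sibling::p[1] selects the first <p> sibling after each <h2>.
    foreach ($xpath->query('//h2/following-sibling::p[1]') as $p) {
        $text = '';
        foreach ($p->childNodes as $node) {
            if ($node->nodeName === 'br') {
                break; // stop at the first line break
            }
            $text .= $node->textContent;
        }
        $results[] = trim($text);
    }
    return $results;
}

// Hypothetical input with spaces and a Windows-style line ending.
print_r(paragraphsAfterHeadings("<h2>Information</h2>\r\n    <p>Line one<br />Line two</p>"));
```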

Mark Byers 2010-04-02 10:12:10

+1 for don't use regex. See http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags for further details.

Török Gábor 2010-04-02 10:14:11

Cheers, and I will look into that DOMDocument, as I can't say I like using regex; it was just all I knew that could do it.

Mint 2010-04-02 10:30:09

Oops, that header thing was a typo.

Mint 2010-04-02 10:32:21

It's ridiculous how many times this needs to be restated. This same question pops up almost daily.

Keith Rousseau 2010-04-02 11:56:31

Answer 4

+2 A:

Use \s to match any whitespace character (including spaces, tabs, line feeds, carriage returns, etc.), e.g.:

preg_match('#<h2>header</h2>\s*<p>(.*?)(?:<br />|</p>)#', $result, $postMessage);
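A minimal sketch showing that \s* tolerates different whitespace between the tags. It uses the pattern with the alternation in a non-capturing group and a lazy capture (the fixes raised in the other answers); the inputs are hypothetical:

```php
<?php
// \s* accepts any run of spaces, tabs, \r or \n (including none).
$pattern = '#<h2>header</h2>\s*<p>(.*?)(?:<br />|</p>)#';

// Hypothetical inputs that differ only in the whitespace between the tags.
$inputs = [
    "<h2>header</h2>\n\t<p>tabs</p>",
    "<h2>header</h2>\r\n    <p>spaces and CRLF<br />more</p>",
    "<h2>header</h2><p>no whitespace</p>",
];

foreach ($inputs as $html) {
    if (preg_match($pattern, $html, $postMessage)) {
        echo $postMessage[1], "\n"; // the captured paragraph text
    }
}
```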

But, as already mentioned, do not use regular expressions to parse HTML.

Felix Kling 2010-04-02 10:12:34

Ah yeah, I might use \s then. I will definitely read up on this DOMDocument thingy that PHP provides.

Mint 2010-04-02 10:29:05
