ansaurus

Question

Help with a regex that strips out leading white space.

Answer 1

+1 A:

Here's how I would do it:

$str = preg_replace(
    '~^[ \t]++(?=(?:[^<]++|<(?!/?+pre\b))*+(?:\z|<pre\b))~im',
    '', $str);

After matching some line-leading whitespace, the lookahead scans ahead for <pre> or </pre> tags. The meat of the lookahead is this bit:

(?:[^<]++|<(?!/?+pre\b))*+

It matches zero or more of anything that's not a left angle bracket, or a left angle bracket if it's not the beginning of a <pre> or </pre> tag. That part only stops matching when it encounters a <pre> (starting) tag, a </pre> (ending) tag, or the end of the input. The final group, (?:\z|<pre\b), then lets the lookahead succeed only at the end of the input or at a starting tag. If it's an ending tag that stops the scan, you know you're inside a <pre> element, so the lookahead fails and the whitespace is left alone.
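To see that behaviour on a small input, here is a minimal sketch. The function name strip_leading_ws and the sample markup are made up for illustration; only the pattern comes from the answer above. The null check is an extra precaution, since preg_replace() returns null when PCRE reports an error.

```php
<?php
// Hypothetical wrapper around the pattern from the answer.
function strip_leading_ws(string $str): string
{
    $result = preg_replace(
        '~^[ \t]++(?=(?:[^<]++|<(?!/?+pre\b))*+(?:\z|<pre\b))~im',
        '', $str);

    // preg_replace() returns null on a PCRE error; fall back to the input.
    return $result ?? $str;
}

// Sample input (an assumption, not from the question): indented markup
// around a <pre> block whose content is also indented.
$html = "  <p>one</p>\n\t<pre>\n    keep me\n</pre>\n   <p>two</p>\n";

echo strip_leading_ws($html);
// Leading whitespace is removed before <p>one</p>, <pre> and <p>two</p>,
// while the four spaces in front of "keep me" survive.
```

The indentation in front of the opening <pre> tag is stripped as well, because at that point the scan has not yet entered the element.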

The possessive quantifiers ('++', '*+', and '?+') are essential to prevent catastrophic backtracking. (I can't help it: that phrase always makes me think of the resonance cascade scenario from Half-Life.)

This technique also assumes reasonably well-formed HTML, i.e., all <pre>...</pre> tags properly balanced. Tags inside of SGML comments will mess it up, too--unless they happen to be balanced. You can deal with comments, too, if you don't mind making the regex twice as long and three times as ugly. :)
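The comment caveat can be reproduced with a short sketch. Both sample strings are invented for illustration; the pattern is unchanged from the answer.

```php
<?php
$pattern = '~^[ \t]++(?=(?:[^<]++|<(?!/?+pre\b))*+(?:\z|<pre\b))~im';

// A lone </pre> inside a comment: everything before it looks like it is
// inside a <pre> element, so the first line keeps its indentation.
$unbalanced = "  <p>a</p>\n<!-- </pre> -->\n  <p>b</p>\n";
$unbalancedResult = preg_replace($pattern, '', $unbalanced);

// A balanced <pre>...</pre> pair inside a comment: the scan from the
// first line stops at the opening tag, so both lines are stripped.
$balanced = "  <p>a</p>\n<!-- <pre>x</pre> -->\n  <p>b</p>\n";
$balancedResult = preg_replace($pattern, '', $balanced);

var_dump($unbalancedResult === "  <p>a</p>\n<!-- </pre> -->\n<p>b</p>\n"); // bool(true)
var_dump($balancedResult === "<p>a</p>\n<!-- <pre>x</pre> -->\n<p>b</p>\n"); // bool(true)
```

In the first case only the line after the comment is cleaned up, which is the kind of breakage the answer warns about.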

Alan Moore 2009-08-09 03:44:51

This is a great answer (I will try it in a moment) +1 for everything (including Half Life reference) :)

alex 2009-08-09 05:12:03

Answer 2

A:

Your problem has been discussed a lot, I guess. Check out this link:

http://us3.php.net/manual/en/function.nl2br.php#91828

This one as well:

http://us3.php.net/manual/en/function.nl2br.php#39641

sylvanaar 2009-08-09 03:48:05
