In stock PHP5, what is a good preg_replace expression for making this transformation:

replace newlines with <br />, but only within <pre> blocks

(Feel free to make simplifying assumptions, and ignore corner cases. For example, we can assume that tags will be one line, and not pathological things like )

Input text:

<div><pre class='some class'>1
2
3
</pre>
<pre>line 1
line 2
line 3
</pre>
</div>

Output:

<div><pre>1<br />2<br />3<br /></pre>
<pre>line 1<br />line 2<br />line 3<br /></pre>
</div>

(Motivating context: I'm trying to close out bug 20760 in a Wikimedia SyntaxHighlight_GeSHI extension, and finding that my PHP skills (I mostly do Python) aren't up to snuff.)

I'm open to solutions other than regexen, but small is preferred (for example, building HTML parsing machinery is overkill).

+5  A: 

Something like this?

<?php

$content = "<div><pre class='some class'>1
2
3
</pre>
<pre>line 1
line 2
line 3
</pre>
</div>
";

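// Re-serialize just the first element inside <body>, dropping the
// html/body wrapper that loadHTML() adds around a fragment.
function getInnerHTML($Node)
{
    $Body = $Node->ownerDocument->documentElement->firstChild->firstChild;
    $Document = new DOMDocument();
    $Document->appendChild($Document->importNode($Body, true));
    return $Document->saveHTML();
}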

$dom = new DOMDocument();
$dom->loadHTML( $content );
$preElements = $dom->getElementsByTagName('pre');

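if ( $preElements->length ) {
    foreach ( $preElements as $pre ) {
        // \r?\n treats a Windows line ending as one newline instead of leaving a stray \r
        $value = preg_replace( '/\r?\n/', '<br/>', $pre->nodeValue );
        $pre->nodeValue = $value;
    }

    // The inserted <br/> text is escaped on output, so decode the entities again
    echo html_entity_decode( getInnerHTML( $dom->documentElement ) );
}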
meder
Updated the answer with `html_entity_decode`; remove it if you don't need it.
meder
I just threw together a quick regex for newlines. If any of you Perl regex wizards see an issue, let me know :)
meder
This fails for my purposes, in that the html_entity_decode adds in newlines between elements. Don't blame me, blame wikimedia's Parser class :)
Gregg Lind
Correction: it is saveHTML that adds the newlines. I do like the approach generally; it just doesn't work for my application.
Gregg Lind
A: 

Based on something SilentGhost said (which isn't showing up here for some reason):

<?php
$str = "<div><pre class='some class' >1
2
3
< / pre>
<pre>line 1
line 2
line 3
</pre>
</div>";

$out = "<div><pre class='some class' >1<br />2<br />3<br />< / pre>
<pre>line 1<br />line 2<br />line 3<br /></pre>
</div>";

function protect_newlines($str) {
    // \n -> <br />, but only if it's in a pre block
    // protects newlines from Parser::doBlockLevels()
    /* split on <pre ... /pre>, basically.  probably good enough */
    $str = " ".$str;  // pad the front so non-<pre> text sits at even indices and captured <pre> blocks at odd ones
    //$parts = preg_split('/(<pre .*  pre>)/Umsxu',$str,-1,PREG_SPLIT_DELIM_CAPTURE);
    $parts = preg_split("/(< \s* pre .* \/ \s* pre \s* >)/Umsxu",$str,-1,PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $idx=>$part) {
        if ($idx % 2) {
            $parts[$idx] = preg_replace("/\n/", "<br />", $part);
        }
    }
    $str = implode('',$parts);
    /* chop off the first space, that we had added */
    return substr($str,1);
}

assert(protect_newlines($str) === $out);
?>
Gregg Lind
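
The split-and-replace idea above can also be written with `preg_replace_callback`, which hands each matched `<pre>` block to a function and avoids the index bookkeeping. This is only a minimal sketch under the question's simplifying assumptions (each tag on one line, no nested `<pre>`); the function names `nl_to_br_in_pre` and `nl_to_br_in_pre_cb` are made up for illustration and do not come from the answers above. Named functions are used rather than closures so it should also run on older PHP 5 releases.

```php
<?php
// Hypothetical helper: replace newlines with <br />, but only inside <pre> blocks.
function nl_to_br_in_pre($html) {
    return preg_replace_callback(
        // An opening <pre ...> tag through the nearest closing </pre>,
        // tolerating stray whitespace such as "< / pre>".
        '/<\s*pre\b[^>]*>.*?<\s*\/\s*pre\s*>/is',
        'nl_to_br_in_pre_cb',
        $html
    );
}

// Callback: $m[0] holds one whole <pre>...</pre> block.
function nl_to_br_in_pre_cb($m) {
    return preg_replace('/\r?\n/', '<br />', $m[0]);
}

$in = "<div><pre class='some class'>1\n2\n3\n</pre>\n<pre>line 1\nline 2\nline 3\n</pre>\n</div>";
echo nl_to_br_in_pre($in);
```

Newlines between the two `<pre>` blocks and before `</div>` are left alone, because only the matched blocks are passed to the callback. Unlike the output shown in the question, this keeps the `class` attribute on the opening tag.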