Following on from this question: http://stackoverflow.com/questions/1524377/what-regex-pattern-do-i-need-for-this I've been using the following code:

function process($node, $replaceRules) {
    if($node->hasChildNodes()) {
       foreach ($node->childNodes as $childNode) {
         if ($childNode instanceof DOMText) {
           $text = preg_replace(
            array_keys($replaceRules),
            array_values($replaceRules),
            $childNode->wholeText
           );
           $node->replaceChild(new DOMText($text),$childNode);
          } else {
            process($childNode, $replaceRules);
          }
       }
    }
}

$replaceRules = array(
  '/\b(c|C)olor\b/' => '$1olour',
  '/\b(kilom|Kilom|M|m)eter/' => '$1etre',
);

$htmlString = "<p><span style='color:red'>The color of the sky is: gray</p>";
$doc = new DOMDocument();
$doc->loadHtml($htmlString);
process($doc, $replaceRules);
$string = $doc->saveHTML();
echo mb_substr($string,119,-15);

It works fine on elements that contain only text, but it fails when an element contains both text and child elements (as the child node is replaced on the first instance). So it works on

<div>The distance is four kilometers</div>

but not

<div>The distance is four kilometers<br>1000 meters to a kilometer</div>

or

<div>The distance is four kilometers<div class="guide">1000 meters to a kilometer</div></div>

Any ideas of a method that would work on such examples?

A: 

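Calling $node->replaceChild while iterating over $node->childNodes will confuse the iterator, because childNodes is a live list that changes whenever the tree is modified. You can copy the child nodes into an array first, and then process them: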

function process($node, $replaceRules) {
    if($node->hasChildNodes()) {
        // Take a snapshot of the children so replaceChild() below
        // cannot disturb the loop.
        $nodes = array();
        foreach ($node->childNodes as $childNode) {
            $nodes[] = $childNode;
        }
        foreach ($nodes as $childNode) {
            if ($childNode instanceof DOMText) {
                // Text node: apply every pattern => replacement pair.
                $text = preg_replace(
                    array_keys($replaceRules),
                    array_values($replaceRules),
                    $childNode->wholeText);
                $node->replaceChild(new DOMText($text),$childNode);
            }
            else {
                // Any other node: recurse into its children.
                process($childNode, $replaceRules);
            }
        }
    }
}
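
As a variant of the function above (a minimal sketch, not part of the original answer), the text nodes can also be selected with XPath and edited in place, so that no node is replaced during the loop. The helper name replaceInTextNodes and the XPath filter that skips script and style content are assumptions made for illustration; the rules and the sample markup are the ones from the question.

```php
<?php
// Hypothetical helper (not from the original answer): rewrite every text
// node of the document in place instead of replacing nodes.
function replaceInTextNodes(DOMDocument $doc, array $replaceRules): void
{
    $xpath = new DOMXPath($doc);
    // Select all text nodes, skipping script and style content (an assumption:
    // spelling rules should not touch code or CSS).
    $textNodes = $xpath->query('//text()[not(ancestor::script) and not(ancestor::style)]');
    foreach ($textNodes as $textNode) {
        // Assigning nodeValue changes the text but keeps the same node,
        // so the tree structure is untouched while looping.
        $textNode->nodeValue = preg_replace(
            array_keys($replaceRules),
            array_values($replaceRules),
            $textNode->nodeValue
        );
    }
}

$replaceRules = array(
    '/\b(c|C)olor\b/' => '$1olour',
    '/\b(kilom|Kilom|M|m)eter/' => '$1etre',
);

$doc = new DOMDocument();
$doc->loadHTML('<div>The distance is four kilometers<br>1000 meters to a kilometer</div>');
replaceInTextNodes($doc, $replaceRules);
$result = $doc->saveHTML();
echo $result;
```

Because only the content of each text node changes, attribute values such as style='color:red' are left alone, just as with the recursive version.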
Lukáš Lalinský
Brilliant. Thanks a lot.
Apemantus