After seeing several threads rubbishing the regexp method of finding a term to match within an HTML document, I've used the Simple HTML DOM PHP parser (http://simplehtmldom.sourceforge.net/) to get the bits of text I'm after, but I want to know if my code is optimal. It feels like I'm looping too many times. Is there a way to optimise the following loop?
//Get the HTML and look at the text nodes
$html = str_get_html($buffer);
//First we match the <body> tag as we don't want to change the <head> items
foreach($html->find('body') as $body) {
//Then we get the text nodes, rather than any HTML
foreach($body->find('text') as $text) {
//Then we match each term
foreach ($terms as $term) {
//Match to the terms within the text nodes
$text->outertext = str_replace($term, '<span class="highlight">'.$term.'</span>', $text->outertext);
}
}
}
For example, would it make a difference to determine check if I have any matches before I start the loop maybe?