I'm searching through database search results on a website and trying to highlight the term in the returned results that matches the search term. Below is what I have so far (in PHP):

$highlight = trim($highlight);
if(preg_match('|\b(' . $highlight . ')\b|i', $str_content))
{
    $str_content = preg_replace('|\b(' . $highlight . ')(?!["\'])|i', "<span class=\"highlight\">$1</span>", $str_content);
    break; // exits the enclosing loop (not shown here)
}

The downside of this approach is that if my search term also shows up in the URL permalink, the replacement inserts the span into the href attribute and breaks the anchor tag. Is there any way in my regex to exclude any text that appears inside an HTML tag (between the opening < and the closing >)?

I know I could use the strip_tags() function and just output the results in plain text, but I'd rather not do that if I don't have to.

A: 

I think assertions are what you're looking for.
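
To make that concrete, here is a minimal sketch of one way an assertion could be used, assuming the markup never contains a literal > inside an attribute value or in the text itself. The function name highlight_outside_tags() is made up for illustration. The negative lookahead (?![^<>]*>) rejects a match when a > is reached before any <, which is the case when the term sits inside a tag such as an href attribute. The term is also passed through preg_quote() so that regex metacharacters in the search term cannot break the pattern.

```php
<?php
// Sketch only: highlight_outside_tags() is a hypothetical helper name.
function highlight_outside_tags(string $html, string $term): string
{
    $term = trim($term);
    if ($term === '') {
        return $html; // nothing to highlight
    }

    // Escape regex metacharacters in the term, including the '|' delimiter.
    $quoted = preg_quote($term, '|');

    // (?![^<>]*>) is a negative lookahead: the match is rejected when a '>'
    // follows before any '<', i.e. when the term is inside a tag.
    $pattern = '|\b(' . $quoted . ')\b(?![^<>]*>)|i';

    return preg_replace($pattern, '<span class="highlight">$1</span>', $html);
}

$html = '<a href="/posts/kitty-care">Kitty care tips</a>';
echo highlight_outside_tags($html, 'kitty');
// prints: <a href="/posts/kitty-care"><span class="highlight">Kitty</span> care tips</a>
```

This is a heuristic rather than real HTML parsing, so it can misfire on markup with unescaped angle brackets, comments, or inline scripts.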

Ed G
A little more detail would be nice. Actually, make that *a lot* more detail.
Alan Moore
+2  A: 

DO NOT try to parse HTML with regular expressions:
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

Try something like PHP Simple HTML DOM.

<?php
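// load the Simple HTML DOM library (adjust the path to wherever simple_html_dom.php lives)
include 'simple_html_dom.php';

// get DOM
$html = file_get_html('http://www.google.com/search?q=hello+kitty');

// $term is the user's search term; make sure it is properly sanitized before use
$term = trim($term);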

// highlight $term in all <div class="result">...</div> elements
foreach($html->find('div.result') as $e){
   echo str_replace($term, '<span class="highlight">'.$term.'</span>', $e->plaintext);
}
?>

Note: this is not an exact solution because I don't know what your HTML looks like, but this should put you pretty close to being on track.
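
If you want to keep the links and other markup instead of printing plaintext, the same idea can be sketched with PHP's bundled DOM extension: walk only the text nodes and wrap matches there, so attributes such as href are never touched. This is a rough sketch that assumes the DOM extension is enabled and that the input is a UTF-8 HTML fragment. The helper name highlight_text_nodes() and the wrapper id highlight-root are invented for this example.

```php
<?php
// Sketch only: highlight_text_nodes() and the "highlight-root" id are hypothetical names.
function highlight_text_nodes(string $html, string $term): string
{
    $term = trim($term);
    if ($term === '') {
        return $html;
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about sloppy markup
    // Wrap the fragment in a known container; the XML prefix hints UTF-8 to libxml.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="highlight-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $root = $xpath->query('//div[@id="highlight-root"]')->item(0);
    $pattern = '/\b(' . preg_quote($term, '/') . ')\b/i';

    // Snapshot the text nodes first, because the loop below modifies the tree.
    $query = './/text()[not(ancestor::script or ancestor::style)]';
    $textNodes = iterator_to_array($xpath->query($query, $root));

    foreach ($textNodes as $node) {
        // With PREG_SPLIT_DELIM_CAPTURE, odd indexes hold the matched terms.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if ($parts === false || count($parts) < 2) {
            continue; // no match in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $span = $doc->createElement('span');
                $span->setAttribute('class', 'highlight');
                $span->appendChild($doc->createTextNode($part));
                $fragment->appendChild($span);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo highlight_text_nodes('<a href="/posts/kitty-care">Kitty care tips</a>', 'kitty');
```

Because the replacement is built from DOM nodes rather than string concatenation, the matched text is escaped automatically when the fragment is serialized.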

macek
+1. A regex might do the trick or it might not, but this way is simpler, and much easier to maintain.
Alan Moore
Agreed. Regex just isn't suited for parsing HTML; it was never designed for that.
macek
I also agree that regex isn't suited for parsing HTML, but after implementing this solution, I might instead strip the HTML tags before running the regex and then output a plain-text version of the search results. The page took considerably longer to load with this approach than with the regex.
Tim Schoffelman
A: 

I ended up going this route, which so far works well for this specific situation.

<?php

if(preg_match('|\b(' . $term . ')\b|i', $str_content))
{
    $str_content = strip_tags($str_content);
    $str_content = preg_replace('|\b(' . $term . ')(?!["\'])|i', "<span class=\"highlight\">$1</span>", $str_content);
    $str_content = preg_replace('|\n[^<]+|', '</p><p>', $str_content);
    break;
}

?>

It's still HTML-encoded, but it's easier to parse now that the HTML tags are gone.
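
For anyone who wants to reuse the idea, the same steps could be wrapped in a function along these lines. This is a sketch rather than the exact code above: the name highlight_plain_text() is made up, the term is run through preg_quote() so special characters in it don't break the pattern, and the last step assumes the goal is simply to turn line breaks into paragraph boundaries.

```php
<?php
// Sketch only: highlight_plain_text() is a hypothetical helper name.
function highlight_plain_text(string $content, string $term): string
{
    // Drop all markup first so no tag or attribute can be matched.
    $text = trim(strip_tags($content));
    $term = trim($term);

    if ($term !== '') {
        // Escape regex metacharacters in the term, including the '|' delimiter.
        $pattern = '|\b(' . preg_quote($term, '|') . ')\b|i';
        $text = preg_replace($pattern, '<span class="highlight">$1</span>', $text);
    }

    // Turn each line break (plus surrounding whitespace) into a paragraph boundary.
    $text = preg_replace('|\s*\n\s*|', '</p><p>', $text);

    return '<p>' . $text . '</p>';
}

echo highlight_plain_text("<a href=\"/kitty\">Hello Kitty</a>\nMore about kitty toys", 'kitty');
// prints: <p>Hello <span class="highlight">Kitty</span></p><p>More about <span class="highlight">kitty</span> toys</p>
```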

Tim Schoffelman