I have a decent, lightweight search engine working for one of my sites, using MySQL fulltext indexes and PHP to parse the results. It works fine, but I'd like to offer more "Google-like" results, with text snippets from the results and the found words highlighted. I'm looking for a PHP-based solution. Any recommendations?

+1  A: 

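Use preg_replace() (or a similar function) to replace your search string with highlighted text, e.g.

// preg_quote() escapes regex metacharacters in the user's input; $0 re-inserts the matched text with its original case
$highlighted_text = preg_replace('/' . preg_quote($search, '/') . '/i', '<span class="highlighted">$0</span>', $full_text);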
Brian Ramsay
Highlighting isn't my problem. I am wondering about the best way to go about getting the snippet _around_ the search term.
phirschybar
+1  A: 

For MySQL, your best bet would be to first split up your query words, clean up your values, and then concatenate everything back into a nice regular expression.

In order to highlight your results, you can use the <strong> tag. Its usage would be semantic as you are putting strong emphasis on an item.

// Done ONCE per page load:
  $search = "Hello World";

  // Remove the quotes
  $search = str_replace('"', '', $search);

  // Get the words array (split on whitespace, dropping empty strings)
  $words = preg_split('/\s+/', $search, -1, PREG_SPLIT_NO_EMPTY);

  // Clean the array: drop stop words (as whole words, so the "or" in "World" survives),
  // remove duplicates, and escape regex metacharacters
  function is_not_stop_word($value) { return !in_array(strtolower($value), array('and', 'or')); }
  function regex_escape(&$value) { $value = preg_quote($value, '/'); }
  $words = array_filter($words, 'is_not_stop_word');
  $words = array_unique($words);
  array_walk($words, 'regex_escape');

  // PHP's PCRE has no "g" modifier; preg_replace() already replaces every match
  $regex = '/(' . implode('|', $words) . ')/i';

// Done FOR EACH result
  $result = "Something something hello there yes world fun nice";
  $highlighted = preg_replace($regex, '<strong>$0</strong>', $result);


If you are using PostgreSQL, you can simply use the built-in ts_headline as described in the documentation.

Andrew Moore
+2  A: 

Searching the actual database is fine until you want to add snazzy features like the one above. In my experience it is best to create a dedicated search table, with keywords and page IDs/URLs/etc. Then populate this table every n hours with content. During this population you can add snippets for each document for each keyword.
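
The population step could look roughly like the sketch below. It is only an illustration: the function name build_search_rows, the row layout (page_id, keyword, snippet) and the in-memory $pages array are assumptions standing in for your own tables, and the actual INSERT into the search table is left out.

```php
<?php
// Hypothetical population step: for each page and each keyword found in it,
// produce one row (page_id, keyword, snippet) for a dedicated search table.
function build_search_rows(array $pages, array $keywords, $radius = 20)
{
    $rows = array();
    foreach ($pages as $pageId => $content) {
        foreach ($keywords as $keyword) {
            $pos = stripos($content, $keyword);
            if ($pos === false) {
                continue; // this page does not mention the keyword
            }
            // Take $radius characters on either side, clamped to the text.
            $start  = max(0, $pos - $radius);
            $length = ($pos - $start) + strlen($keyword) + $radius;
            $rows[] = array(
                'page_id' => $pageId,
                'keyword' => strtolower($keyword),
                'snippet' => substr($content, $start, $length),
            );
        }
    }
    return $rows;
}

// Example input (assumed); in practice this would come from your content tables.
$pages = array(
    1 => 'It could be red, green or blue.',
    2 => 'Nothing colourful on this page.',
);
$rows = build_search_rows($pages, array('red', 'blue'), 5);
```

At search time you would then look up the stored snippet by keyword instead of cutting it out of the full text on every request.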

Alternatively a quick hack might be:

<?php
$text = 'This is an example text page with content. It could be red, green or blue.';
$keyword = 'red';
$size = 5; // size of snippet either side of keyword

$pos = strpos($text, $keyword);
if ($pos !== false) {
    // substr() takes a start and a length (not an end offset); clamp the start at 0
    $start = max(0, $pos - $size);
    $snippet = '...'.substr($text, $start, ($pos - $start) + strlen($keyword) + $size).'...';
    $snippet = str_replace($keyword, '<strong>'.$keyword.'</strong>', $snippet);
    echo $snippet;
}
?>
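
The hack above cuts at fixed character offsets, so it can slice words in half, and it only finds the keyword case-sensitively. A slightly more careful sketch follows. The function name make_snippet and its parameters are made up for illustration; it assumes plain-text input in a single-byte encoding (for UTF-8 content you would probably want the mb_* string functions instead).

```php
<?php
// Hypothetical helper (not from any library): returns a snippet of $text
// around the first case-insensitive occurrence of $keyword, widened to
// whole words, with each occurrence of the keyword wrapped in <strong>.
function make_snippet($text, $keyword, $radius = 30)
{
    $pos = stripos($text, $keyword);
    if ($pos === false) {
        return null; // keyword not present, nothing to show
    }

    $length = strlen($text);
    $start  = max(0, $pos - $radius);
    $end    = min($length, $pos + strlen($keyword) + $radius);

    // Widen both ends to the nearest space so no word is cut in half.
    if ($start > 0) {
        $space = strrpos(substr($text, 0, $start), ' ');
        $start = ($space === false) ? 0 : $space + 1;
    }
    if ($end < $length) {
        $space = strpos($text, ' ', $end);
        $end   = ($space === false) ? $length : $space;
    }

    // Escape the text before adding our own markup, then highlight.
    $snippet = htmlspecialchars(substr($text, $start, $end - $start));
    $pattern = '/' . preg_quote(htmlspecialchars($keyword), '/') . '/i';
    $snippet = preg_replace($pattern, '<strong>$0</strong>', $snippet);

    // Only show an ellipsis on a side where text was actually dropped.
    return ($start > 0 ? '...' : '') . $snippet . ($end < $length ? '...' : '');
}

$text = 'This is an example text page with content. It could be red, green or blue.';
echo make_snippet($text, 'red', 5);
```

For several search words you could call this once per word and show the first non-null result, or pick the word whose snippet contains the most matches.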
Al
I like this solution. I have fairly light traffic so the extra processing on each search request is fine. Thanks!!
phirschybar
A: 

On a larger site, I would think that doing the highlighting client-side with JavaScript (something like jQuery) would be the way to go.

jasondavis