Hi, I'm writing a search engine for my site and need to extract chunks of text containing a given keyword, plus a few words around it, for the search result list. I ended up with something like this:


/**
 * This function returns part of the original text containing
 * the searched term and a few words around it
 * @param string $text Original text
 * @param string $word Searched term
 * @param int $maxChunks Number of chunks returned
 * @param int $wordsAround Number of words before and after searched term
 */
public static function searchTerm($text, $word=null, $maxChunks=3, $wordsAround=3) {
        $word = trim($word);
        if(empty($word)) {
            return NULL;
        }
        $words = explode(' ', $word); // extract single words from searched phrase
        $text  = strip_tags($text);  // clean up the text
        $whack = array(); // chunk buffer
        $cycle = 0; // successful matches counter
        foreach($words as $word) {
            $match = array();
            // there are named parameters 'pre', 'term' and 'pos'
            if(preg_match("/(?P<pre>\w+){0,$wordsAround} (?P<term>$word) (?P<pos>\w+){0,$wordsAround}/", $text, $match)) {
                $cycle++;
                $whack[] = $match['pre'] . ' ' . $word . ' ' . $match['pos'];
                if($cycle == $maxChunks) break;
            }
        }
        return implode(' | ', $whack);
    }
This function does not work, but you can see the basic idea. Any suggestions on how to improve the regular expression are welcome!

+1  A: 

Never, never inject user content into the pattern of a RegEx without using preg_quote to sanitize the input:

http://us3.php.net/manual/en/function.preg-quote.php
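
As a rough illustration of the point above, here is a minimal sketch of escaping a search term before it goes into a pattern. The input string and variable names are made up for the example; only preg_quote and preg_match are real PHP functions.

```php
<?php
// Hypothetical user input that contains regex metacharacters.
$term = 'c++ (beta)';

// Escape the metacharacters; the second argument also escapes the '/' delimiter.
$safe = preg_quote($term, '/');

// The escaped term is now matched literally (case-insensitively here).
$found = preg_match('/' . $safe . '/i', 'Notes on C++ (beta) builds');
```

Without the preg_quote call, the unescaped `+` and parentheses would be read as quantifiers and a group, so the pattern would either fail to compile or match something other than the literal term.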

Oxyrubber
OK, that's one suggestion, but if the regular expression does not work, this is not critical. Thanks anyway, I'll put the preg_quote in.
sir.otasek
Are you trying to _optimize_ the RegEx or _fix_ it?
Oxyrubber
@Oxyrubber - I'm no friend of regular expressions, so this was my first idea, but I wasn't able to move on and make it work the right way.
sir.otasek
+1  A: 

Why reinvent the wheel here? Doesn't Google have the best search engine? I would look at their appliance.

mcgrailm
I know they have it, and I like the way they do it. But I was hoping to solve the problem with one lightweight function, not a whole third-party search engine.
sir.otasek
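
For the lightweight single-function approach asked for in the question, one possible sketch is shown below. It is not the original poster's final code: the name searchTermSketch is hypothetical, it is written as a plain function rather than a static method, and it assumes the text is UTF-8. The main changes from the question's attempt are that the repetition wraps a whole "word plus separator" unit inside each named group, and that each search word is passed through preg_quote.

```php
<?php
/**
 * Sketch: return up to $maxChunks snippets, each holding one search word
 * with up to $wordsAround words on either side, joined by ' | '.
 */
function searchTermSketch(string $text, ?string $phrase = null, int $maxChunks = 3, int $wordsAround = 3): ?string
{
    $phrase = trim((string) $phrase);
    if ($phrase === '') {
        return null;
    }
    $text   = strip_tags($text);   // clean up the text
    $n      = max(0, $wordsAround); // guard against negative counts
    $chunks = [];
    // Split the phrase on runs of whitespace, dropping empty pieces.
    foreach (preg_split('/\s+/', $phrase, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        $quoted = preg_quote($word, '/');
        // pre: up to $n words, each followed by non-word separators
        // pos: up to $n words, each preceded by non-word separators
        // i = case-insensitive, u = treat pattern and subject as UTF-8
        $pattern = '/(?P<pre>(?:\w+\W+){0,' . $n . '})'
                 . '(?P<term>' . $quoted . ')'
                 . '(?P<pos>(?:\W+\w+){0,' . $n . '})/iu';
        if (preg_match($pattern, $text, $match)) {
            $chunks[] = $match['pre'] . $match['term'] . $match['pos'];
            if (count($chunks) === $maxChunks) {
                break;
            }
        }
    }
    return implode(' | ', $chunks);
}

echo searchTermSketch('The quick brown fox jumps over the lazy dog', 'fox', 3, 2);
// quick brown fox jumps over
```

Note that this matches the term as a substring (so "fox" also hits "foxes") and returns only the first hit per search word. Whether that is acceptable depends on the site; whole-word matching or multiple hits per word would need a different pattern or preg_match_all.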