How can I grab [x] number of words before and after a given keyword in a string in PHP? I am trying to tokenize the results of a MySQL query into a snippet tailored to the keyword.
A:
$string = 'This is a test string to see how to grab words from an arbitrary sentence. It\'s a little hacky (as you can see from the results) - but generally speaking, it works.';
echo $string,'<br />';
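function getWords($string,$word,$before=0,$after=0) {
    // Split the string into a list of words
    $stringWords = str_word_count($string,1);
    // array_search() is case-sensitive and returns false when the word is absent
    $myWordPos = array_search($word,$stringWords);
    if ($myWordPos === false)
        return array();
    // Don't reach back past the start of the string
    if (($myWordPos-$before) < 0)
        $before = $myWordPos;
    return array_slice($stringWords,$myWordPos-$before,$before+$after+1);
}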
var_dump(getWords($string,'test',2,1));
echo '<br />';
var_dump(getWords($string,'this',2,1));
echo '<br />';
var_dump(getWords($string,'sentence',1,3));
echo '<br />';
var_dump(getWords($string,'little',2,2));
echo '<br />';
var_dump(getWords($string,'you',2,2));
echo '<br />';
var_dump(getWords($string,'results',2,2));
echo '<br />';
var_dump(getWords($string,'works',2,2));
echo '<hr />';
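function getWords2($string,$word,$before=0,$after=0) {
    $stringWords = str_word_count($string,1);
    // Case-sensitive search; false means the word is not in the string
    $myWordPos = array_search($word,$stringWords);
    if ($myWordPos === false)
        return '';
    // Character offset of each word within the original string
    $stringWordsPos = array_keys(str_word_count($string,2));
    // Clamp the window to the start and end of the word list
    if (($myWordPos-$before) < 0)
        $before = $myWordPos;
    if (($myWordPos+$after) >= count($stringWords))
        $after = count($stringWords) - $myWordPos - 1;
    $startPos = $stringWordsPos[$myWordPos-$before];
    $endPos = $stringWordsPos[$myWordPos+$after] + strlen($stringWords[$myWordPos+$after]);
    return substr($string,$startPos,$endPos-$startPos);
}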
echo '[',getWords2($string,'test',2,1),']<br />';
echo '[',getWords2($string,'this',2,1),']<br />';
echo '[',getWords2($string,'sentence',1,3),']<br />';
echo '[',getWords2($string,'little',2,2),']<br />';
echo '[',getWords2($string,'you',2,2),']<br />';
echo '[',getWords2($string,'results',2,2),']<br />';
echo '[',getWords2($string,'works',1,3),']<br />';
But what do you want to happen if the word appears multiple times? Or if the word doesn't appear in the string?
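As one possible answer to the second question, here is a minimal sketch of a variant that matches the keyword regardless of case and returns an empty array when the keyword is absent. The name getWordsInsensitive is hypothetical and not part of the functions above; it assumes plain single-byte text, since str_word_count() and strtolower() are not multibyte-aware.

```php
<?php
// Hypothetical helper, not part of the answer above: finds the keyword
// regardless of case and returns an empty array when it is absent.
function getWordsInsensitive($string, $word, $before = 0, $after = 0) {
    $stringWords = str_word_count($string, 1);
    // Lower-case a copy of the word list purely for searching
    $myWordPos = array_search(strtolower($word), array_map('strtolower', $stringWords));
    if ($myWordPos === false) {
        return array();
    }
    // Don't reach back past the start of the word list
    $start = max(0, $myWordPos - $before);
    // array_slice() stops at the end of the array, so $after needs no clamping
    return array_slice($stringWords, $start, $myWordPos - $start + $after + 1);
}

var_dump(getWordsInsensitive('This is a test string', 'this', 2, 1));
var_dump(getWordsInsensitive('This is a test string', 'missing', 2, 1));
```

The first call returns the words "This" and "is" even though the keyword was passed in lower case; the second returns an empty array, which the caller can test for before building a snippet.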
EDIT
Extended version of getWords2 that returns up to a set number of occurrences of the keyword:
$string = 'PHP is a widely-used general-purpose scripting language that is especially suited for Web development. The current version of PHP is 5.3.3, released on July 22, 2010. The online manual for PHP is an excellent resource for the language syntax and has an extensive list of the built-in and extension functions. Most extensions can be found in PECL. PEAR contains a plethora of community supplied classes. PHP is often paired with the MySQL relational database.';
echo $string,'<br />';
function getWords3($string,$word,$before=0,$after=0,$maxFoundCount=1) {
    $stringWords = str_word_count($string,1);
    // Character offset of each word within the original string
    $stringWordsPos = array_keys(str_word_count($string,2));
    $wordCount = count($stringWords);
    $foundInstances = array();
    // Scan the full word list so the context can reach back past an earlier match
    for ($pos = 0; $pos < $wordCount && count($foundInstances) < $maxFoundCount; ++$pos) {
        if ($stringWords[$pos] !== $word)
            continue;
        // Clamp the window to the start and end of the word list
        $first = max(0,$pos-$before);
        $last = min($wordCount-1,$pos+$after);
        $startPos = $stringWordsPos[$first];
        $endPos = $stringWordsPos[$last] + strlen($stringWords[$last]);
        $foundInstances[] = substr($string,$startPos,$endPos-$startPos);
    }
    return $foundInstances;
}
var_dump(getWords3($string,'PHP',2,2,3));
Mark Baker
2010-09-10 13:09:13
Hi Mark, thanks. If the word appears multiple times, I want to piece together a snippet from several occurrences, say up to 3 or so, so the end result might look like: this is about KEYWORD which is about... this is another KEYWORD piece of puzzle... this would be KEYWORD the third piece
Jaime Cross
2010-09-10 14:44:15
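The snippet format described in this comment could be sketched roughly as follows. The function name buildSnippet, its default window sizes, and the "... " separator are assumptions made for illustration; matching is case-sensitive, and the sketch assumes single-byte text because str_word_count() is not multibyte-aware.

```php
<?php
// Hypothetical helper: collects up to $maxSnippets context windows around
// $word and joins them with "... ". Matching is case-sensitive.
function buildSnippet($string, $word, $before = 2, $after = 2, $maxSnippets = 3) {
    // Keys are character offsets into $string, values are the words
    $wordsByOffset = str_word_count($string, 2);
    $offsets = array_keys($wordsByOffset);
    $words = array_values($wordsByOffset);
    $lastIndex = count($words) - 1;
    $pieces = array();
    foreach ($words as $i => $current) {
        if ($current !== $word) {
            continue;
        }
        // Clamp the window to the bounds of the word list
        $first = max(0, $i - $before);
        $last = min($lastIndex, $i + $after);
        $start = $offsets[$first];
        $end = $offsets[$last] + strlen($words[$last]);
        // Cut from the original string so punctuation between words is kept
        $pieces[] = substr($string, $start, $end - $start);
        if (count($pieces) >= $maxSnippets) {
            break;
        }
    }
    return implode('... ', $pieces);
}

echo buildSnippet('PHP is fun. I like PHP a lot. Learning PHP takes time.', 'PHP', 1, 2);
// prints: PHP is fun... like PHP a lot... Learning PHP takes time
```

Windows around nearby occurrences can overlap in this sketch; merging overlapping windows would be a separate step if the repeated text is unwanted.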
What I am trying to do is give enough detail about what each result is, without the results being too similar to each other.
Jaime Cross
2010-09-10 14:46:14
That works like a charm, thanks much Mark :)
Jaime Cross
2010-09-12 14:28:49
If it works as you want, then just accept the answer.
Mark Baker
2010-09-12 17:56:31
Done. I actually had to look for the check to accept it; sorry for the wait, Mark.
Jaime Cross
2010-09-13 22:58:54