Hi there,

Does anyone know of an available PHP function that takes a piece of text, say a few hundred words long, and produces an array of keywords? I.e. the most important, frequently occurring unique terms?

Thanks, Philip

+5  A: 

No such function exists (it would be magical if one did), but as a starting point you could do the following:

  1. Split the text at the spaces, producing an array of words.
  2. Remove stop-words and unnecessary punctuation and symbols (possibly using regular expressions; see preg_replace).
  3. Count the number of occurrences of each word in the remaining array, and sort it in order of frequency (so the most frequently occurring word is at the first offset, i.e. $words[0]).
  4. Use array_unique to remove the duplicates, thus producing an array of unique keywords ordered by frequency of occurrence.
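
The steps above can be sketched roughly as follows. This is only a minimal sketch, not a definitive implementation: the function name extract_keywords, the $limit parameter, and the tiny stop-word list are all made up for illustration, and it assumes plain ASCII English text. Note that array_count_values() already returns one key per distinct word, so in this variant a separate array_unique() pass is not needed.

```php
<?php
// Hypothetical helper: the name, parameters, and stop-word list are
// illustrative assumptions, not an existing PHP function.
function extract_keywords(string $text, array $stopWords, int $limit = 10): array
{
    // Step 1: lower-case the text and split it at spaces.
    $words = explode(' ', strtolower($text));

    // Step 2: strip punctuation and symbols from every word (ASCII only),
    // then drop empty strings and stop-words.
    $words = preg_replace('/[^a-z0-9]+/', '', $words);
    $words = array_filter($words, function ($word) use ($stopWords) {
        return $word !== '' && !in_array($word, $stopWords, true);
    });

    // Step 3: count occurrences and sort by frequency, highest first.
    $counts = array_count_values($words);
    arsort($counts);

    // Step 4: the keys of the count array are the unique words,
    // already ordered by frequency.
    return array_slice(array_keys($counts), 0, $limit);
}

$stopWords = ['the', 'a', 'and', 'is'];
$keywords = extract_keywords('The cat and the dog: a cat is a cat, the dog is a bird.', $stopWords);
print_r($keywords);
```

A real stop-word list would be much longer, and for non-English text the regular expression and strtolower() call would need Unicode-aware replacements.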
karim79
You beat me to it.
rpflo
A: 

Something like this might do the trick:

$thestring = 'the most important, frequently occurring unique terms?';
$arrayofwords = explode(" ", $thestring);
print_r($arrayofwords); // print_r() prints by itself; echoing its return value would append a stray "1"

You may also want to strip out the comma "," by replacing it with an empty string, so you get clean keywords.

$thestring = 'the most important, frequently occurring unique terms?';
$cleaned_string = str_replace(",", "", $thestring); // remove every comma
$arrayofwords = explode(" ", $cleaned_string);
print_r($arrayofwords);
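
Removing only the comma still leaves other punctuation attached (the last word comes out as "terms?"). One possible alternative, sketched here under the assumption that the text is plain ASCII, is to split with preg_split() on anything that is not a letter, digit, or apostrophe, so the cleaning and splitting happen in one call:

```php
<?php
$thestring = 'the most important, frequently occurring unique terms?';

// Split on any run of characters that is not a letter, digit, or apostrophe.
// PREG_SPLIT_NO_EMPTY discards the empty strings that leading or trailing
// punctuation would otherwise leave behind.
$arrayofwords = preg_split("/[^A-Za-z0-9']+/", $thestring, -1, PREG_SPLIT_NO_EMPTY);
print_r($arrayofwords);
```

This also copes with several spaces in a row, which explode(" ", ...) would turn into empty array entries.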
Codex73