I would say you could:
- split the string into an array of words
  - with explode or preg_split, depending on how complex you allow your word separators to be
- use array_filter to keep only the entries (i.e. words) you want
  - the callback function has to return false for every word that is not valid
- and then use array_count_values on the resulting list of words
  - which counts how many times each word appears in the array of words
EDIT: and, just for fun, here's a quick example:
First of all, the string, which gets split into words:
$str = "will see you in London tomorrow and Kent the day after tomorrow";
$words = preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);
var_dump($words);
Which gets you:
array
0 => string 'will' (length=4)
1 => string 'see' (length=3)
2 => string 'you' (length=3)
3 => string 'in' (length=2)
4 => string 'London' (length=6)
5 => string 'tomorrow' (length=8)
6 => string 'and' (length=3)
7 => string 'Kent' (length=4)
8 => string 'the' (length=3)
9 => string 'day' (length=3)
10 => string 'after' (length=5)
11 => string 'tomorrow' (length=8)
Then, the filtering:
function filter_words($word) {
    // a pretty simple filter: keep only words of 5 characters or more
    return strlen($word) >= 5;
}
$words_filtered = array_filter($words, 'filter_words');
var_dump($words_filtered);
Which outputs:
array
4 => string 'London' (length=6)
5 => string 'tomorrow' (length=8)
10 => string 'after' (length=5)
11 => string 'tomorrow' (length=8)
And, finally, the counting:
$counts = array_count_values($words_filtered);
var_dump($counts);
And the final result:
array
'London' => int 1
'tomorrow' => int 2
'after' => int 1
Now, it's up to you to build on this ;-)
Mainly, you'll have to work on:
- A better splitting function that deals with punctuation (or deal with that during filtering)
- An "intelligent" filtering function that suits your needs better than mine
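As a starting point for those two items, here is a minimal sketch. The function names split_words and is_keyword and the stop-word list are made up for illustration, so treat them as assumptions rather than anything standard. It splits on any run of characters that are not letters or digits, so punctuation disappears during the split, and it rejects short words plus a few common ones:

```php
<?php
// Hypothetical helper: split on anything that is not a Unicode letter or digit.
// Note that apostrophes also act as separators, so "It's" becomes "It" and "s".
function split_words(string $str): array {
    // \p{L} = any letter, \p{N} = any digit; the /u flag enables UTF-8 mode.
    $words = preg_split("/[^\\p{L}\\p{N}]+/u", $str, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split returns false on failure (e.g. invalid UTF-8 input).
    return $words === false ? [] : $words;
}

// Hypothetical filter: keep words of 5+ bytes that are not in a stop-word list.
function is_keyword(string $word): bool {
    // Illustrative list only (an assumption); replace it with one that fits your data.
    static $stop_words = ['after', 'before', 'about', 'there', 'their', 'which'];
    // strlen counts bytes; mb_strlen may be preferable for non-ASCII text.
    return strlen($word) >= 5 && !in_array(strtolower($word), $stop_words, true);
}

$str = "Will see you in London tomorrow, and Kent the day after tomorrow!";
$counts = array_count_values(array_filter(split_words($str), 'is_keyword'));
arsort($counts); // sort by count, highest first, keeping the word keys
var_dump($counts);
```

With this input, 'tomorrow' should come out with a count of 2 and 'London' with 1. 'after' is dropped only because it is in the illustrative stop-word list.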
Have fun!