I found the code below on Stack Overflow, and it works well for finding the most common words in a string. Can I exclude common words like "a", "if", "you", "have", etc. from the count? Or would I have to remove those elements after counting? How would I do this? Thanks in advance.

<?php

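$text = "A very nice to tot to text. Something nice to think about if you're into text.";

// Split the text into an array of words (format 1 returns the words themselves).
$words = str_word_count($text, 1);

// Count how often each word occurs, then sort by count, highest first.
$frequency = array_count_values($words);
arsort($frequency);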

echo '<pre>';
print_r($frequency);
echo '</pre>';
?>
+2  A: 

There are no additional parameters, and no native PHP function, that let you pass a list of words to exclude. As such, I would just use what you have and ignore a custom set of words among those returned by str_word_count.
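
One way to read "ignore a custom set of words" is to count everything first and then drop the unwanted keys. The following is only a sketch under assumptions: the `$ignore` list is hypothetical, and lowercasing the text first is an added step so that "A" and "a" end up under the same key.

```php
<?php
$text = "A very nice to tot to text. Something nice to think about if you're into text.";

// Hypothetical list of words to ignore; adjust it to your needs.
$ignore = array("a", "if", "you", "have", "to");

// Count every word, lowercased so that "A" and "a" share one key.
$frequency = array_count_values(str_word_count(strtolower($text), 1));

// Remove the ignored words after counting: array_flip() turns the list
// into keys, and array_diff_key() drops those keys from the counts.
$frequency = array_diff_key($frequency, array_flip($ignore));

arsort($frequency);
print_r($frequency);
```

Note that "you're" is kept here, because str_word_count treats an apostrophe inside a word as part of that word.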

Jason McCreary
+1  A: 

You can do this easily by using array_diff():

$words = array("if", "you", "do", "this", 'I', 'do', 'that');
$stopwords = array("a", "you", "if");

print_r(array_diff($words, $stopwords));

gives

Array
(
    [2] => do
    [3] => this
    [4] => I
    [5] => do
    [6] => that
)

But array_diff() compares case-sensitively, so you have to take care of lower and upper case yourself. The easiest way here would be to convert the text to lowercase beforehand.
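
Applied to the code from the question, the idea might look roughly like this. It is a sketch rather than a definitive solution; the `$stopwords` list is a made-up example.

```php
<?php
$text = "A very nice to tot to text. Something nice to think about if you're into text.";

// Example stop-word list (an assumption, not a standard list).
$stopwords = array("a", "if", "you", "have", "to");

// Lowercase first so that "A" matches the stop word "a".
$words = str_word_count(strtolower($text), 1);

// Filter before counting: the text's words go first, the stop words second.
$filtered = array_diff($words, $stopwords);

$frequency = array_count_values($filtered);
arsort($frequency);

print_r($frequency);
```

With this text, "nice" and "text" each end up with a count of 2, and "a", "to" and "if" no longer appear.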

Felix Kling
Thanks, your first version worked by excluding unwanted words, but I don't understand this version. All the array contains is the words I don't want instead of filtering them out.
usertest
@user201140: You have to be careful with the order of the arguments. `array_diff` returns the elements of the **first** array that are not in the **second** array. So the first one has to be your text divided into words and the second one is an array of words you don't want to have.
Felix Kling
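
The effect of the argument order can be checked with the arrays from the answer above. This is a small illustrative sketch; the variable names `$kept` and `$swapped` are introduced here only for the demonstration.

```php
<?php
$words = array("if", "you", "do", "this", "I", "do", "that");
$stopwords = array("a", "you", "if");

// Intended order: the words, minus anything that is a stop word.
$kept = array_diff($words, $stopwords);

// Swapped order: the stop words, minus anything that occurs in the text.
// Only stop words that never appear in $words survive.
$swapped = array_diff($stopwords, $words);

print_r(array_values($kept));
print_r($swapped);
```

With the arguments swapped, the result contains only "a", which is a stop word and not part of the text at all.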