ansaurus

Question

PHP Multiple Occurences Of Words Within A String

Answer 1

+1 A:

function Accept($str)
{
 $words = explode(" ", trim($str));
 $len = count($words);
 for ($i = 0; $i < $len; $i++)
 {
  for ($p = 0; $p < $len; $p++)
  {
   if ($p != $i && $words[$i] == $words[$p])
   {
    return false;
   }
  }
 }
 return true;
}

EDIT

Entire test script. Note, when printing "false" php just prints nothing but true is printed as "1".

<?php

    function Accept($str)
    {
            $words = explode(" ", trim($str));
            $len = count($words);
            for ($i = 0; $i < $len; $i++)
            {
                    for ($p = 0; $p < $len; $p++)
                    {
                            if ($p != $i && $words[$i] == $words[$p])
                            {
                                    return false;
                            }
                    }
            }
            return true;
    }

echo Accept("google makes love"), ", ", Accept("google makes google love"), ", ",
 Accept("google makes love love google"), ", ", Accept("babe health insurance babe");


?>

Prints the correct output:

1, , ,

nlaq 2009-01-08 14:43:18

thanks but I can't get it to work :)

zuk1 2009-01-08 14:49:55

Now it works. I had a typo. $words[$i] == $p should of been $words[$i] == $words[$p]... I just tested it myself :)

nlaq 2009-01-08 14:54:23

code won't work for strings like"babe health insurance babe"

zuk1 2009-01-08 14:58:05

..Erm, it seems to on my end... *posting my entire script*

nlaq 2009-01-08 14:59:40

strange works for all except babe health insurance

zuk1 2009-01-08 15:08:49

Dunno what your apache/PHP install has been smoking but that string works fine as well on my end...

nlaq 2009-01-08 15:12:00

Answer 2

A:

The simplest method is to loop through each word and check against all previous words for duplicates.

lc 2009-01-08 14:44:06

No need to store two arrays. Look at my example :)

nlaq 2009-01-08 14:45:24

It's not quite the simplest method; take a look at my answer.

The Wicked Flea 2009-01-08 14:49:03

Good point. I'm way too tired to be answering questions at this hour. Of course you easily have the subarray of all previous words. Just look up to the current index - 1. Duh. I removed the store two arrays part.

lc 2009-01-08 18:57:47

Answer 3

A:

Regular expression with backreferencing

http://www.regular-expressions.info/php.html

http://www.regular-expressions.info/named.html

cmsjr 2009-01-08 14:44:54

Now there would be two problems. Regex isn't necessary, that's like fixing a car's leaking gas-tank with a blowtorch (when full).

The Wicked Flea 2009-01-08 14:49:41

Answer 4

+3 A:

Try this:

function single_use_of_words($str) {
  $words = explode(' ', $str);
  $words = array_unique($words);
  return implode(' ', $words);
}

The Wicked Flea 2009-01-08 14:48:07

A bit slower then my method. You would have to compare the count of both arrays to get the result that the OP is looking for. However, I have never heard of the array_unique function before so thanks for that :)

nlaq 2009-01-08 14:50:26

I hadn't actually checked the speed, but it produces a result simply. ;)

The Wicked Flea 2009-01-08 14:52:19

Answer 5

+2 A:

<?php
$words = preg_split('\b', $string, PREG_SPLIT_NO_EMPTY);
$wordsUnique = array_unique($words);
if (count($words) != count($wordsUnique)) {
    echo 'Duplicate word found!';
}
?>

Wesley Mason 2009-01-08 14:55:50

The use of the regex to split the words will slow this down yet further.

The Wicked Flea 2009-01-08 15:06:51

Yeah, and I get the feeling reading some of the OP's posts that he intends to use this for searching the entire source of multiple HTML files during the same script. I think it would be important to look at performance in this application.

nlaq 2009-01-08 15:08:01

I used the regex to split on word boundaries as opposed to just splitting on spaces, should have made this clear.

Wesley Mason 2009-01-08 16:44:08

Yes, the preg_split will avoid $words containing null words if two spaces appear in the string.

jmucchiello 2009-01-09 00:57:29

Answer 6

+5 A:

Based on Wicked Flea code:

function single_use_of_words($str) {  
   $words = explode(' ', trim($str));  //Trim to prevent any extra blank
   if (count(array_unique($words)) == count($words)) {
      return true; //Same amount of words
   }   
   return false;
}

Veynom 2009-01-08 14:58:52

Rather than trim/explode, use str_word_count($str, 2). It's a bit more liberal in terms of whitespace/trimming.

Ciaran McNulty 2009-01-08 15:36:46

Whilst this will work, it might be inefficient for large strings; even if the first and second words are identical, array_unique will still visit every element in the array. You could loop through the elements, adding each to something like a binary search tree, terminating on the FIRST duplicate.

Bobby Jack 2009-01-09 00:38:18

It also fails if there are two occurances of 2 or more spaces. There will be multiple nulls it he $words array that will get counted as only one word in the array_unique call.

jmucchiello 2009-01-09 00:54:27

Answer 7

+1 A:

This seems fairly fast. It would be interesting to see (for all the answers) how the memory usage and time taken increase as you increase the length of the input string.

function check($str) {
    //remove double spaces
    $c = 1;
    while ($c) $str = str_replace('  ', ' ', $str, $c);

    //split into array of words
    $words = explode(' ', $str);
    foreach ($words as $key => $word) {
        //remove current word from array
        unset($words[$key]);
        //if it still exists in the array it must be duplicated
        if (in_array($word, $words)) {
            return false;
        }
    }
    return true;
}

Edit

Fixed issue with multiple spaces. I'm not sure whether it is better to remove these at the start (as I have) or check each word is non-empty in the foreach.

Tom Haigh 2009-01-08 15:01:53

I would be interested as well.

nlaq 2009-01-08 15:41:41

Memory usage shouldn't be too bad, but it has O(n^2) time because in_array is a linear search. This also suffers the double space problem.

jmucchiello 2009-01-09 00:55:34

Answer 8

+1 A:

No need for loops or arrays:

<?php

$needle = 'cat';
$haystack = 'cat in the cat hat';

if ( occursMoreThanOnce($haystack, $needle) ) {
    echo 'Success'; 
} 

function occursMoreThanOnce($haystack, $needle) {
    return strpos($haystack, $needle) !== strrpos($haystack, $needle);
}

?>

Kevin 2009-01-08 15:35:48

Answer 9

+1 A:

The regular expression way would definitely be my choice.

I did a little test on a string of 320 words with Veynom's function and a regular expression

function preg( $txt ) {
    return !preg_match( '/\b(\w+)\b.*?\1/', $txt );
}

Here's the test

$time['preg'] = microtime( true );

for( $i = 0; $i < 1000; $i++ ) {
    preg( $txt );
}

$time['preg'] = microtime( true ) - $time['preg'];


$time['veynom-thewickedflea'] = microtime( true );

for( $i = 0; $i < 1000; $i++ ) {
    single_use_of_words( $txt );
}

$time['veynom-thewickedflea'] = microtime( true ) - $time['veynom-thewickedflea'];

print_r( $time );

And here's the result I got

Array
(
    [preg] => 0.197616815567
    [veynom-thewickedflea] => 0.487532138824
)

Which suggests that the RegExp solution, as well as being a lot more concise is more than twice as fast. ( for a string of 320 words anr 1000 iterations )

When I run the test over 10 000 iterations I get

Array
(
    [preg] => 1.51235699654
    [veynom-thewickedflea] => 4.99487900734
)

The non RegExp solution also uses a lot more memory.

So.. Regular Expressions for me cos they've got a full tank of gas

EDIT
The text I tested against has duplicate words, If it doesn't, the results may be different. I'll post another set of results.

Update
With the duplicates stripped out ( now 186 words ) the results for 1000 iterations is:

Array
(
    [preg] => 0.235826015472
    [veynom-thewickedflea] => 0.2528860569
)

About evens

meouw 2009-01-08 23:10:43

The only problem here is the array_unique method will call "hello" and "hello?" two different word while preg_split using \w+ will consider those to both equal "hello". To duplicate the array_unique method you need ([-\b]+) in place of (\w+). That could change your speed.

jmucchiello 2009-01-09 01:04:15

Is that not another argument for the regexp aproach? hello and hello? are the same word surely.

meouw 2009-01-09 09:14:56

ansaurus

tags:

views:

answers:

PHP Multiple Occurences Of Words Within A String

related questions