views:

277

answers:

3

I just wonder how array_diff() works.And obviously it couldn't work as follows

function array_diff($arraya, $arrayb)
{
    $diffs = array();
    foreach ($arraya as $keya => $valuea)
    {
        $equaltag = 0;
        foreach ($arrayb as $valueb)     
        {
            if ($valuea == $valueb)
            {
                $equaltag =1;
                break;
            }
        }
        if ($equaltag == o)
        {
              $diffs[$keya]=$valuea;
        }

    }
    return $diffs;                          
}                                  //couldn't be worse than this

Hope someone can show me some better solution.

Thanks.

EDIT @animuson:

function array_diff($arraya, $arrayb)
{
    foreach ($arraya as $keya => $valuea)
    {
        if (in_array($valuea, $arrayb))
        {
            unset($arraya[$keya]);
        }
    }
    return $arraya;
}
+4  A: 

The best solution to know how it works it to take a look at its source-code ;-)
(Well, that's one of the powers of open source -- and if you see some possible optimization, you can submit a patch ;-) )

For array_diff, it should be in ext/standard -- which means, for PHP 5.3, it should be there : branches/PHP_5_3/ext/standard

And, then, the array.c file looks like a plausible target ; the php_array_diff function, line 3381, seems to correspond to array_diff.


(Good luck going through the code : it's quite long...)

Pascal MARTIN
I'm too tired now:(
SpawnCxy
Yeah, that's the kind of situations in which I think I should not have stopped using C... But, in the same have, have no regret ^^
Pascal MARTIN
A: 

From PHP: "Returns an array containing all the entries from array1 that are not present in any of the other arrays."

So, you just check array1 against all arrayN and any values in array1 that don't appear in any of those arrays will be returned in a new array.

You don't necessarily even need to loop through all of array1's values. Just for all the additional arrays, loop through their values and check if each value is in_array($array1, $value).

animuson
-1 It is more complex than that. There are advanced algorithms and data structures being used. See pascal's answer.
Byron Whitlock
It is still a basic idea of what is happening and *is* a better solution than what he had.
animuson
@animuson:You're right.
SpawnCxy
+4  A: 

A better solution is to use hash maps

function my_array_diff($a, $b) {
    $map = $out = array();
    foreach($a as $val) $map[$val] = 1;
    foreach($b as $val) if(isset($map[$val])) $map[$val] = 0;
    foreach($map as $val => $ok) if($ok) $out[] = $val;
    return $out;
}

$a = array('A', 'B', 'C', 'D');
$b = array('X', 'C', 'A', 'Y');

print_r(my_array_diff($a, $b)); // B, D

benchmark

function your_array_diff($arraya, $arrayb)
{
    foreach ($arraya as $keya => $valuea)
    {
        if (in_array($valuea, $arrayb))
        {
            unset($arraya[$keya]);
        }
    }
    return $arraya;
}

$a = range(1, 10000);
$b = range(5000, 15000);

shuffle($a);
shuffle($b);

$ts = microtime(true);
my_array_diff($a, $b);
printf("ME =%.4f\n", microtime(true) - $ts);

$ts = microtime(true);
your_array_diff($a, $b);
printf("YOU=%.4f\n", microtime(true) - $ts);

result

ME =0.0137
YOU=3.6282

any questions? ;)

and, just for fun,

$ts = microtime(true);
array_diff($a, $b);
printf("PHP=%.4f\n", microtime(true) - $ts);

result

ME =0.0140
YOU=3.6706
PHP=19.5980

that's incredible!

stereofrog
Good job.But I think my edit version would be faster:)
SpawnCxy
see update.....
stereofrog
OOPS!!That's really incredible!
SpawnCxy
+1. I'm surprised that this is even faster, although unlike array_diff, index association is lost: array_keys(array_diff_key(array_fill_keys($a, 1), array_fill_keys($b, 1)))
chris