views:

368

answers:

5

I'm sure this is an extremely obvious question, and that there's a function that does exactly this, but I can't seem to find it. In PHP, I'd like to know if my array has duplicates in it, as efficiently as possible. I don't want to remove them like array_unique does, and I don't particularly want to run array_unique and compare it to the original array to see if they're the same, as this seems very inefficient. As far as performance is concerned, the "expected condition" is that the array has no duplicates.

I'd just like to be able to do something like

if (no_dupes($array))
    // this deals with arrays without duplicates
else
    // this deals with arrays with duplicates

Is there any obvious function I'm not thinking of?
http://stackoverflow.com/questions/1170807/how-to-detect-duplicate-values-in-php-array
is a very similar question, however if you read the question, he's looking for array_count_values.

A: 

php's in_array() might do it: http://php.net/manual/en/function.in-array.php

stormdrain
not really... That would tell me if a particular element is in my array at least once, which it would be for every element I would test
Mala
I was thinking like in conjunction with a counter akin to what Mike Sherov proposed.
stormdrain
A: 

Two ways to do it efficiently that I can think of:

  1. inserting all the values into some sort of hashtable and checking whether the value you're inserting is already in it(expected O(n) time and O(n) space)

  2. sorting the array and then checking whether adjacent cells are equal( O(nlogn) time and O(1) or O(n) space depending on the sorting algorithm)

stormdrain's solution would probably be O(n^2), as would any solution which involves scanning the array for each element searching for a duplicate

Bwmat
+7  A: 

I know you said you aren't after array_unique. However, I don't think you're going to find a magical function. I also don't think building one is going to be faster that making use of the native functions. So here's what I propose:

function array_has_dupes($array) {
   // streamline per @Felix
   return count($array) != count(array_unique($array));
}

Adjust the second parameter of array_unique() to your liking.

Jason McCreary
thanks for the suggestion. My thought at finding a better algorithm is simply that, technically speaking, once you've finished running whatever the builtin `array_unique` does, you should be able to know if there were any dupes. Thus, anything that does at least as much work as `array_unique` does more work than necessary. Although yeah, if such a function doesn't exist, I don't particularly feel like writing it.
Mala
If you only care about if it has dupes, then that's what I would do. If you care about more than just if it has dupes, then you're right, the above may do more work than it needs. Anything you write is going to be O(n^2). Even if you bail out early. As you said, it's not common that you have dupes. So is it worth your time to make something magical?
Jason McCreary
Magical? Sure it's a microoptimization, but it's not "magic" to write your own function, and I'm not sure it's that a better solution is that much harder to write than this.
Mike Sherov
@Mike, never said anything of the sort. Just questioning if it was worth the developer's time in this particular instance based on the specification of the original question. No worries.
Jason McCreary
+1 simple and easy to understand.
alex
@Jason, I wasn't trying to come off harsh. I was just trying to convey that it doesn't always hurt to seek your own solution when you already know the easy way.
Mike Sherov
@Mike. Agreed. And ultimately it's how we all get better. I was just devil's advocate, in this case, saying that there's nothing wrong with the *easy way*.
Jason McCreary
@Jason, yup. I too have been bitten before by premature optimization.
Mike Sherov
+4  A: 

You can do:

function has_dupes($array){
 $dupe_array = array();
 foreach($array as $val){
  if(++$dupe_array[$val] > 1){
   return true;
  }
 }
 return false;
}
Mike Sherov
Where is `$dupe_array` defined? Shouldn't you define it first to avoid warnings?
alex
It's implicitly defined, but I'll edit the answer for clarity.
Mike Sherov
Thank you, this is what I had in mind :)
Mala
Good stuff, :) +1
alex
I like it! Just keep in mind that even with an early `return` this is an O(n) function. In addition to the overhead of `foreach` and tracking `$dupe_array`, I'd love to see some benchmarking. I'd guess for array's with no duplicates, utilizing native functions is faster. Definitely better than O(n^2) though. Nice.
Jason McCreary
Has a little problem: only works correctly if the values are strings or numbers.
Artefacto
A: 

As you specifically said you didn't want to use array_unique I'm going to ignore the other answers despite the fact they're probably better.

Why don't you use array_count_values() and then check if the resulting array has any value greater than 1?