views:

355

answers:

4

What would be the smallest and largest number of comparisons in an unsorted array that could have duplicate elements as well ?

I understand that finding anything in an unsorted array is a O(n) problem. But, is this true if the array contains duplicate elements as well ?

By duplicate elements I mean, elements that occur more than once in the given array.

A: 

As a general rule of thumb, when we speak about asymptotic complexity that ignores constants like O(n) it doesn't matter whether you have twice as much work, three times as much work etc. Therefore the problem which is O(n) stays O(n) in such scenario.

In this particular problem, having duplicates in an unsorted array doesn't speed up the process of searching for an element. Of course, if the element is 10 times in the array, you would probably find it 10 times faster (on average), but as long as this doesn't depend on n, it doesn't change the complexity.

Thomas Wanner
+3  A: 

So the idea here is that you've got to walk the array from front to end because it is unsorted. That means you're looking at O(n) - a linear traversal of the elements. Regardless of whether the one you are searching for is at position 0, position 8, or position n-1, you've got to walk the array to find it.

Now, if there are possibly duplicates in the array, the only difference is that you may find more than one instance of the value. If you are looking for all of them or just the first one, it is still a O(n) situation. The duplicates don't change the complexity.

Best case - you find it (assuming you only need to find one) on the first comparison.

Worst case - there are no duplicates for the given value and it is the last one you check - the nth comparison.

If you had to find ALL of the duplicates, it is always going to be n comparisons, because you've got to visit each element in the unsorted array.

itsmatt
+2  A: 

The O(n) time is true even if there are duplicate elements. You should familiarize yourself with big-oh notation.

In the worst case, consider this array: 1, 1, 1, 1, ..., 1, 1, 2. Searching for 2 will take exactly n comparisons if you start from the first element, so having duplicates didn't help at all. If you were to search for 1, you'd find it in a single comparison, but there are inputs of distinct elements for which you can also find an element in a single comparison if you're lucky, so having duplicates really doesn't mean much, except that you're more likely to get lucky and find your target element in fewer steps. It will still be O(n) however.

There are almost always best cases and worst cases. The practical performance of most algorithms always depends on the given input, big-oh notation just gives you a vague idea of how the algorithm will perform. This isn't to say that asymptotic notation is useless, just that it's not always entirely accurate because there are underlying constants involved that do make a difference in practice.

If in doubt about performance, run your own benchmarks.

IVlad
A: 

What would be the smallest and largest number of comparisons in an unsorted array that could have duplicate elements as well ?

If you are searching for a single specified value, then the smallest and largest number of comparisions will be 1 and n, respectively. If the value is known to be in the array, and you are only seeking its location, then you can get away with n-1 comparisions.

I understand that finding anything in an unsorted array is a O(n) problem. But, is this true if the array contains duplicate elements as well ?

Yes, it is still O(n).

Suppose that the presence of duplicates means that, on average, the time searching is cut in half. Well, that's a great reduction, but it doesn't affect the O(n) time. Big-O isn't average case, it's worst case, and the worst case doesn't change. And division of n by a constant factor doesn't affect big-O time, anyway.

Jeffrey L Whitledge
@Jeffrey L Whitledge: Are you sure about "Big-O isn't average case, it's worst case"? From what I know Big-O isn't any case at all.
Lazer
@Lazer - Yes. Big-O is used to indicate an asymtotic upper-bound on a function (within a constant factor). If it were an average-case or best-case indicator, then it could not be an upper-bound.
Jeffrey L Whitledge