tags:

views:

799

answers:

7

Given an array of n integers, where one element appears more than n/2 times. We need to find that element in linear time and constant extra space.

YAAQ: Yet another arrays question.

A: 

My first thought (not sufficient) would be to:

  • Sort the array in place
  • Return the middle element

But that would be O(n log n), as would any recursive solution.

If you can destructively modify the array (and various other conditions apply) you could do a pass replacing elements with their counts or something. Do you know anything else about the array, and are you allowed to modify it?

Edit Leaving my answer here for posterity, but I think Skeet's got it.

MarkusQ
+10  A: 

I have a sneaking suspicion it's something along the lines of (in C#)

// We don't need an array
public int FindMostFrequentElement(IEnumerable<int> sequence)
{
    // Initial value is irrelevant if sequence is non-empty,
    // but keeps compiler happy.
    int best = 0; 
    int count = 0;

    foreach (int element in sequence)
    {
        if (count == 0)
        {
            best = element;
            count = 1;
        }
        else
        {
            // Vote current choice up or down
            count += (best == element) ? 1 : -1;
        }
    }
    return best;
}

It sounds unlikely to work, but it does. (Proof as a postscript file, courtesy of Boyer/Moore.)

Jon Skeet
Cool! It works because (by induction) the element you're looking for would be the same as that found given a smaller array where tails consisting of a smaller number of other elements had been removed.
MarkusQ
This should work because eventually you'll have to get repeated items,i.e it should work with {...random.... ,1,1,1,1,1,1..} and with the worst case of {1,a,1,b,1,c,1,d.. }
Steve B.
Hmm, this doesn't even work? Try: int[] wrongwrongwrong = {4, 2, 2, 3, 2, 4, 3, 2, 6, 4};
Blankman
@Blankman: Hmm... odd. I'll look into fixing it. Thanks for pointing it out.
Jon Skeet
@Blankman: Oh, I've worked it out. In your example, no element appears more than n/2 times. My implementation doesn't validate that the input matches the question description.
Jon Skeet
Yes :) it gets tricky, b/c then you have to keep some sort of a history of the 'best' counter.
Blankman
A: 

How about: randomly select a small subset of K elements and look for duplicates (e.g. first 4, first 8, etc). If K == 4 then the probability of not getting at least 2 of the duplicates is 1/8. if K==8 then it goes to under 1%. If you find no duplicates repeat the process until you do. (assuming that the other elements are more randomly distributed, this would perform very poorly with, say, 49% of the array = "A", 51% of the array ="B").

e.g.:

findDuplicateCandidate: 
    select a fixed size subset.
    return the most common element in that subset
    if there is no element with more than 1 occurrence repeat.
    if there is more than 1 element with more than 1 occurrence call findDuplicate and choose the element the 2 calls have in common

This is a constant order operation (if the data set isn't bad) so then do a linear scan of the array in order(N) to verify.

Steve B.
+1  A: 

Well you can do an inplace radix sort as described here[pdf] this takes no extra space and linear time. then you can make a single pass counting consecutive elements and terminating at count > n/2.

MahdeTo
+1  A: 
int findLeader(int n, int* x){
    int leader = x[0], c = 1, i;
    for(i=1; i<n; i++){
     if(c == 0){
      leader = x[i];
      c = 1;
     } else {
      if(x[i] == leader) c++;
      else c--;
     }
    }

    if(c == 0) return NULL;
    else {
     c = 0;
     for(i=0; i<n; i++){
      if(x[i] == leader) c++;
     }
     if(c > n/2) return leader;
     else return NULL;
    }
}

I'm not the author of this code, but this will work for your problem. The first part looks for a potential leader, the second checks if it appears more than n/2 times in the array.

Pyetras
+3  A: 

Find the median, it takes O(n) on an unsorted array. Since more than n/2 elements are equal to the same value, the median is equal to that value as well.

A: 

This is what I thought initially.

I made an attempt to keep the invariant "one element appears more than n/2 times", while reducing the problem set.

Lets start comparing a[i], a[i+1]. If they're equal we compare a[i+i], a[i+2]. If not, we remove both a[i], a[i+1] from the array. We repeat this until i>=(current size)/2. At this point we'll have 'THE' element occupying the first (current size)/2 positions. This would maintain the invariant.

The only caveat is that we assume that the array is in a linked list [for it to give a O(n) complexity.]

What say folks?

-bhupi

baskin