You are given a 32-bit unsigned integer array with length up to 2^32, with the property that more than half of the entries in the array are equal to N, for some 32-bit unsigned integer N. Find N looking at each number in the array only once and using at most 2 kB of memory.

Your solution must be deterministic, and guaranteed to find N.

+16  A: 

Keep one integer for each bit, and increment this collection appropriately for each integer in the array.

At the end, some of the bits will have a count higher than half the length of the array - those bits determine N. Of course, the count will be higher than the number of times N occurred, but that doesn't matter. The important thing is that any bit which isn't part of N cannot occur more than half the times (because N has over half the entries) and any bit which is part of N must occur more than half the times (because it will occur every time N occurs, and any extras).

(No code at the moment - about to lose net access. Hopefully the above is clear enough though.)

Jon Skeet
I didn't care about the question until I read your answer. Jon Skeet I salute you.
Arj
Boyer-Moore is better, but this is really nice too.
wrang-wrang
+2  A: 

Pseudo code (notepad C++ :-)) for Jon's algorithm:

uint64_t arrBits[32] = {0};  // 64-bit counters: with up to 2^32 entries, a 32-bit count could overflow
size_t lNumbers = sizeof(arrNumbers) / sizeof(arrNumbers[0]);

for (size_t i = 0; i < lNumbers; i++)
  for (int bi = 0; bi < 32; bi++)
    arrBits[bi] += (arrNumbers[i] >> bi) & 1;

uint32_t N = 0;

for (int bc = 0; bc < 32; bc++)
  if (arrBits[bc] > lNumbers / 2)
    N |= (1u << bc);
Franci Penov
Lots of problems with your arithmetic and [default] signed int types if the array is very large.
Sparr
Funny, I thought I mentioned this is pseudo-code...
Franci Penov
A: 

I have recollections of an algorithm which might or might not follow the 2 kB rule. It might need to be rewritten with explicit stacks and the like to avoid breaking the memory limit through function calls, but that might be unneeded since it only ever makes a logarithmic number of such calls. Anyhow, I have vague recollections from college of a recursive solution to this which involved divide and conquer, the secret being that when you divide the groups in half, at least one of the halves still has more than half of its values equal to the majority. The basic rule when dividing is that you return two candidate top values, one of which is the top value and one of which is some other value (that may or may not be 2nd place). I forget the algorithm itself.
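The half-remembered scheme above can be sketched roughly as follows (all names and details are my reconstruction, not necessarily Brian's actual algorithm): if a value is the majority of the whole range, it must be the majority of at least one half, so recursing on both halves yields two candidates, and re-counting them over the whole range picks the winner. Note that the re-count examines elements more than once, so this does not satisfy the question's single-pass rule.

```cpp
#include <cstddef>
#include <cstdint>

// Count occurrences of x in a[lo..hi).
static size_t count_in(const uint32_t a[], size_t lo, size_t hi, uint32_t x) {
    size_t c = 0;
    for (size_t i = lo; i < hi; ++i)
        if (a[i] == x) ++c;
    return c;
}

// Returns the majority of a[lo..hi), assuming one exists.
uint32_t majority_dc(const uint32_t a[], size_t lo, size_t hi) {
    if (hi - lo == 1) return a[lo];
    size_t mid = lo + (hi - lo) / 2;
    uint32_t left = majority_dc(a, lo, mid);   // candidate from the left half
    uint32_t right = majority_dc(a, mid, hi);  // candidate from the right half
    if (left == right) return left;
    // Two distinct candidates: re-count one over the whole range to decide.
    return count_in(a, lo, hi, left) > (hi - lo) / 2 ? left : right;
}
```

This takes O(n log n) comparisons and O(log n) stack depth, which is consistent with the "logarithmic number of calls" caveat above.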

Brian
A: 

I'm not answering the question, but how can you impose a practical constraint like

using at most 2 kB

while asking for a language-agnostic solution?

(I know why you're adding that constraint - you don't want us to simply keep a keyed array of tallies for each possible integer - I'm just commenting on the inconsistency.)

Unsliced
I presume the constraint isn't meant to include the overhead of the language itself.
Cameron Price
Well, writing a solution which only uses heap space, via C++ template programming, is quite possible. Don't expect it to compile or run, though. But a theoretical solution which needs no memory at all (excluding 'overhead from the language'), using (2^32)^32 LOC, is quite possible.
Brian
+16  A: 

Boyer and Moore's "Linear Time Majority Vote Algorithm" - go down the array maintaining your current guess at the answer.

buti-oxa
A rather interesting algorithm. Same complexity as the accepted solution, O(n) that is, but with significantly lower runtime due to less math.
Sparr
Great algorithm. I found it amazing that 1) my one-line sentence actually describes it fully and 2) the sentence looks like a first-stab naive attempt at the algorithm, an attempt that cannot work.
buti-oxa
+1  A: 

You are guaranteed that the majority element occurs strictly more than half the number of times. At each step, maintain a best estimate of the majority item, together with a count of the number of times it has been "voted" for minus the number of times it has been "voted" against. When the count reaches 0, change to a new guess. The majority element is guaranteed to be the last one standing, and the final count is at least the number of times it appears in excess of all the other values combined.

   unsigned get_majority(const unsigned value[], unsigned N) {
      unsigned guess = 0;  /* set in the first iteration, since count_guess starts at 0 */
      unsigned count_guess = 0;
      for (unsigned i = 0; i < N; ++i) {
        if (count_guess == 0) {
          guess = value[i];
          ++count_guess;
        } else {
          count_guess += (value[i] == guess) ? 1 : -1;
        }
      }
      return guess;
   }
David Nehme
This doesn't work. EG, pass in {2,1,3,1,4,1} and get 4.
RossFabricant
+3  A: 

You can do this with only two variables.

public uint MostCommon(UInt32[] numberList)
{
    uint suspect = 0;
    int suspicionStrength = -1;
    foreach (uint number in numberList)
    {
        if (number == suspect)
        {
            suspicionStrength++;
        }
        else
        {
            suspicionStrength--;
        }

        if (suspicionStrength <= 0)
        {
            suspect = number;
            suspicionStrength = 1;  // reset the count for the new suspect
        }
    }
    return suspect;
}

Make the first number the suspect number, and continue looping through the list. If the number matches, increase the suspicion strength by one; if it doesn't match, lower the suspicion strength by one. If the suspicion strength drops to 0 or below, the current number becomes the suspect number and the strength resets to 1. This will not work to find the most common number, only a number that is more than 50% of the group. Resist the urge to add a check whether suspicionStrength is greater than half the list length - it will always result in more total comparisons.

-Jason

P.S. I have not tested this code- Use it at your own peril.

Jason Hernandez
The code fails with [1,1,1,1] because it returns 0
Lukman
You may have commented before I reformatted; I changed "if (suspicionStrength=0)" to "if (suspicionStrength<=0)". It passes that test as it is now.
Jason Hernandez
A: 

Proof of correctness for buti-oxa's / Jason Hernandez's answer, assuming Jason's answer is the same as buti-oxa's and both work the way the described algorithm should:

We define the adjusted suspicion strength as equal to the suspicion strength if the top value is currently selected, or minus the suspicion strength if it is not. Every time you see the top value, the adjusted suspicion strength increases by 1. Every time you see a wrong number, it either drops by 1 or increases by 1, depending on whether that wrong number is currently selected. So the minimum possible ending adjusted suspicion strength is number-of[top values] - number-of[other values], which is positive by assumption - hence the top value must be selected at the end.

Brian