While reading this question (Find the most common entry in an array), I saw Boyer and Moore's Linear Time Voting Algorithm suggested.

If you follow the link to the site, there is a step-by-step explanation of how the algorithm works. For the given sequence, AAACCBBCCCBCC, it presents the right solution.

When we move the pointer forward over an element e:

  • If the counter is 0, we set the current candidate to e and we set the counter to 1.
  • If the counter is not 0, we increment or decrement the counter according to whether e is the current candidate.

When we are done, the current candidate is the majority element, if there is a majority.
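The steps quoted above can be sketched in a few lines of Python. This is only a minimal illustration of the description, not the code from the linked page; the function name `majority_candidate` is made up for this example.

```python
def majority_candidate(seq):
    """Return the candidate left after one pass of the voting scheme.

    Hypothetical helper: it reports a candidate even when the input
    has no majority element at all.
    """
    candidate = None
    counter = 0
    for e in seq:
        if counter == 0:
            # Counter is 0: e becomes the current candidate.
            candidate = e
            counter = 1
        elif e == candidate:
            counter += 1
        else:
            counter -= 1
    return candidate


print(majority_candidate("AAACCBBCCCBCC"))  # C
print(majority_candidate("AAACCBB"))        # B
```

The second call reproduces the paper exercise with AAACCBB: the pass ends with B as the candidate.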

If I run this algorithm on paper with AAACCBB as input, the suggested candidate becomes B, which is obviously wrong.

As I see it, there are two possibilities:

  1. The authors have never tried their algorithm on anything other than AAACCBBCCCBCC, are completely incompetent and should be fired on the spot (that would make me smarter than both Boyer and Moore, yeah right).
  2. I am missing something, must be banned from Stack Overflow, and should never again be allowed to touch anything that involves logic.

Note: Here is a C++ implementation of the algorithm from Niek Sanders. I believe he correctly implemented the idea and as such it has the same problem (or doesn't it?).

+3  A: 

The algorithm only works when a single element makes up more than half of the sequence. AAACCBB in your example has no such majority: the most frequent letter occurs 3 times, and the string length is 7.
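When it is not known in advance that a majority exists, a second counting pass over the input can confirm or reject the candidate. The sketch below assumes the input is a sequence that can be iterated twice; the name `majority_element` is hypothetical, and the check is an addition for illustration rather than part of the quoted description.

```python
def majority_element(seq):
    """Return the majority element of seq, or None if there is none."""
    # First pass: the voting scheme picks a candidate.
    candidate, counter = None, 0
    for e in seq:
        if counter == 0:
            candidate, counter = e, 1
        elif e == candidate:
            counter += 1
        else:
            counter -= 1

    # Second pass: accept the candidate only if it fills more than half of seq.
    occurrences = sum(1 for e in seq if e == candidate)
    return candidate if 2 * occurrences > len(seq) else None


print(majority_element("AAACCBBCCCBCC"))  # C
print(majority_element("AAACCBB"))        # None
```

For AAACCBB the candidate B occurs only twice in seven letters, so the check rejects it.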

Rafał Dowgird
Major "duh"
Lieven
Happens to everyone. Do not be too strict in carrying out point 2 from your question :)
Rafał Dowgird
Took me a good couple of minutes :P
Blorgbeard
+1  A: 

From the first linked SO question:

with the property that more than half of the entries in the array are equal to N

From the Boyer and Moore page:

which element of a sequence is in the majority, provided there is such an element

Both of these algorithms explicitly assume that one element accounts for more than half of the entries. (Note in particular that "majority" is not the same as "most common.")
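To make the difference concrete, here is a small sketch that counts every element with `collections.Counter` and reports both the most common element and whether it is also a majority. It uses extra memory, unlike the voting algorithm, and the function name `most_common_and_majority` is invented for this example; it assumes a non-empty input.

```python
from collections import Counter


def most_common_and_majority(seq):
    """Return (element, count, is_majority) for a non-empty sequence."""
    counts = Counter(seq)
    element, count = counts.most_common(1)[0]
    # A majority needs strictly more than half of all entries.
    is_majority = 2 * count > len(seq)
    return element, count, is_majority


print(most_common_and_majority("AAACCBB"))        # ('A', 3, False)
print(most_common_and_majority("AAACCBBCCCBCC"))  # ('C', 7, True)
```

In AAACCBB, A is the most common letter but not a majority, which is exactly the case the voting algorithm does not cover.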

j_random_hacker
+1. Close second. Can't believe I've overlooked that. Thanks.
Lieven
You're welcome :) It's easy to confuse the two concepts!
j_random_hacker