views:

578

answers:

5

Hi, I'm looking for an elegant way of determining which element has the highest occurrence in a javascript array.

For example in

['pear', 'apple', 'orange', 'apple']

the 'apple' element is the most frequent one.

+4  A: 

This is just the mode. Here's a quick, non-optimized solution. It should be O(n).

function mode(array)
{
    if(array.length == 0)
     return null;
    var modeMap = {};
    var maxEl = array[0], maxCount = 1;
    for(var i = 0; i < array.length; i++)
    {
     var el = array[i];
     if(modeMap[el] == null)
      modeMap[el] = 1;
     else
      modeMap[el]++; 
     if(modeMap[el] > maxCount)
     {
      maxEl = el;
      maxCount = modeMap[el];
     }
    }
    return maxEl;
}
Matthew Flaschen
Nice... but it only works for strings - not necessarily a limitation but something to consider.
J-P
oops, (and numbers)
J-P
Many thanks, I wasn't expecting a complete solution.Works on both strings and numbers with only a single pass which is quite nice.
vise
I've added a version of this algorithm to handle ties.
samandmoore
+2  A: 
a=['pear', 'apple', 'orange', 'apple'];
b=[];
max='', maxi=0;
for(var k in a) {
  if(b[k]) b[k]++ else b[k]=1;
  if(maxi<b[k]) { max=k; maxi=b[k] }
}
Thinker
This is still O(n), but it unnecessarily uses two passes.
Matthew Flaschen
Since JavaScript is transmitted, it's always interesting to see small solutions.
Nosredna
Lol 2 minus for proper solution ;] I corrected unnecessarily two passes, was making it quick, but still it works and is still the shortest solution.
Thinker
each access to b takes at least log(len(b)) so O(n) might be a bit optimistic
Nicolas78
nicolas78: If the array is small, it doesn't matter. So it depends on your project.
Thinker
A: 

var mode = 0; var c = 0;
var num = new Array(); var value = 0;
var greatest = 0; var ct = 0;

Note: ct is the length of the array

function getMode()
{

    for (var i = 0; i < ct; i++)
    {
        value = num[i];
        if (i != ct)
        {
            while (value == num[i + 1])
            {
                c = c + 1; 
                i = i + 1;
            }
        }
        if (c > greatest)
        {
            greatest = c;
            mode = value;
        }
        c = 0;
    }

}
Kingsley Nnoruka
+2  A: 

As per George Jempty's request to have the algorithm account for ties, I propose a modified version of Matthew Flaschen's algorithm.

function modeString(array)
{
    if(array.length == 0)
        return null;
    var modeMap = {};
    var maxEl = array[0],  maxCount = 1;
    for(var i = 0; i < array.length; i++)
    {
        var el = array[i];
        if(modeMap[el] == null)
            modeMap[el] = 1;
        else
            modeMap[el]++;  
        if(modeMap[el] > maxCount)
        {
            maxEl = el;
            maxCount = modeMap[el];
        }
        else if(modeMap[el] == maxCount)
        {
            maxEl += '&' + el;
            maxCount = modeMap[el];
        }
    }
    return maxEl;
}

This will now return a string with the mode element(s) delimited by a '&' symbol. When the result is received it can be split on that '&' element and you have your mode(s).

Another option would be to return an array of mode element(s) like so:

function modeArray(array)
{
    if(array.length == 0)
        return null;
    var modeMap = {};
    var maxCount = 1, modes = [array[0]];
    for(var i = 0; i < array.length; i++)
    {
        var el = array[i];
        if(modeMap[el] == null)
            modeMap[el] = 1;
        else
            modeMap[el]++;  
        if(modeMap[el] > maxCount)
        {
            modes = [el];
            maxCount = modeMap[el];
        }
        else if(modeMap[el] == maxCount)
        {
            modes.push(el);
            maxCount = modeMap[el];
        }
    }
    return modes;
}

In the above example you would then be able to handle the result of the function as an array of modes.

samandmoore
A: 

I guess you have two approaches. Both of which have advantages.

Sort then Count or Loop through and use a hash table to do the counting for you.

The hashtable is nice because once you are done processing you also have all the distinct elements. If you had millions of items though, the hash table could end up using a lot of memory if the duplication rate is low. The sort, then count approach would have a much more controllable memory footprint.

Steve Sheldon