views:

310

answers:

6

Hi, Given an integer array, i need to find which number occurred most number of times. I have written algorithm as below.

  1. Use a map to store number and number of times it occurred.

    map<int, int>

    Key: represents the number
    value: represents number of times key occurred.

  2. Scan input array and update the map with number and number of occurances.
  3. Iterate through map from the begining to end. Find the key for which maximum value is present. This key becomes the number which occurred most number of times.

I implemented the algorithm as below.

#include <iostream> 
#include <map>
using namespace std; 
int main()
{
    int a[10] = {1,2,3,2,1,3,2,4,1,1}; //Input array: hardcoded for testing
    map<int, int> m;

    for(int i=0;i<10;i++)
    {
        m[a[i]]++;  //Increment the value of key for counting occurances
    }

    int mostNumTimes = 0; 
    int number = -999; //-999 represents invalid number
    map<int,int>::iterator it = m.begin();
    for( ;it != m.end(); it++)  //Find the number which occurred 
    {                           //most number of times
        if(it->second > mostNumTimes)
        {
            mostNumTimes = it->second;
            number = it->first;
        }
    }
    if(number != -999)   //Print number and number of times it occurred
    {
        cout<<"Number: "<<number<<endl;
        cout<<"Number of times occured: "<<mostNumTimes<<endl;
    }
    else
    {
        cout<<"Input array is empty"<<endl;
    }
    return 0;
}

Output:

Number: 1

Number of times occured: 4

Question: Are there any optimal ways of doing it ? In the end, i am iterating through entire map as I couldn't find any member function for finding key whose value is highest in map. can i avoid iterating all keys ?

Note: I am not considering if multiple numbers occuring same number of times. I am finding first number with most occurances.

+7  A: 

You could maintain a current max (count and int value) as you iterate the values. On each increment in the map, you could update the values so you don't have to iterate at the end.

map<int, int> m;
int currentMax = -999;
int maxCount = 0;
for(int i=0;i<10;i++)
{
    int updated = m[a[i]]++;  //Increment the value of key for counting occurances        
    updated++; // due to post increment 
    if (maxCount < updated) {
         maxCount = updated;
         currentMax = i;
    }
}

Because this is an entertaining exercise, it seems we're all assuming that map iteration is cheap. While iterating a map is also O(N), it's much more expensive than iterating over a vector or array. So what's cheaper, iterating a possibly reduced size map, or doing a really basic if check that will trigger two assignments at some percentage? Both your solution and this one are O(N) assuming you change to use an unordered map. Right now you're not, so each insert is log(n) and actually I think switching to an unordered map will be your biggest gain at this point.

SB
I was thinking the same thing. This could however be slower, depending on the list (ie a list of 100x3 and 40x1 would be 140 checks in your case, and only two in the original code)
Jochem
@SB: +1: But i have a comment. If input array is too big with too many repetetions and if we add current max stuff in the initial loop, it wil slowdown the algorithm.
bjskishore123
Slower than iterating over the map afterwards again? And doing exactly this? I don't think so.
Vinzenz
Good points, but then you need to figure out what assumptions you can make about your data. For the average case, which is faster? You're still getting O(n) performance assuming your map is a hashmap
SB
It is slower, Vinzenz, in the typical case. The reason is that the map in the typical case has less members than the input array.
ypnos
Probably, this is worst than to use a cycle at the end. This way you do array-size additional comparisons in the main cycle while, cycling on map, you do (2 * map-size) comparisons (plus map overhead). Map-size is, in the worst case, equal to array-size. We'd know more on input size and distribution to determine which method is better.
andcoz
I think map is implemented using the RBTree, so iterating of the map is O(logN) and not O(n).
Manoj R
A: 

First, you should get rid of the invalid number -999. Just ask for map.empty() before continuing.

Then, I think it is invalid to increment elements in the map that do not exist before. I assume that a new member is created with unitialized (random) value, as there is no default constructor for int.

You can do something else:

map<int, int>::iterator it = m.find(i);
if (it != m.end())
    m.second++;
    if (m.second > mostTimes) {
      // reset mostTimes and maxNumber = m.first here
    }
} else {
    m[i] = 1;
}

This operation is O(n) and therefore has the same time complexity class as iterating through the map again to find the max element (where in the worst case, all numbers in input are different and the map would have the same amount of members n than the input array). The difference though is that mostTimes and maxNumbers may get overwritten a lot of times and it may happen that they do not fit into CPU registers and a lot of RAM accesses happen. So probably doing the iteration afterwards is faster in practice.

ypnos
@ypnos: I too got doubt and checked whether incrementing uninitialized key value works or not. Its working. Not sure whether it is implementation dependent.
bjskishore123
@ypnos: I doubt doing the iteration afterwards will be faster in practice, especially on large lists. Of course, I'm not going to actually bother benchmarking it to see who is right :P
Brian
NEVER trust this kind of test. If a new process is ran, it starts with memory filled with 0. If a process runs for longer, chances increase that your uninitialized memory becomes non-zero. Use a memory profiler like valgrind to see.
ypnos
@Brian: Yes I would not bet on it either. But if you have long input with lots of repetition, the difference in size gets relevant.
ypnos
@ypnos: I saw in map.h about operator[] implementation. _Where = this->insert(_Where, value_type(_Keyval, mapped_type()));This looks to be taking care of starting from zero. Not garbage value.
bjskishore123
@ypnos: Adding to above point, mapped_type() acts like below. It is type passed to template class. If mapped_type is int, it works like this, int j = int(); j will be initialized to zero in the above statement.
bjskishore123
Thank you. Very good to know.
ypnos
A: 

I think your solution is fine!

Jochem
A: 

Sort the array so you have...

{1,1,1,1,2,2,2,3,3,4,4}

Then have a currentValue variable and when the value doesn't match set it, when it does, increment the count... i.e. (pseudo code)

currentValue = 0;
currentCount = 0;
maxValue = 0;
maxCount = 0;

for(int value in array) {
  if(currentValue == value) {
    currentCount++;
  } else {
    // is this greater than max count
    if(currentCount > maxCount) {
      maxCount = currentCount;
      maxValue = currentValue;
    }

    // reset values
    currentValue = value;
    currentCount = 0;
  }
}

Now you have the value which occurred most in maxValue and the number of times it occurred in maxCount.

Simon Lee
sorting makes it O(nlogn) (unless using count sort)
SB
+4  A: 

Your algorithm is pretty good. It's actually O(N Log N), because of the N std::map (a tree based map) insertions you're doing (Log N each). This dominates the time complexity of the algorithm, as the second phase is linear.

An improvement would be to use a hash map, giving you a linear algorithm overall.

Adhemar
that's what i said ;)
SB
Yes, after you edited it...
Adhemar
actually my edit 23 minutes ago brought light to that fact.
SB
+1 For pointing out that map is a treemap, not a hashmap.
Brian
@SB, which edit do you mean? In any case, I didn't notice it then, and I still don't now. Sorry if I committed a faux pas here.
Adhemar
no faux pas - happens all the time. i've been editing the answer like crazy. as long as your answer helps the poster, then its not a faux pas.
SB
A: 

linear iteration is required (as people have already mentioned), but given order is not important, you could fold the array? Doing so saves you potentially two map updates for elements that are the same, i.e.

int i = 0;
int j = sizeof(a)/sizeof(int);
for(;i < j;i++, j--)
{
  if (a[i] == a[j])
  {
    update<map_t, 2>(m, a[i]);
  }
  else
  {
    update<map_t, 1>(m, a[i]);
    update<map_t, 1>(m, a[j]);
  }
}
// if array size is odd...
if (i == j)
    update<map_t, 1>(m, a[i]);

here update is a simple function because I'm too lazy to type the same code...

template <typename M, int DEF>
void update(M& m, int v)
{
  typename M::iterator it = m.find(v);
  if (it != m.end())
      it->second += DEF;
  else
  {
    m.insert(pair<int, int>(v, DEF));
  }
}

everything else remains the same, i.e. your code is good, and only slight improvements are possible...

Nim