ansaurus

Question

Find the number in an array which occurred most number of times

Answer 1

+7 A:

You could maintain a current max (count and int value) as you iterate the values. On each increment in the map, you could update the values so you don't have to iterate at the end.

map<int, int> m;
int currentMax = -999;
int maxCount = 0;
for(int i=0;i<10;i++)
{
    int updated = m[a[i]]++;  //Increment the value of key for counting occurances        
    updated++; // due to post increment 
    if (maxCount < updated) {
         maxCount = updated;
         currentMax = i;
    }
}

Because this is an entertaining exercise, it seems we're all assuming that map iteration is cheap. While iterating a map is also O(N), it's much more expensive than iterating over a vector or array. So what's cheaper, iterating a possibly reduced size map, or doing a really basic if check that will trigger two assignments at some percentage? Both your solution and this one are O(N) assuming you change to use an unordered map. Right now you're not, so each insert is log(n) and actually I think switching to an unordered map will be your biggest gain at this point.

SB 2010-10-12 13:53:22

I was thinking the same thing. This could however be slower, depending on the list (ie a list of 100x3 and 40x1 would be 140 checks in your case, and only two in the original code)

Jochem 2010-10-12 13:57:07

@SB: +1: But i have a comment. If input array is too big with too many repetetions and if we add current max stuff in the initial loop, it wil slowdown the algorithm.

bjskishore123 2010-10-12 13:57:53

Slower than iterating over the map afterwards again? And doing exactly this? I don't think so.

Vinzenz 2010-10-12 13:58:34

Good points, but then you need to figure out what assumptions you can make about your data. For the average case, which is faster? You're still getting O(n) performance assuming your map is a hashmap

SB 2010-10-12 13:59:56

It is slower, Vinzenz, in the typical case. The reason is that the map in the typical case has less members than the input array.

ypnos 2010-10-12 14:02:56

Probably, this is worst than to use a cycle at the end. This way you do array-size additional comparisons in the main cycle while, cycling on map, you do (2 * map-size) comparisons (plus map overhead). Map-size is, in the worst case, equal to array-size. We'd know more on input size and distribution to determine which method is better.

andcoz 2010-10-12 14:10:31

I think map is implemented using the RBTree, so iterating of the map is O(logN) and not O(n).

Manoj R 2010-10-12 15:02:49

Answer 2

A:

First, you should get rid of the invalid number -999. Just ask for map.empty() before continuing.

Then, I think it is invalid to increment elements in the map that do not exist before. I assume that a new member is created with unitialized (random) value, as there is no default constructor for int.

You can do something else:

map<int, int>::iterator it = m.find(i);
if (it != m.end())
    m.second++;
    if (m.second > mostTimes) {
      // reset mostTimes and maxNumber = m.first here
    }
} else {
    m[i] = 1;
}

This operation is O(n) and therefore has the same time complexity class as iterating through the map again to find the max element (where in the worst case, all numbers in input are different and the map would have the same amount of members n than the input array). The difference though is that mostTimes and maxNumbers may get overwritten a lot of times and it may happen that they do not fit into CPU registers and a lot of RAM accesses happen. So probably doing the iteration afterwards is faster in practice.

ypnos 2010-10-12 13:58:28

@ypnos: I too got doubt and checked whether incrementing uninitialized key value works or not. Its working. Not sure whether it is implementation dependent.

bjskishore123 2010-10-12 14:02:13

@ypnos: I doubt doing the iteration afterwards will be faster in practice, especially on large lists. Of course, I'm not going to actually bother benchmarking it to see who is right :P

Brian 2010-10-12 14:05:41

NEVER trust this kind of test. If a new process is ran, it starts with memory filled with 0. If a process runs for longer, chances increase that your uninitialized memory becomes non-zero. Use a memory profiler like valgrind to see.

ypnos 2010-10-12 14:06:26

@Brian: Yes I would not bet on it either. But if you have long input with lots of repetition, the difference in size gets relevant.

ypnos 2010-10-12 14:07:29

@ypnos: I saw in map.h about operator[] implementation. _Where = this->insert(_Where, value_type(_Keyval, mapped_type()));This looks to be taking care of starting from zero. Not garbage value.

bjskishore123 2010-10-12 14:12:38

@ypnos: Adding to above point, mapped_type() acts like below. It is type passed to template class. If mapped_type is int, it works like this, int j = int(); j will be initialized to zero in the above statement.

bjskishore123 2010-10-12 14:20:11

Thank you. Very good to know.

ypnos 2010-10-12 18:46:42

Answer 3

A:

I think your solution is fine!

Jochem 2010-10-12 13:59:13

Answer 4

A:

Sort the array so you have...

{1,1,1,1,2,2,2,3,3,4,4}

Then have a currentValue variable and when the value doesn't match set it, when it does, increment the count... i.e. (pseudo code)

currentValue = 0;
currentCount = 0;
maxValue = 0;
maxCount = 0;

for(int value in array) {
  if(currentValue == value) {
    currentCount++;
  } else {
    // is this greater than max count
    if(currentCount > maxCount) {
      maxCount = currentCount;
      maxValue = currentValue;
    }

    // reset values
    currentValue = value;
    currentCount = 0;
  }
}

Now you have the value which occurred most in maxValue and the number of times it occurred in maxCount.

Simon Lee 2010-10-12 14:00:00

sorting makes it O(nlogn) (unless using count sort)

SB 2010-10-12 14:02:11

Answer 5

+4 A:

Your algorithm is pretty good. It's actually O(N Log N), because of the N std::map (a tree based map) insertions you're doing (Log N each). This dominates the time complexity of the algorithm, as the second phase is linear.

An improvement would be to use a hash map, giving you a linear algorithm overall.

Adhemar 2010-10-12 14:31:54

that's what i said ;)

SB 2010-10-12 14:37:15

Yes, after you edited it...

Adhemar 2010-10-12 14:43:33

actually my edit 23 minutes ago brought light to that fact.

SB 2010-10-12 14:50:24

+1 For pointing out that map is a treemap, not a hashmap.

Brian 2010-10-12 14:50:42

@SB, which edit do you mean? In any case, I didn't notice it then, and I still don't now. Sorry if I committed a faux pas here.

Adhemar 2010-10-12 15:05:34

no faux pas - happens all the time. i've been editing the answer like crazy. as long as your answer helps the poster, then its not a faux pas.

SB 2010-10-12 15:20:50

Answer 6

A:

linear iteration is required (as people have already mentioned), but given order is not important, you could fold the array? Doing so saves you potentially two map updates for elements that are the same, i.e.

int i = 0;
int j = sizeof(a)/sizeof(int);
for(;i < j;i++, j--)
{
  if (a[i] == a[j])
  {
    update<map_t, 2>(m, a[i]);
  }
  else
  {
    update<map_t, 1>(m, a[i]);
    update<map_t, 1>(m, a[j]);
  }
}
// if array size is odd...
if (i == j)
    update<map_t, 1>(m, a[i]);

here update is a simple function because I'm too lazy to type the same code...

template <typename M, int DEF>
void update(M& m, int v)
{
  typename M::iterator it = m.find(v);
  if (it != m.end())
      it->second += DEF;
  else
  {
    m.insert(pair<int, int>(v, DEF));
  }
}

everything else remains the same, i.e. your code is good, and only slight improvements are possible...

Nim 2010-10-12 14:47:28

ansaurus

tags:

views:

answers:

Find the number in an array which occurred most number of times

related questions