OK, so I'm just starting to think about how to implement a new graphical plugin for Paint.NET, and I need to know how to find the most common integer in a 2D array of integers. Is there a built-in way to do this in C#? Or does anyone have a slick way to do it?

The array will look something like this:

300 300 300 300 300 300 300
  0 150 300 300 300 300 300
  0   0 150 300 300 300 300
  0   0   0   0 300 300 300
  0   0   0   0 150 300 300
  0   0   0   0   0 150 300
  0   0   0   0   0   0 300

I would need to know that 300 is the most common number in the array. If there is no "most common" number, just return the center number (the array dimensions will always be odd x odd), which in this example is 0.

I'll be implementing this using a "brute force" algorithm unless you experts can come up with something faster.

Any help would be very much appreciated.

Thanks!

EDIT: More info...

The values will almost always be VERY diverse (more diverse than my example array). The values will be in the range of 0-360. The size of the array will be 5x5 to about 17x17 depending on the speed of the algorithm. The result will be calculated once for each pixel in a large image... so faster is better. ;)

+6  A: 

It's at least O(n*m) any way you slice it -- you are going to have to look at each cell at least once. The place to economize is where you accumulate the counts of each value before looking for the most common; if your integers vary over a relatively small range (they are uint16, let's say), then you might be able to simply use a flat array instead of a map.

I guess you could also keep running counts x and y of the current top candidate and the runner-up for "most common", and early-out as soon as fewer than (x-y) cells are left to look at, since at that point there's no way the runner-up could outpace the top candidate.
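For illustration, here is a minimal sketch of that early-out in C# (the method name, the 0..360 value range, and the rectangular int[,] input are my assumptions, not part of the original suggestion):

    // Sketch: count values while scanning, and stop as soon as the gap
    // between the leader and the runner-up exceeds the cells remaining.
    static int? MostCommonWithEarlyOut(int[,] grid)
    {
        int[] counts = new int[361]; // one bucket per value 0..360
        int total = grid.Length;     // n * m cells
        int seen = 0;
        int top = 0, runnerUp = 0;   // highest and second-highest counts so far
        int topValue = -1;

        foreach (int cell in grid)
        {
            seen++;
            int c = ++counts[cell];
            if (cell == topValue)
            {
                top = c;
            }
            else if (c > top)
            {
                runnerUp = top; // the old leader becomes the runner-up
                top = c;
                topValue = cell;
            }
            else if (c > runnerUp)
            {
                runnerUp = c;
            }

            // Even if every remaining cell matched the runner-up,
            // it could not catch the leader: stop early.
            if (top - runnerUp > total - seen)
                return topValue;
        }
        return top > runnerUp ? topValue : (int?)null; // null = tie, use center
    }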

Integer ops like this are pretty fast; even for a megapixel image the brute force algorithm should only take a couple milliseconds.

I notice you've edited your original question to say that the pixel values run from 0..255 -- in that case, definitely go with a simple flat array; that's small enough to fit easily into the L1 dcache, and a lookup in a flat array is très quick.

[edit]: Dealing with the "no most common number" case is very simple once you've built the histogram array: all you have to do is walk through it to find the "most" and "second most" common numbers; if they're equally frequent, then by definition there is no single most common number.

const int numLevels = 360; // you said each cell contains a number [0..360)
int[] levelFrequencyCounts = new int[numLevels]; // assume this has been populated such that levelFrequencyCounts[i] = number of cells containing "i"
int mostCommon = 0, runnerUpCount = 0;
for (int i = 1; i < numLevels; ++i)
{
  if (levelFrequencyCounts[i] > levelFrequencyCounts[mostCommon])
  {
    runnerUpCount = levelFrequencyCounts[mostCommon]; // the old leader is now the runner-up
    mostCommon = i;
  }
  else if (levelFrequencyCounts[i] > runnerUpCount)
  {
    runnerUpCount = levelFrequencyCounts[i]; // track the best non-leader count
  }
}

if (levelFrequencyCounts[mostCommon] != runnerUpCount)
{
   return mostCommon;
}
else
{
   return CenterOfInputData; // (something like InputData[n/2][m/2])
}
Crashworks
You will be able to determine your outcome IF the count of any key becomes greater than 1/2 the available cells. On small areas it won't matter much though.
Joe Philllips
@Crashworks, my mistake... the values are 0-360.
BoltBait
Still small enough to easily fit in an array!
Crashworks
+3  A: 

how would I do something like this in C#?

Something like this:

Dictionary<int, int> d = new Dictionary<int, int>();
foreach (int value in matrix)
{
    if (!d.ContainsKey(value))
        d.Add(value, 1);
    else
        d[value] = d[value] + 1;
}
KeyValuePair<int, int>? biggest = null; // nullable, since KeyValuePair is a struct
foreach (KeyValuePair<int, int> found in d)
{
    if ((biggest == null) || (biggest.Value.Value < found.Value))
        biggest = found;
}
ChrisW
Thanks! Nice code. Easy to follow.
BoltBait
Given that the values are constrained to 0..360, an array like `int[] numOccurrences = new int[361]` could be simpler and cheaper than a dictionary.
ChrisW
+1  A: 

One option is LINQ - a bit inefficient, but OK for non-huge arrays:

    var max = (from cell in data.Cast<int>()
               group cell by cell into grp
               select new { Key = grp.Key, Count = grp.Count() } into agg
               orderby agg.Count descending
               select agg).First();
    Console.WriteLine(max.Key + ": " + max.Count);

Or with a jagged array:

    var max = (from row in data
               from cell in row
               group cell by cell into grp
               select new { Key = grp.Key, Count = grp.Count() } into agg
               orderby agg.Count descending
               select agg).First();
    Console.WriteLine(max.Key + ": " + max.Count);

In reality, I would probably use a dictionary/count. This example without LINQ, just "because":

    Dictionary<int, int> counts = new Dictionary<int, int>();
    foreach (int value in data)
    {
        int count;
        counts.TryGetValue(value, out count);
        counts[value] = count + 1;
    }
    int maxCount = -1, maxValue = 0;
    foreach (KeyValuePair<int, int> pair in counts)
    {
        if (pair.Value > maxCount)
        {
            maxCount = pair.Value;
            maxValue = pair.Key;
        }
    }
    Console.WriteLine(maxValue + ": " + maxCount);
Marc Gravell
+1  A: 

If speed is your primary concern, do not use a dictionary. Stick with a flat array. Try this:

// stores hit counts (0-360)
short[] hitCounts = new short[361];

// iterate through 2d array and increment hit counts
for (int i = 0; i < toEvaluate.Length; i++)
{
    for (int j = 0; j < toEvaluate[i].Length; j++)
        hitCounts[toEvaluate[i][j]]++;
}

int greatestHitCount = 0; // the hit count of the current greatest value
int greatest = -1; // the current greatest value

// iterate through values (0-360) and evalute hit counts
for (int i = 0; i < hitCounts.Length; i++)
{
    // the hit count of hitCounts[i] is higher than the current greatest hit count value
    if (hitCounts[i] > greatestHitCount)
    {
        greatestHitCount = hitCounts[i]; // store the new hit count
        greatest = i; // store the greatest value
    }
    // there is already a value with the same hit count (which is the greatest)
    else if (hitCounts[i] == greatestHitCount)
        greatest = -1; // there are more than one value, we can't use this if it ends up being the greatest
}

if (greatest >= 0) // a unique greatest value was found
    return greatest;

// figure out the middle x and y indexes (lengths are always odd)
int x = toEvaluate.Length / 2;
int y = toEvaluate[x].Length / 2;

// return the value at the center of the 2d array as the value
return toEvaluate[x][y];

When speed becomes a concern over readability, you end up with necessarily ugly code. The above could definitely benefit from refactoring (hence overdoing the comments), but it should run fast. If it isn't fast enough, you can gain even more optimizations by moving it to unmanaged code.
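For what it's worth, here is a hypothetical sketch of the kind of unmanaged-style optimization mentioned above, using C#'s unsafe pointers (the method name and the rectangular int[,] input are assumptions for illustration, not code from Paint.NET):

    // Sketch: fill the hit-count table through a raw pointer, avoiding
    // per-element 2D bounds checks. Requires compiling with /unsafe.
    static unsafe short[] BuildHitCounts(int[,] grid)
    {
        short[] hitCounts = new short[361]; // values 0..360
        fixed (int* p = &grid[0, 0])
        {
            int n = grid.Length; // total number of cells
            for (int i = 0; i < n; i++)
                hitCounts[p[i]]++;
        }
        return hitCounts;
    }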

Michael Meadows
Hey - this is nearly identical to what I was about to post!
Michael Burr
Just a couple nitpicks - since the incoming matrix can be up to 17x17, the vals[] array type needs to be larger than a byte to avoid possible overflow. Also assuming that 360 is a valid data value, the array should be sized to 361 (obviously, the array size should be a constant defined somewhere).
Michael Burr
vals[] represents the hit counts (for your values 0-360), not the actual values. 17*17 is 289. The max value of a byte (which is unsigned by default) is 512. You're right about the 360 thing. Will fix.
Michael Meadows
Hmm - my bytes can only hold 255. I guess I need to upgrade. :)
Michael Burr
That makes me wrong twice! Yar. I guess eventually 2^8 starts looking like 2^9 to my addled brain. Changing to short.
Michael Meadows
A: 

Michael beat me to the post, but I'd do likewise, something like this:

int MostCommonValueIn2dArray(int[,] matrix)
{
    var d = new int[361]; // one bucket per value 0-360
    int maxValue = 0;
    for (int x = 0; x <= matrix.GetUpperBound(0); x++)
    {
        for (int y = 0; y <= matrix.GetUpperBound(1); y++)
        {
            d[matrix[x, y]]++;
        }
    }
    for (int value = 0; value < d.Length; value++)
    {
        if (d[value] > d[maxValue]) maxValue = value;
    }
    return maxValue;
}

It would need to be optimized for your particular needs.

pro3carp3
+1  A: 

Your image:

300+ 300+ 300+ 300 300 300 300
  0+ 150+ 300+ 300 300 300 300
  0+   0+ 150+ 300 300 300 300
  0    0    0    0 300 300 300
  0    0    0    0 150 300 300
  0    0    0    0   0 150 300
  0    0    0    0   0   0 300

Marked (+) numbers are your window; w and h are your window dimensions. Apply bucket counting (as other people suggested, since your value range is quite limited). Don't cut your evaluation short halfway as Crashworks suggests, and don't return a result yet. This is the first step.

300- 300- 300- 300 300 300 300
  0. 150. 300. 300 300 300 300
  0.   0. 150. 300 300 300 300
  0+   0+   0+   0 300 300 300
  0    0    0    0 150 300 300
  0    0    0    0   0 150 300
  0    0    0    0   0   0 300

Shift your window. Instead of recounting everything, subtract the buckets for the row/column you just left and add the buckets for the new one. This way you examine each pixel about 2(w+h) times, i.e. whenever it crosses a window boundary, instead of w*h times, i.e. once for every window that contains it, as a naive implementation would. A sketch of the column update follows the diagram below.

In other words, you need to move your window like this:

|  ^->|  ^
|  |  |  |
|  |  |  |
V->|  V->|
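To make the shift concrete, here is a small sketch of the histogram update when the window moves one pixel to the right (names like SlideRight, image, and counts are assumptions for illustration):

    // Sketch: move a w-by-h window one column right by updating buckets.
    // 'counts' is the running histogram, 'left' the window's old
    // leftmost column, 'top' its top row.
    static void SlideRight(int[,] image, int[] counts, int left, int top, int w, int h)
    {
        for (int row = top; row < top + h; row++)
        {
            counts[image[row, left]]--;     // departing left column
            counts[image[row, left + w]]++; // arriving right column
        }
    }

Vertical moves at the turn of each row update one full row of buckets the same way, which is why the window snakes across the image as drawn above.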

I assume you are trying to implement a nonlinear convolution filter.

Corrections welcome.

artificialidiot
I will be processing many pixels in a row, so I had already thought of the optimization of removing the left column of data from the array before adding the new right-most column. I have done similar shortcuts in previous Paint.NET plugins.
BoltBait
well... just in case...
artificialidiot
Yeah, exploiting the data locality like that is a good call. I wrote my answer before Bolt made his edit about sliding a convo filter over an image -- at the time, I thought he wanted to find the mode value over the whole array.
Crashworks
A: 

All I'll offer is this: for any algorithm that checks every cell (which is pretty much what you'd expect to do), do two extra things:

1.) Make sure the routine exits when the count for the currently most common value > (M x N / 2). If something has >50% coverage of your grid, then it's the most common value and there's no need to continue (see the sketch after this list). If your routine only needs to be right MOST of the time, you could lower the percentage and treat it as a heuristic. You could even run some analysis that spits out something like "if coverage is >37.6%, then 99.9% of the time it'll be the most common value" and then use that percentage.

2.) If there is any way you can determine in which side, corner, or general location (outer edges, middle, etc.) the most common values are likely to be, you can then scan in that order, which together with optimization 1 above could shave off a lot of your scanning. For instance, in your example the top right is heavy with the common value. If this were determinable by some heuristic, you could scan from the top right to the bottom left in some fashion. If the scanning pattern needed is complex, pre-generate it.
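Here is a minimal sketch of the majority check from point 1 (the method name and the 0..360 value range are my assumptions):

    // Sketch: stop scanning as soon as any value's count exceeds half
    // the cells; no other value can then be more common.
    static int? MostCommonMajority(int[,] grid)
    {
        int[] counts = new int[361];
        int half = grid.Length / 2;
        foreach (int cell in grid)
        {
            if (++counts[cell] > half)
                return cell; // majority reached, exit early
        }
        return null; // no majority; fall back to a full histogram pass
    }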

pbhogan
A: 

Take a look at the LocalHistogramEffect code in Paint.NET, notably LocalHistogramEffect.RenderRect.

It walks the input image, maintaining a histogram of intensities for each source pixel within 'r' pixels of the destination pixel. As the output pixels are traversed, it adds the leading edge to the histogram and subtracts the trailing edge. It handles all the edge cases well, and is quite fast. It's the basis for the Median, Unfocus, Outline, and Remove Noise effects.

Adapting this to support Hue instead of RGB intensity would be rather trivial.

The performance is quite good, and for your purposes it operates in O(r^2+w*r+n*w), where r is the radius, w is the width of the image, and n is the number of levels in the histogram.

-tjackson

ddrcoder