This is a continuation of the question posted here: http://stackoverflow.com/questions/408358/finding-the-center-of-mass-on-a-2d-bitmap, which covered finding the center of mass in a boolean matrix like the one in its example.

Suppose now we expand the matrix to this form:

0 1 2 3 4 5 6 7 8 9
1 . X X . . . . . .
2 . X X X . . X . .
3 . . . . . X X X .
4 . . . . . . X . .
5 . X X . . . . . .
6 . X . . . . . . .
7 . X . . . . . . .
8 . . . . X X . . .
9 . . . . X X . . .

As you can see we now have 4 centers of mass, for 4 different clusters.

We already know how to find the center of mass when only one exists. If we run that algorithm on this matrix, we get a point somewhere in the middle of the matrix, which does not help us.
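To make the problem concrete, here is a minimal sketch of the single-center computation, assuming it simply averages the coordinates of all set cells (the names `GRID` and `center_of_mass` are made up for this illustration, and coordinates are zero-based):

```python
# The example matrix, one string per row ('X' = set cell).
GRID = """.XX......
.XXX..X..
.....XXX.
......X..
.XX......
.X.......
.X.......
....XX...
....XX...""".split("\n")

def center_of_mass(rows):
    """Average the (x, y) coordinates of every 'X' cell."""
    cells = [(x, y)
             for y, row in enumerate(rows)
             for x, ch in enumerate(row)
             if ch == "X"]
    n = len(cells)
    return (sum(x for x, _ in cells) / n, sum(y for _, y in cells) / n)

cx, cy = center_of_mass(GRID)
print(round(cx, 2), round(cy, 2))  # 3.44 3.44
```

The result falls on an empty cell between the clusters and belongs to none of them.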

What would be a good, correct, and fast algorithm for finding these clusters of mass?

+2  A: 

I think I would check each point in the matrix and figure out its mass based on its neighbours. Each neighbour's contribution would fall off with, say, the square of its distance. You could then pick the top four points that are at least some minimum distance from each other.

Here's some Python code I whipped together to try to illustrate the approach for finding out the mass for each point. Some setup using your example matrix:

matrix = [[1.0 if x == "X" else 0.0 for x in y] for y in """.XX......
.XXX..X..
.....XXX.
......X..
.XX......
.X.......
.X.......
....XX...
....XX...""".split("\n")]

HEIGHT = len(matrix)
WIDTH = len(matrix[0])
Y_RADIUS = HEIGHT // 2  # integer division, so range() also works on Python 3
X_RADIUS = WIDTH // 2

To calculate the mass for a given point:

def distance(x1, y1, x2, y2):
  'Manhattan distance http://en.wikipedia.org/wiki/Manhattan_distance'
  return abs(y1 - y2) + abs(x1 - x2)

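def mass(m, x, y):
  # Start with the point's own value. The loop below visits (x, y) again
  # with d clamped to 1, so the point itself is counted twice.
  _mass = m[y][x]
  for _y in range(max(0, y - Y_RADIUS), min(HEIGHT, y + Y_RADIUS)):
    for _x in range(max(0, x - X_RADIUS), min(WIDTH, x + X_RADIUS)):
      d = max(1, distance(x, y, _x, _y))  # clamp to avoid dividing by zero
      _mass += m[_y][_x] / (d * d)  # inverse-square falloff
  return _mass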

Note: I'm using Manhattan distances (a.k.a. city block or taxicab geometry) here because I don't think the added accuracy of Euclidean distances is worth the cost of calling sqrt().
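As an aside, because the mass is divided by d * d, a Euclidean falloff would not need sqrt() at all: the squared distance dx*dx + dy*dy can be used directly. The following is only a sketch of that variant, not the code that produced the numbers in this answer. The name `mass_sq_euclid` and the `radius` parameter are made up for illustration, and unlike `mass()` it uses a symmetric window and counts the point itself once.

```python
def mass_sq_euclid(m, x, y, radius):
    """Sum neighbour values weighted by 1 / (squared Euclidean distance)."""
    height, width = len(m), len(m[0])
    total = 0.0
    for _y in range(max(0, y - radius), min(height, y + radius + 1)):
        for _x in range(max(0, x - radius), min(width, x + radius + 1)):
            # Squared distance, clamped to 1 so the point itself has weight 1.
            d2 = max(1, (x - _x) ** 2 + (y - _y) ** 2)
            total += m[_y][_x] / d2
    return total

# Tiny check on a 3x3 grid with a single set cell in the middle.
demo = [[0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0]]
print(mass_sq_euclid(demo, 1, 1, 1))  # 1.0
print(mass_sq_euclid(demo, 0, 0, 1))  # 0.5 (diagonal neighbour, d2 = 2)
```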

Iterating through our matrix and building up a list of tuples like (x, y, mass(x,y)):

point_mass = []
for y in range(0, HEIGHT):
  for x in range(0, WIDTH):
    point_mass.append((x, y, mass(matrix, x, y)))

Sorting the list on the mass for each point:

from operator import itemgetter
point_mass.sort(key=itemgetter(2), reverse=True)

Looking at the top 9 points in that sorted list:

(6, 2, 6.1580555555555554)
(2, 1, 5.4861111111111107)
(1, 1, 4.6736111111111107)
(1, 4, 4.5938888888888885)
(2, 0, 4.54)
(4, 7, 4.4480555555555554)
(1, 5, 4.4480555555555554)
(5, 7, 4.4059637188208614)
(4, 8, 4.3659637188208613)

Working from highest to lowest and filtering out points that are too close to ones already seen, we get (I'm doing it manually, since I've run out of time to do it in code):

(6, 2, 6.1580555555555554)
(2, 1, 5.4861111111111107)
(1, 4, 4.5938888888888885)
(4, 7, 4.4480555555555554)

This is a pretty intuitive result when you look at your matrix (note that these coordinates are zero-based, unlike the labels in your example).
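The manual filtering step could be sketched in code roughly as follows. This is a greedy pass under assumptions not stated above: the function name `pick_peaks` is made up, closeness is measured with the same Manhattan distance, and the minimum separation of 3 is an arbitrary choice that happens to suit this matrix. The input is the top-9 list shown earlier.

```python
def pick_peaks(points, count, min_distance):
    """Greedily keep the heaviest points that are at least min_distance apart."""
    peaks = []
    for x, y, m in sorted(points, key=lambda p: p[2], reverse=True):
        # Keep the point only if it is far enough from every peak kept so far.
        if all(abs(x - px) + abs(y - py) >= min_distance for px, py, _ in peaks):
            peaks.append((x, y, m))
            if len(peaks) == count:
                break
    return peaks

top_points = [
    (6, 2, 6.1580555555555554),
    (2, 1, 5.4861111111111107),
    (1, 1, 4.6736111111111107),
    (1, 4, 4.5938888888888885),
    (2, 0, 4.54),
    (4, 7, 4.4480555555555554),
    (1, 5, 4.4480555555555554),
    (5, 7, 4.4059637188208614),
    (4, 8, 4.3659637188208613),
]

for peak in pick_peaks(top_points, count=4, min_distance=3):
    print(peak)
```

With these inputs the pass keeps (6, 2), (2, 1), (1, 4) and (4, 7), the same four points as the manual filtering. A different `min_distance` can change the selection, so it would need tuning to the expected cluster size.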

PEZ
+1  A: 
MizardX
+1  A: 

Here's a similar question with a not so fast algorithm, and several other better ways to do it.

krusty.ar
+2  A: 

You need a clustering algorithm. This is easy here, since you just have a 2-dimensional grid and the entries of each cluster border each other, so you can simply use a flood fill algorithm. Once you have each cluster, you can find its center as in the 2D center of mass article.
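A minimal sketch of that idea, assuming 4-connectivity (cells sharing an edge belong to the same cluster) and a breadth-first flood fill; the names `GRID`, `find_clusters` and `centroid` are made up for this illustration, and coordinates are zero-based:

```python
from collections import deque

# The example matrix from the question, one string per row.
GRID = """.XX......
.XXX..X..
.....XXX.
......X..
.XX......
.X.......
.X.......
....XX...
....XX...""".split("\n")

def find_clusters(rows):
    """Group 'X' cells into 4-connected clusters via breadth-first flood fill."""
    height, width = len(rows), len(rows[0])
    seen = set()
    clusters = []
    for y in range(height):
        for x in range(width):
            if rows[y][x] != "X" or (x, y) in seen:
                continue
            # Unvisited set cell: flood outward from it to collect its cluster.
            seen.add((x, y))
            queue = deque([(x, y)])
            cluster = []
            while queue:
                cx, cy = queue.popleft()
                cluster.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and rows[ny][nx] == "X" and (nx, ny) not in seen:
                        seen.add((nx, ny))
                        queue.append((nx, ny))
            clusters.append(cluster)
    return clusters

def centroid(cells):
    """Center of mass of equally weighted cells: the mean of their coordinates."""
    n = len(cells)
    return (sum(x for x, _ in cells) / n, sum(y for _, y in cells) / n)

centers = [centroid(c) for c in find_clusters(GRID)]
print(centers)  # [(1.8, 0.6), (6.0, 2.0), (1.25, 4.75), (4.5, 7.5)]
```

Each cell is visited once, so the running time is linear in the size of the grid. If diagonal neighbours should also count as connected, the neighbour tuple would need the four diagonal offsets as well.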

martinus