views:

65

answers:

1

The Problem:

I have a large double (2d) array populated with various labels. Each element (cell) in the double array contains a set of labels and some elements in the double array may be empty. I need an algorithm to cluster elements in the double array into discrete segments. A segment is defined as a set of pixels that are adjacent within the double array and one label that all those pixels in the segment have in common. (Diagonal adjacency doesn't count and I'm not clustering empty cells).

|-------|-------|-------|
| Jane  | Joe   |       |
| Jack  | Jane  |       |
|-------|-------|-------|
| Jane  | Jane  |       |
|       | Joe   |       |
|-------|-------|-------|
|       | Jack  | Jane  |
|       | Joe   |       |
|-------|-------|-------|

In the above arrangement of labels distributed over nine elements, the largest cluster is the “Jane” cluster occupying the four upper left cells.

What I've Considered:

I've considered iterating through every label of every cell in the double array and testing to see if the cell-label combination under inspection can be associated with a preexisting segment. If the element under inspection cannot be associated with a preexisting segment it becomes the first member of a new segment. If the label/cell combination can be associated with a preexisting segment it associates.

Of course, to make this method reasonable I'd have to implement an elaborate hashing system. I'd have to keep track of all the cell-label combinations that stand adjacent to preexisting segments and are in the path of the incrementing indices that are iterating through the double array. This hash method would avoid having to iterate through every pixel in every preexisting segment to find an adjacency.

Why I Don't Like it:

As is, the above algorithm doesn't take into consideration the case where an element in the double array can be associated with two unique segments, one in the horizontal direction and one in the vertical direction. To handle these cases properly, I would need to implement a test for this specific case and then implement a method that will both associate the element under inspection with a segment and then concatenate the two adjacent identical segments.

On the whole, this method and the intricate hashing system that it would require feels very inelegant. Additionally, I really only care about finding the large segments in the double array and I'm much more concerned with the speed of this algorithm than with the accuracy of the segmentation, so I'm looking for a better way. I assume there is some stochastic method for doing this that I haven't thought of.

Any suggestions?

Edit:

My desired output is a list of segments, each segment being a label and a list of points. Therefore, in the example above I'd want two segments to be returned:

Segment 1 - Jane: (1,3), (2,3), (1,2), (2,2)
Segment 2 - Joe: (2,3), (2,2), (2,1)
+2  A: 

You basically want to implement a flood fill algorithm--consider the array as a set of images, one per distinct label, where the label is a color, and the lack of a label is black; you then want to segment it into all connected components of that color.

Repeat for all labels and you're done.

If your labels are sparse, you're probably better off not actually creating an image for each label and using an existing flood fill routine. In that case, do your own breadth-first flood fill by creating a copy of the array and building connected blocks one label at a time while destroying the existing label.

I am going to call one entry a "pixel" and the whole array an "image".

The algorithm goes, roughly,

for each pixel in the image
  for each label in the pixel
    1. remove the label
    2. mark the current pixel
    3. for each marked pixel, look in every adjacent pixel for the label
    4. remove any labels found
    5. if labels are found, clear marks, and mark the newly label-removed pixels
    6. if anything is marked, go back to 3
    7. report the set of points where you removed labels

Since this is destructive, you don't have to worry about backtracking. (If you can't destroy your original, and can't make a copy, then you have to keep track of what you've done along the way, which is more of a hassle.)

Rex Kerr
Exactly what I intended to post, but much more concise.
Ant