I have a rectangular planar grid with each cell assigned an integer weight. I am looking for an algorithm to identify clusters of 3 to 6 adjacent cells with higher-than-average weight. These blobs should have an approximately circular shape.

In my case the average weight of cells not containing a cluster is around 6, and that of cells containing a cluster is around 6+4, i.e. there is a "background weight" somewhere around 6. The weights follow Poisson statistics.

For a small background, greedy or seeded algorithms perform pretty well, but this breaks down if my cluster cells have weights close to the fluctuations in the background, i.e. they will tend to find a cluster even though there is nothing. Also, I cannot do a brute-force search looping through all possible configurations, because my grid is large (something like 1000x1000) and I plan to do this very often (10^9 times). I have the impression there might be ways to tackle this with graph theory. I have heard of vertex covers and cliques, but am not sure how best to translate my problem into their language. I know that graph theory might have issues with the statistical nature of the input, but I would be interested to see what algorithms from there could find, even if they cannot identify every cluster.

Here is an example clipping: the framed region has on average 10 entries per cell; all other cells have on average 6. Of course the grid extends further.

| 8|  8|  2|  8|  2|  3| 
| 6|  4|  3|  6|  4|  4| 
        =============
| 8|  3||13|  7| 11|| 7|
|10|  4||10| 12|  3|| 2|
| 5|  6||11|  6|  8||12|
        =============
| 9|  4|  0|  2|  8|  7|
A: 

For graph-theory solutions there are a couple of sentences on Wikipedia, but you are probably best off posting on MathOverflow. This question might also be useful.

The traditional (and, given its ubiquity, probably best) computing method for solving these is raster analysis - well known in the world of GIS and Remote Sensing, so there are a number of tools that provide implementations. Keywords to use to find the one best suited to your problem are raster, nearest neighbor, resampling, and clustering. The GDAL library is often the basis for other tools.

E.g. http://clusterville.org/spatialtools/index.html

You could try checking out the GDAL library and source code to see if you can use it in your situation, or to see how it is implemented.

For checking for circular shapes you could convert the remaining values to polygons and check the resultant features:

http://www.gdal.org/gdal_polygonize.html

geographika
Thanks for the links, geographika. Since my signal is very weak I wouldn't like to discard the background, though (this opens up another set of problems with fluctuations anyway). As for shape, I don't really care about it; I just want to find blobs. In the end these ideas would still (need to) go along the lines of greedy or seeded algorithms, which are known to have issues here. That's why I am looking for different approaches.
honk
@geographika: Thanks again for the pointers. But as I wrote, I am already aware of these tools and am looking for a totally different direction to attack this (mostly so as not to get too used to the limitations of thinking in their terms, I guess). Maybe the question is just poorly phrased or lacks focus.
honk
A: 

I'm not sure I see a graph theory analogy, but you can speed things up by pre-computing an area integral. This feels like a multi-scale thing.

A[i, j] = Sum[p[u, v], {u, 0, i}, {v, 0, j}];

Then the average brightness of the rectangular region with corners (a,b) and (c,d) is

(A[c,d] - (A[c,b] + A[a,d]) + A[a,b])/((c-a)(d-b))

Overflow is probably not your friend if you have big numbers in your cells.
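A minimal sketch of this in Python/NumPy (function names are my own; the indexing follows the formula above, so the queried region excludes row a and column b):

```python
import numpy as np

def summed_area_table(p):
    """Cumulative 2-D sum: A[i, j] = sum of p[0..i, 0..j]."""
    return p.cumsum(axis=0).cumsum(axis=1)

def region_mean(A, a, b, c, d):
    """Mean of p over rows a+1..c and columns b+1..d, using four
    table lookups instead of scanning the whole rectangle."""
    total = A[c, d] - (A[c, b] + A[a, d]) + A[a, b]
    return total / ((c - a) * (d - b))
```

Building the table is O(N) once, after which every rectangle query is O(1), which matters if you plan to probe many candidate windows per grid.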

Justin
Could you explain how this could be useful for identifying the blob? The problem is really that the "signal" is very close to the "background".
honk
Instead of computing the average value in a k-neighborhood of a pixel (which requires k*k memory accesses and additions), you can use 4 memory accesses and 4 additions.
Justin
This type of thing might work well if you started with larger regions of the image and tried to recursively find areas of similar signal/noise ratio, then applied your area-averaging filter.
Justin
A: 

Use the union-find algorithm for clustering? It's very fast.

I guess the graph would result from considering each pair of neighboring high-valued cells connected. Use the union-find algorithm to find all clusters, and accept all those above a certain size, perhaps with shape constraints too (e.g. based on average squared distance from the cluster center vs. cluster size). It's a trivial variation on the union-find algorithm to collect the statistics you would need for this as you go (count, sum of x, sum of x^2, sum of y, sum of y^2).
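A rough sketch of this idea in Python (the 3-to-6 size window comes from the question; the threshold value, 4-connectivity, and function names are my own assumptions):

```python
import numpy as np

def find(parent, x):
    # Path-compressing find on a dict-based union-find structure.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def cluster_high_cells(grid, threshold, min_size=3, max_size=6):
    """Union neighboring cells whose weight exceeds `threshold`,
    then return the clusters whose size falls in [min_size, max_size]."""
    rows, cols = grid.shape
    high = grid > threshold
    parent = {(r, c): (r, c)
              for r in range(rows) for c in range(cols) if high[r, c]}
    for r in range(rows):
        for c in range(cols):
            if not high[r, c]:
                continue
            for nbr in ((r - 1, c), (r, c - 1)):  # up/left neighbors suffice
                if nbr in parent:
                    ra, rb = find(parent, (r, c)), find(parent, nbr)
                    if ra != rb:
                        parent[ra] = rb
    clusters = {}
    for cell in parent:
        clusters.setdefault(find(parent, cell), []).append(cell)
    return [cells for cells in clusters.values()
            if min_size <= len(cells) <= max_size]
```

The per-cluster statistics mentioned above (count, sums of x, x^2, y, y^2) could be accumulated at the roots during the same pass instead of collecting cell lists afterwards.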

Paul Harrison
A: 

If you are just looking for a way to translate your problem into a graph problem, here's what you could do.

From each point, look at all your neighbors (this could be the 8 adjacent squares or the 4 adjacent ones, depending on what you want). Find the neighbor with the maximum value; if it is larger than you, connect yourself to that square. If it is smaller, do nothing.

After this you will have a forest, or possibly a single tree (though I imagine that is unlikely). This is one possible translation of your matrix to a graph. I'm not sure if it is the most useful translation.
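The steps above could be sketched like this in Python (8-neighborhood, ties broken arbitrarily; the function name is mine):

```python
import numpy as np

def build_forest(grid):
    """Point each cell at its largest 8-neighbor, but only if that
    neighbor is strictly larger; local maxima become roots.
    Returns a dict mapping each cell to its parent cell."""
    rows, cols = grid.shape
    parent = {}
    for r in range(rows):
        for c in range(cols):
            best = (r, c)
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                        if grid[nr, nc] > grid[best]:
                            best = (nr, nc)
            # Connect only if strictly larger; otherwise this cell is a root.
            parent[(r, c)] = best if grid[best] > grid[r, c] else (r, c)
    return parent
```

Each tree in the resulting forest hangs off a local maximum, so the roots are natural cluster-seed candidates, which is where the fluctuation problem discussed in the comments below comes in.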

luke
That sounds like a very interesting approach. Would I create an extra tree to account for fluctuations?
honk
I'm not sure what you mean?
luke
@luke: What you described in your answer connects each cell to the neighbor with the highest count. This count can fluctuate, so the cells combined this way might not belong to the same cluster.
honk