I know how to write a similarity function for data points in Euclidean space (by taking the negative min squared error). Now, if I want to test my clustering algorithms on images, how can I write a similarity function for data points in images? Do I base it on their RGB values, or something else? And how?

A: 

I think a few points need clarifying first:

  1. Are you clustering only on color? Then take the RGB values of the pixels and apply your metric function (minimize the sum of squared errors, or just calculate the SAD, the Sum of Absolute Differences).
  2. Are you clustering on a spatial basis (within an image)? In this case you should take position into account, as you described for Euclidean space, simply treating the image as the domain of your samples. It's a 2D space anyway... 3D if you consider color information too (see next).
  3. Are you looking for 3D information from the image (2D position + 1D color, e.g. intensity; with full RGB it becomes 5D)? This is the most likely case. As a first approach, consider segmentation techniques if your image shows regular or well-defined shapes. If that fails, or you want a less hand-tuned algorithm, consider reducing the 3D feature space to 2D or even 1D by running PCA on the data. By analyzing the principal components you can drop uninformative dimensions from your collection and/or exploit the intrinsic structure of the data in some way.
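
To make the three cases a bit more concrete, here is a minimal sketch in Python with NumPy (my choice of tools, not something the question specifies). The function names and the `spatial_weight` parameter are hypothetical, introduced only for illustration. The weight is an assumption about how one might balance pixel position against color, and would need tuning for real data.

```python
import numpy as np

def pixel_features(image, spatial_weight=0.0):
    """Turn an H x W (x C) image into one feature vector per pixel.

    With spatial_weight == 0 the features are color only (case 1).
    Otherwise the (row, col) position, scaled by spatial_weight,
    is appended to the color channels (cases 2 and 3).
    """
    h, w = image.shape[:2]
    colors = image.reshape(h * w, -1).astype(float)
    if spatial_weight == 0.0:
        return colors
    rows, cols = np.mgrid[0:h, 0:w]
    positions = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    return np.hstack([colors, spatial_weight * positions])

def neg_squared_error(a, b):
    """Similarity as the negative sum of squared differences."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return -float(np.dot(d, d))

def neg_sad(a, b):
    """Similarity as the negative Sum of Absolute Differences."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return -float(np.abs(d).sum())

def pca_reduce(features, n_components):
    """Project features onto their first principal components (via SVD)."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Tiny 2 x 2 RGB image with a single red pixel, just to exercise the functions.
image = np.zeros((2, 2, 3))
image[0, 0] = [1.0, 0.0, 0.0]

color_only = pixel_features(image)                      # shape (4, 3)
color_and_pos = pixel_features(image, spatial_weight=0.5)  # shape (4, 5)
reduced = pca_reduce(color_and_pos, n_components=2)     # shape (4, 2)
```

Any of these feature matrices can then be fed to the same clustering code you already use for Euclidean points, with `neg_squared_error` or `neg_sad` as the pairwise similarity.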

The topic would need much more than one post to cover properly, but I hope this helps a bit.

ZZambia