ansaurus

Question

OpenCV 1.1 K-Means Clustering in High Dimensional Spaces

Answer 1

A:

You might like to check out http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/ for another open source clustering package.

Using memcpy like this seems suspect, because when you do:

 int rdsize = chunks.size()*CELL_SIZE*CELL_SIZE;

the product can become very large if CELL_SIZE and chunks.size() are large. If it exceeds the largest value an int can hold (INT_MAX), rdsize will not contain the size you expect, and the allocation and memcpy that depend on it will be wrong.
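One way to guard against this is to compute the size in std::size_t and check each multiplication before performing it. The following is only a minimal sketch: rawDataSize is a hypothetical helper name, and it assumes the cells are square with side cellSize, as the expression above suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical helper (not part of OpenCV or the original code): returns
// chunkCount * cellSize * cellSize, throwing instead of silently wrapping.
std::size_t rawDataSize(std::size_t chunkCount, std::size_t cellSize)
{
    const std::size_t maxSize = std::numeric_limits<std::size_t>::max();

    // Check cellSize * cellSize before multiplying.
    if (cellSize != 0 && cellSize > maxSize / cellSize)
        throw std::overflow_error("cellSize * cellSize overflows");
    const std::size_t perChunk = cellSize * cellSize;

    // Check chunkCount * perChunk before multiplying.
    if (perChunk != 0 && chunkCount > maxSize / perChunk)
        throw std::overflow_error("chunkCount * cellSize * cellSize overflows");
    const std::size_t total = chunkCount * perChunk;

    // Sanity check: dividing back must recover the chunk count.
    assert(perChunk == 0 || total / perChunk == chunkCount);
    return total;
}
```

The result would then be stored in a std::size_t rather than an int, for example std::size_t rdsize = rawDataSize(chunks.size(), CELL_SIZE);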

Do you intend to modify "chunks" in this function? I'm guessing not, as this is a K-means problem.

So try passing by reference to const here. (Generally speaking, this is what you will want to do for arguments that are only read.)

So instead of:

std::vector<int> DoKMeans(std::vector<IplImage *>& chunks)

it would be:

std::vector<int> DoKMeans(const std::vector<IplImage *>& chunks)

Also, in this case it is better to use static_cast than the old C-style casts (for example, static_cast<float>(variable) as opposed to (float)variable).
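To illustrate the two suggestions above together, here is a minimal sketch of a function that takes its input by reference to const and converts values with static_cast. The name toFloat and the use of unsigned char pixel values are assumptions made for the example; they are not taken from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: the input is only read, so it is taken by
// reference to const, and each value is converted with static_cast.
std::vector<float> toFloat(const std::vector<unsigned char>& pixels)
{
    std::vector<float> out;
    out.reserve(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i)
        out.push_back(static_cast<float>(pixels[i]));  // instead of (float)pixels[i]

    assert(out.size() == pixels.size());
    return out;
}
```

Because the parameter is const, any accidental attempt to modify the input inside the function becomes a compile error.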

Also, you may want to free "rawdata". The buffer allocated with:

 float * rawdata = new float[rdsize];

can be released with:

delete[] rawdata;

Otherwise you may be leaking memory here.
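An alternative that avoids the manual delete[] altogether is to let std::vector own the buffer. This is only a sketch: sumBuffer is a hypothetical stand-in for whatever C-style routine consumes the raw float pointer, and fillAndSum is an invented name for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a C-style routine that reads a raw float buffer.
float sumBuffer(const float* data, std::size_t count)
{
    float total = 0.0f;
    for (std::size_t i = 0; i < count; ++i)
        total += data[i];
    return total;
}

// std::vector owns the storage, so it is released automatically when
// rawdata goes out of scope, including on an early return or an exception.
float fillAndSum(std::size_t rdsize)
{
    std::vector<float> rawdata(rdsize, 1.0f);  // replaces new float[rdsize]
    assert(rawdata.size() == rdsize);
    return sumBuffer(rawdata.data(), rawdata.size());
}
```

The pointer returned by rawdata.data() can be handed to code that expects a float *, as long as the vector outlives that use.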

shuttle87 2010-06-27 17:20:19

Shuttle87, good catch on the integer overflow in rdsize. That is not a problem now, but it will be later. As for the other little errors (i.e. not cleaning up, const correctness), I am just trying to see whether this works or not; I will certainly rewrite and clean up everything when I am done. I will look into the other clustering package you pointed out.

kscottz 2010-06-27 17:48:18

It just occurred to me that if the size of the data is a problem for your own code, it might also be a problem inside the library you are using. It may be worth posting this on an OpenCV forum or mailing list, as the library may not be able to support data of the size you are using.

shuttle87 2010-06-28 08:33:16

I have only seen cvKMeans used with up to three dimensions, and the way you need to pack the data is really wonky. I would assume the input matrix is a linear representation of something that is s x d, where s is the number of samples and d is the dimension of each point. Anyway, I have tried about ten different approaches and nothing has worked, so I am using a GPL K-means implementation I found on the web, and it appears to work great. I would like to find something with a more open license. I have been using OpenCV for a while, but I am not aware of an open forum for questions.

kscottz 2010-06-28 15:37:32
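The s x d layout assumed in the comment above can be sketched as a flat row-major buffer in which sample i, dimension j lives at index i*d + j. This only illustrates that assumed layout; it is not a confirmed description of what cvKMeans requires, and packRowMajor is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: flattens s samples of dimension d into one
// contiguous buffer, one sample per row (row-major order).
std::vector<float> packRowMajor(const std::vector<std::vector<float> >& samples)
{
    const std::size_t s = samples.size();
    const std::size_t d = (s == 0) ? 0 : samples[0].size();

    std::vector<float> flat(s * d);
    for (std::size_t i = 0; i < s; ++i) {
        // Every sample must have the same number of dimensions.
        assert(samples[i].size() == d);
        for (std::size_t j = 0; j < d; ++j)
            flat[i * d + j] = samples[i][j];
    }
    return flat;
}
```

With this layout, a pointer to the start of row i is simply the buffer's start plus i*d.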

Answer 2

A:

Though I'm not familiar with "bag of features", have you considered using feature points like corner detectors and SIFT?

rwong 2010-06-30 09:32:38
