Dear everyone, I would like to perform k-means clustering on a list of words, with Levenshtein distance as the distance measure.

1) I know there are a lot of frameworks out there, including scipy and orange, that have a k-means implementation. However, they all require some sort of vector as the data, which doesn't really fit my case.

2) I need a good clustering implementation. I looked at python-clustering and realized that a) it doesn't return the sum of all the distances to each centroid, and b) it doesn't have any sort of iteration limit or cut-off to ensure the quality of the clustering. Neither python-clustering nor the clustering algorithm on daniweb really works for me.
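To make the requirements concrete, here is a rough sketch of what such a routine could look like. Since words have no arithmetic mean, it swaps k-means for a k-medoids-style variant, where each cluster centre is an actual word from the list. The function names (`levenshtein`, `assign`, `k_medoids`) and the parameters `max_iter` and `seed` are made up for illustration and do not come from any existing library. It returns the summed distance to the centres and stops after a fixed number of iterations.

```python
import random


def levenshtein(a, b):
    """Edit distance between two strings (two-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def assign(words, medoids):
    """Put each word in the cluster of its nearest medoid.

    Returns (clusters, total), where total is the sum of the distances
    from every word to its medoid.
    """
    clusters = {m: [] for m in medoids}
    total = 0
    for w in words:
        d, nearest = min((levenshtein(w, m), m) for m in medoids)
        clusters[nearest].append(w)
        total += d
    return clusters, total


def k_medoids(words, k, max_iter=100, seed=0):
    """Hypothetical k-medoids-style clustering with an iteration limit."""
    words = sorted(set(words))  # duplicates would collide as dict keys
    # random.sample raises ValueError if k exceeds the number of unique words.
    medoids = random.Random(seed).sample(words, k)
    for _ in range(max_iter):
        clusters = assign(words, medoids)[0]
        # New medoid: the member with the smallest summed distance
        # to the other members of its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(levenshtein(c, w) for w in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break  # converged before hitting the iteration limit
        medoids = new_medoids
    return assign(words, medoids)
```

Calling `k_medoids(["kitten", "sitting", "mitten", "python"], 2)` gives back a dict mapping each medoid to its member words, plus the total distance. Every pass recomputes all pairwise distances, so this is only practical for small word lists.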

Can someone point me to a good lib? Google hasn't been my friend.

A: 

Not really an answer to your specific question, but I recommend glancing at "Programming Collective Intelligence". At the end of each chapter, e.g., clustering, it wanders off into describing all the best reading on the subject.

Charles Merriam
A: 

Yeah, I think there isn't a good implementation of what I need.

I have some crazy requirements, like distance caching, etc.

So I think I will just write my own lib and release it as GPLv3 soon.

sadawd