Do you know of any Python MapReduce-ready clustering libraries?

I have found some good libraries in Java (http://lucene.apache.org/mahout/), but I'd prefer to use Python.

http://wiki.github.com/klbostee/dumbo/ (Python MapReduce API)

Edit --- I'm looking for MapReduce-ready implementations of: Canopy, K-means, Mean-shift, etc.

+3  A: 

You can use Python in combination with Hadoop, if you like:

http://www.michael-noll.com/wiki/Writing_An_Hadoop_MapReduce_Program_In_Python
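
To tie this back to clustering: here is a rough sketch of how one K-means iteration could be split into a map step and a reduce step in plain Python. This is not taken from Mahout, Dumbo, or the tutorial above; the function names (`nearest`, `mapper`, `reducer`, `kmeans_iteration`) are made up for illustration, and the sort/group step only simulates locally what Hadoop's shuffle would do between the two phases.

```python
from itertools import groupby
from operator import itemgetter


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def mapper(points, centroids):
    """Map step: emit (cluster_id, point) for every input point."""
    for point in points:
        yield nearest(point, centroids), point


def reducer(cluster_id, points):
    """Reduce step: average all points assigned to one cluster into a new centroid."""
    points = list(points)
    dims = len(points[0])
    mean = tuple(sum(p[d] for p in points) / len(points) for d in range(dims))
    return cluster_id, mean


def kmeans_iteration(points, centroids):
    """Run one map -> shuffle -> reduce pass and return the updated centroids."""
    # Sorting by key stands in for the shuffle phase of a real MapReduce job.
    pairs = sorted(mapper(points, centroids), key=itemgetter(0))
    new_centroids = list(centroids)  # clusters with no points keep their old centroid
    for cluster_id, group in groupby(pairs, key=itemgetter(0)):
        _, new_centroids[cluster_id] = reducer(cluster_id, (p for _, p in group))
    return new_centroids
```

On a real cluster the mapper and reducer would read and write key/value lines (for example via Hadoop Streaming), the current centroids would have to be shipped to every mapper, and a driver would rerun the job until the centroids stop moving.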

Amber
Yeah, I've seen that, but I'm looking for clustering libraries, something along the lines of http://atbrox.com/2010/02/08/parallel-machine-learning-for-hadoopmapreduce-a-python-example/ . I'm looking for: Canopy, K-means, Mean-shift, Dirichlet, etc.
A: 

It appears I can use NLTK modules in Hadoop via Dumbo. Can anyone confirm that this works?