views:

815

answers:

13

Is there a recommended package for machine learning in Python?

I have previous experience in implementing a variety of machine learning and statistical algorithms in C++ and MATLAB, but having done some work in Python I'm curious about the available packages for Python.

+2  A: 

You might want to look at:

http://www.shogun-toolbox.org/, which has interfaces for multiple languages, including Python. There's also http://www.pybrain.org/, which is (I believe) a native implementation of ML algorithms. Hope that helps.

oort
+1  A: 

I'm not sure you'd exactly call this machine learning, but the nltk package does Bayesian-style classification of text. You can use learning data and test data to see that it is inferring rules about the data.
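As a rough illustration of the train/test workflow described above, here is a minimal sketch using NLTK's naive Bayes classifier. The tiny spam/ham dataset and the `word_features` helper are made up for this example and are not part of NLTK.

```python
import nltk


def word_features(text):
    # Hypothetical feature extractor: one boolean feature per lowercase word.
    return {"contains(%s)" % word: True for word in text.lower().split()}


# Toy learning data and test data, invented purely for illustration.
train = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("buy cheap watches", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
    ("project meeting notes", "ham"),
]
test = [("buy cheap pills", "spam"), ("monday meeting", "ham")]

train_set = [(word_features(text), label) for text, label in train]
test_set = [(word_features(text), label) for text, label in test]

# Train on the learning data, then measure accuracy on the held-out test data.
classifier = nltk.NaiveBayesClassifier.train(train_set)
accuracy = nltk.classify.accuracy(classifier, test_set)

# Classify a previously unseen message.
predicted = classifier.classify(word_features("cheap money"))
print(accuracy, predicted)
```

Calling `classifier.show_most_informative_features()` afterwards prints which features the classifier relied on most.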

hughdbrown
NLTK is more useful for text mining.
Selinap
+1  A: 

Maybe take a look at the "Modular toolkit for Data Processing" (MDP). It implements a couple of algorithms from machine learning and statistics and it is well documented.

nikow
+3  A: 

A general, user-friendly package is Orange -- kind of like Weka or RapidMiner, if you're familiar with those.

Other than that, there's a variety of packages and toolkits for various tasks. You should consult the Python packages listed on mloss as a starting point.

ars
+3  A: 

AFAIK, Orange may be the best choice at the moment.
PyML is good too.
PyMC is for Bayesian estimation.
There is also a book, "Machine Learning: An Algorithmic Perspective", which has lots of Python code examples and may be worth reading.
And there is a blog post: Pragmatic Classification with Python.
Just my two cents.

sunqiang
+3  A: 

For Support Vector Machines, take a look at LibSVM, which has interfaces for several languages, including Python.

piobyz
+1  A: 

SciPy maintains a great list of well-known Python packages, including machine learning ones: Artificial intelligence & machine learning

piobyz
A: 

I gave Orange a try.

It's powerful, but if you go through the documentation, you will see that the author has his own crazy style of writing Python. His code gets pretty cryptic if you are relatively new to Python, so I wouldn't recommend Orange unless you are already familiar with the language.

4cents
+1  A: 

Probably related questions at Stack Overflow:

Artificial Inteligence library in python.

What is the best artificial-intelligence library for Python?

jetxee
A: 

If you are looking for neural networks, the Python binding for FANN is quite easy to use, and it comes with tools to train your networks.

chub
+1  A: 

http://www.pymvpa.org might work as well.

Mike
+2  A: 

The Deep Learning Tutorials describe how to develop and train deep neural networks. The library they use can even run on an Nvidia GPU if one is available.

Ivo Danihelka
+2  A: 

There is also scikit-learn (BSD-licensed, depending only on numpy and scipy). It includes various supervised learning algorithms such as:

  • SVMs (based on libsvm) and linear models, with scipy.sparse bindings for datasets with many features
  • Bayesian methods
  • HMMs
  • L1 and L1+L2 regularized regression methods, aka Lasso and Elastic Net models, implemented with algorithms such as LARS and coordinate descent
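
To give a feel for the regularized regression models in the list above, here is a small sketch that assumes a recent scikit-learn release (the API may differ from the version this answer was written against). The synthetic data, the `alpha` values and the variable names are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Synthetic regression problem: only the first 3 of 10 features matter.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 10))
true_coef = np.zeros(10)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + rng.normal(scale=0.1, size=200)

# L1 penalty (Lasso): tends to drive irrelevant coefficients to exactly zero.
lasso = Lasso(alpha=0.1).fit(X, y)

# L1+L2 penalty (Elastic Net): l1_ratio balances the two penalties.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

print("Lasso coefficients:      ", np.round(lasso.coef_, 2))
print("Elastic Net coefficients:", np.round(enet.coef_, 2))
```

Both estimators here are fitted by coordinate descent; scikit-learn also ships `LassoLars` for the LARS-based variant.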

It also features unsupervised clustering algorithms such as:

  • kmeans++
  • meanshift
  • affinity propagation
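
A quick sketch of these three clustering algorithms on toy data, again assuming a recent scikit-learn release. The blob centers and parameters are invented for the example; only k-means needs the number of clusters up front, while the other two estimate it from the data.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation, KMeans, MeanShift

# Three well-separated 2-D blobs of 50 points each (toy data).
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# k-means with k-means++ seeding; the number of clusters must be given.
kmeans = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)

# Mean shift and affinity propagation pick the number of clusters themselves.
meanshift = MeanShift().fit(X)
affinity = AffinityPropagation().fit(X)

for name, model in [("k-means", kmeans), ("mean shift", meanshift),
                    ("affinity propagation", affinity)]:
    print(name, "found", len(set(model.labels_)), "clusters")
```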

And also other tools such as:

  • feature extractors for text content (token and char ngrams + hashing vectorizer)
  • univariate feature selection
  • a simple pipeline tool
  • numerous implementations of cross validation strategies
  • performance metrics evaluation and plotting (ROC curve, AUC, confusion matrix, ...)
  • a grid search utility to perform hyper-parameter tuning using parallel cross validation
  • integration with joblib for caching partial results when working in an interactive environment (e.g. using ipython)
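
Several of these tools combine naturally. The sketch below chains a scaler and an SVM in a pipeline, tunes the hyper-parameters by cross-validated grid search, and evaluates with a confusion matrix. It assumes a recent scikit-learn release (module paths such as `sklearn.model_selection` may not match older versions), and the iris dataset, the parameter grid and `N_JOBS` are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

N_JOBS = 1  # assumption: set to -1 to run the cross validation on all cores

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Pipeline: scale the features, then fit an SVM.
pipeline = Pipeline([("scale", StandardScaler()), ("svm", SVC())])

# Grid of hyper-parameters; "step__param" addresses a pipeline step.
param_grid = {"svm__C": [0.1, 1, 10], "svm__gamma": [0.01, 0.1, 1]}

# 5-fold cross-validated grid search over the whole pipeline.
search = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=N_JOBS)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
cm = confusion_matrix(y_test, search.predict(X_test))
print("best parameters:", search.best_params_)
print("test accuracy:", test_accuracy)
print(cm)
```

Searching over the pipeline rather than the bare SVM means the scaler is refitted inside each cross validation fold, so no information from the validation folds leaks into the preprocessing.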

Each algorithm implementation comes with sample programs demonstrating its usage on either toy data or real-life datasets.

Also, the official source repo is hosted on GitHub, so please feel free to contribute bug fixes and improvements using the regular pull request feature for interactive code review.

ogrisel