Google's recent announcement of the Google Prediction API sounded very interesting. It could be useful for an upcoming project, and would probably do a better job than the custom code I was considering.

However, there is some vendor lock-in: Google retain the trained model, and could later choose to overcharge me for it. It occurred to me that there are probably open-source equivalents, if I were willing to host the training myself (I am) and live without Google's ability to throw hardware at the problem at a moment's notice.

The last time I looked at third-party machine-learning code was many years ago, and there were a lot of details that needed to be carefully considered and customised for each project. Google appear to have hidden those decisions and take care of them for you. To me, this is still indistinguishable from magic, but I would like to hear whether others can do the same.

So my question is:

What alternatives to Google Prediction API exist which:

  • categorise data with supervised machine learning,
  • can be easily configured (or don't need configuration) for different kinds and scales of data-sets?
  • are open-source and self-hosted (or at the very least, provide you with a royalty free use of your model, without a dependence on a third party)
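For illustration only, here is a minimal sketch of what the first and third requirements amount to in practice: a supervised classifier that is trained on labelled examples and whose trained model is an ordinary file on your own disk. It assumes plain-text inputs with whitespace tokenisation and uses multinomial naive Bayes with add-one smoothing, chosen purely for brevity. It is not what the Google Prediction API or Apache Mahout necessarily does internally, and the class name `NaiveBayes` and the example data are hypothetical.

```python
import json
import math
import os
import tempfile
from collections import Counter, defaultdict


class NaiveBayes:
    """Hypothetical minimal multinomial naive Bayes text classifier (stdlib only)."""

    def __init__(self):
        self.class_counts = Counter()            # training examples seen per label
        self.word_counts = defaultdict(Counter)  # per-label word frequencies
        self.vocab = set()                       # every word seen in training

    def train(self, examples):
        """examples: iterable of (text, label) pairs (supervised learning)."""
        for text, label in examples:
            self.class_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        """Return the label with the highest log posterior score."""
        total = sum(self.class_counts.values())
        words = text.lower().split()
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            label_total = sum(self.word_counts[label].values())
            for word in words:
                # Add-one (Laplace) smoothing so unseen words do not zero the score.
                numerator = self.word_counts[label][word] + 1
                score += math.log(numerator / (label_total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def save(self, path):
        # The trained model is plain JSON you own; no third party is needed to use it.
        with open(path, "w") as f:
            json.dump({"class_counts": dict(self.class_counts),
                       "word_counts": {k: dict(v) for k, v in self.word_counts.items()}}, f)

    @classmethod
    def load(cls, path):
        with open(path) as f:
            data = json.load(f)
        model = cls()
        model.class_counts = Counter(data["class_counts"])
        for label, counts in data["word_counts"].items():
            model.word_counts[label] = Counter(counts)
            model.vocab.update(counts)
        return model


if __name__ == "__main__":
    model = NaiveBayes()
    model.train([
        ("cheap pills buy now", "spam"),
        ("win money now", "spam"),
        ("meeting agenda for monday", "ham"),
        ("lunch on monday?", "ham"),
    ])
    path = os.path.join(tempfile.gettempdir(), "naive_bayes_model.json")
    model.save(path)
    restored = NaiveBayes.load(path)
    print(restored.predict("buy cheap money"))  # spam
```

Real libraries add feature extraction, model selection and scaling on top of this; the point of the sketch is only that the trained model can be persisted and reused without any dependence on a hosted service.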
A:

Maybe Apache Mahout?

tszming
Looks like a good solution. Still early days (by their own admission, v0.3). They are tackling a broader range of problems than just classification.
Oddthinking
Yes, but it looks promising, as it is targeted at large-scale data processing.
tszming