I was wondering whether there is a good, clean object-oriented implementation of Bayesian filtering for spam detection and text classification. It is for learning purposes.

+2  A: 

Maybe https://ci-bayes.dev.java.net/, or what about http://www.cs.cmu.edu/~javabayes/Home/node2.html?

I have never tried either.

svrist
+1  A: 

Here is an implementation of Bayesian filtering in C#: A Naive Bayesian Spam Filter for C# (hosted on CodeProject).

Yaakov Ellis
A: 

The page is in French, but you should be able to find the download link :) PHP Naive Bayesian Filter

Vincent Robert
+3  A: 

Check out Chapter 6 of Programming Collective Intelligence
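
For a flavor of what such an object-oriented implementation can look like, here is a minimal sketch of a multinomial naive Bayes classifier in Python. It is not the book's code: the class name `NaiveBayesClassifier`, its methods, the tokenizer, and the four training messages are all made up for illustration, and add-one (Laplace) smoothing is an assumed choice.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial naive Bayes over word counts, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training docs
        self.vocabulary = set()                  # every word seen in training

    @staticmethod
    def tokenize(text):
        # Illustrative tokenizer: lowercase runs of letters, digits, apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        words = self.tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def log_scores(self, text):
        """Return log P(label) + sum of log P(word | label) for each label."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        words = self.tokenize(text)
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)
            for word in words:
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data, invented for this example.
clf = NaiveBayesClassifier()
clf.train("cheap pills buy now", "spam")
clf.train("win money now click here", "spam")
clf.train("meeting agenda for monday", "ham")
clf.train("lunch on monday with the team", "ham")

print(clf.classify("buy cheap pills"))
print(clf.classify("agenda for the team lunch"))
```

On this toy data the first call prints `spam` and the second prints `ham`. Working in log space avoids the underflow you would get from multiplying many small probabilities.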

binil
+7  A: 

I definitely recommend Weka, which is open-source data mining software written in Java:

Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.

As mentioned above, it ships with a bunch of different classifiers such as SVM, Winnow, C4.5, Naive Bayes (of course), and many more (see the API doc). Note that many classifiers are known to perform much better than Naive Bayes at spam detection and text classification.

Furthermore, Weka comes with a very powerful GUI.

bene
A: 

Don't waste your time on spam filtering. Spammers easily bypass Bayesian filtering by adding random text to their spam emails.

FIX:
OK, OK, Bayesian filtering can be useful when trained personally. However, at a corporate level or above, it's probably useless.

Tal
Random text won't (significantly) affect a Bayesian classification. The remarkable accuracy of filters like SpamBayes and PopFile should demonstrate this.
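One way to see why padding has limited effect: in a filter that combines per-token spam probabilities as log-odds, a token the filter has never seen contributes nothing if it is assigned a neutral probability. A minimal Python sketch, assuming a neutral 0.5 for unknown tokens and made-up token probabilities (the function names are hypothetical, and real filters such as SpamBayes use their own, more elaborate scoring):

```python
import math


def spam_log_odds(tokens, token_spam_prob, unknown_prob=0.5):
    """Sum per-token log-odds; tokens never seen in training get unknown_prob."""
    total = 0.0
    for token in tokens:
        p = token_spam_prob.get(token, unknown_prob)
        total += math.log(p / (1.0 - p))
    return total


def spam_probability(tokens, token_spam_prob):
    """Convert the summed log-odds back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-spam_log_odds(tokens, token_spam_prob)))


# Hypothetical per-token spam probabilities, as if learned from a corpus.
token_spam_prob = {"viagra": 0.99, "cheap": 0.90, "meeting": 0.05}

message = ["cheap", "viagra"]
# The same message padded with random words the filter has never seen.
padded = message + ["zebra", "quantum", "xylophone", "marmalade"]

print(spam_probability(message, token_spam_prob))
print(spam_probability(padded, token_spam_prob))
```

Both calls print the same value, since each unseen word contributes log(0.5 / 0.5) = 0. Words the filter has seen mostly in legitimate mail would shift the score, so this sketch covers only the random-text case.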
Chris Wuestefeld
OK, Bayesian classification can be useful for end users, to classify "good" emails. I will fix my answer.
Tal
+1  A: 

nBayes - another C# implementation hosted on CodePlex

Joel Martinez