Hi

I'm stuck on writing a simple spam filter, and I'm not really sure how to go about it.

So far I've come up with wordlist and domain filtering, which will give or remove points up to a certain threshold.

For example, if you're writing about "v1agr4" from a blacklisted domain, you'd get 2 spam points, but if you're writing about "v1agr4" from a hotmail.com account, you'd get only 1 spam point.
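The scoring idea described above could be sketched roughly like this in Python. Everything here is an assumption for illustration: the word list, the blacklisted domain `spam-domain.example`, the point values, and the threshold of 2 are all made up, and `spam_score` / `is_spam` are hypothetical helper names.

```python
import re

# Hypothetical data: points per suspicious word and per sender domain.
# Negative values would remove points (e.g. for trusted domains).
SPAM_WORDS = {"v1agr4": 1}
DOMAIN_POINTS = {"spam-domain.example": 1}
THRESHOLD = 2  # assumed cutoff; tune it against real mail


def spam_score(sender, body):
    """Add up the points for the sender's domain and for each listed word."""
    domain = sender.rsplit("@", 1)[-1].lower()
    score = DOMAIN_POINTS.get(domain, 0)
    # Each listed word counts once, however often it appears in the body.
    words = set(re.findall(r"[a-z0-9]+", body.lower()))
    score += sum(points for word, points in SPAM_WORDS.items() if word in words)
    return score


def is_spam(sender, body):
    """A message is flagged once its score reaches the threshold."""
    return spam_score(sender, body) >= THRESHOLD
```

With these made-up values, a "v1agr4" message from the blacklisted domain scores 2 and gets flagged, while the same message from hotmail.com scores 1 and passes.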

Do you guys have any other suggestions or resources?

This is more about learning how spam filters work than developing something enterprise grade.

+1  A: 

Some really good algorithm info here:

http://www.paulgraham.com/spam.html

http://www.paulgraham.com/better.html

But, seriously, why reinvent the wheel?

Just download K9: http://keir.net/k9.html

BoltBait
+1  A: 

Look into Bayesian Spam Filtering.

I know Perl has a library for it, so I'd assume Java has one too.
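For learning purposes, the core of a Bayesian filter is small enough to write by hand. Below is a minimal naive Bayes sketch in Python, not any particular library's implementation. The class name `NaiveBayesFilter`, the tokenizer, and the add-one smoothing (with one extra slot reserved for unseen words) are my own assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesFilter:
    """Hypothetical minimal naive Bayes classifier with two labels."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one labelled message ('spam' or 'ham')."""
        self.message_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        """Return the estimated probability that the text is spam."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        tokens = tokenize(text)
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior, so an untrained label never yields log(0).
            log_p = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in tokens:
                count = self.word_counts[label][word]
                # Add-one smoothing; the extra +1 reserves a slot for unseen words.
                log_p += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = log_p
        # Turn the two log scores into a probability, clamped to avoid overflow.
        diff = max(min(log_scores["ham"] - log_scores["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

You'd call `train` once per labelled message, then treat anything above some cutoff (0.9, say, which is just a guess) as spam. Working in log space keeps long messages from underflowing to zero.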

Gavin Miller
A: 

I've written one with all the bells and whistles.

porneL
+1  A: 

Some open-source Java projects related to Bayesian spam filtering (which LFSR Consulting mentioned):

And one extra for C++:

Touko
A: 

You can delegate that to a distributed service. Akismet is a very good solution.

Guillaume