What's the best, easiest, free way to check in Java if a piece of text is spam?

+6  A: 

It's not easy at all, and it requires some theoretical background in mathematics and statistics. One approach is called Bayesian filtering; it's only one of several methods, but it works well.

You can find an introduction and some background on Wikipedia, and the topic is covered widely across the internet, so just search around (here on Stack Overflow too, I think).
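To make the idea concrete, here is a minimal sketch of a naive Bayes text classifier in plain Java. It is a toy under stated assumptions, not a production filter: the class name `NaiveBayesSpamFilter`, the crude tokenizer, and the add-one smoothing are all illustrative choices, and a real filter would need a large labelled training set.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy naive Bayes spam classifier (illustrative sketch only). */
public class NaiveBayesSpamFilter {
    private final Map<String, Integer> spamCounts = new HashMap<>();
    private final Map<String, Integer> hamCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int spamTokens = 0, hamTokens = 0; // total tokens seen per class
    private int spamDocs = 0, hamDocs = 0;     // training texts seen per class

    /** Crude tokenizer (an assumption): lowercase, split on non-alphanumerics. */
    private static String[] tokenize(String text) {
        return text.toLowerCase().split("[^a-z0-9]+");
    }

    /** Records one labelled example. */
    public void train(String text, boolean isSpam) {
        if (isSpam) spamDocs++; else hamDocs++;
        for (String token : tokenize(text)) {
            if (token.isEmpty()) continue;
            vocabulary.add(token);
            if (isSpam) {
                spamCounts.merge(token, 1, Integer::sum);
                spamTokens++;
            } else {
                hamCounts.merge(token, 1, Integer::sum);
                hamTokens++;
            }
        }
    }

    /** log P(class) + sum of log P(token | class), with add-one smoothing. */
    private double logScore(String text, Map<String, Integer> counts,
                            int totalTokens, int docs) {
        double score = Math.log((docs + 1.0) / (spamDocs + hamDocs + 2.0));
        for (String token : tokenize(text)) {
            if (token.isEmpty()) continue;
            int count = counts.getOrDefault(token, 0);
            // The extra +1 in the denominator reserves a slot for unseen words
            // and avoids division by zero before any training.
            score += Math.log((count + 1.0) / (totalTokens + vocabulary.size() + 1.0));
        }
        return score;
    }

    /** Returns true when the spam class scores higher than the non-spam class. */
    public boolean isSpam(String text) {
        return logScore(text, spamCounts, spamTokens, spamDocs)
             > logScore(text, hamCounts, hamTokens, hamDocs);
    }

    public static void main(String[] args) {
        NaiveBayesSpamFilter filter = new NaiveBayesSpamFilter();
        filter.train("Buy cheap pills now", true);
        filter.train("Win money now, buy cheap", true);
        filter.train("Meeting at noon tomorrow", false);
        filter.train("Please review the attached report", false);
        System.out.println(filter.isSpam("buy cheap pills"));
        System.out.println(filter.isSpam("review the report tomorrow"));
    }
}
```

Working with log-probabilities instead of multiplying raw probabilities keeps the scores from underflowing on longer texts.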

Jack
+1  A: 

Hi, have a look at this: http://www.shiffman.net/teaching/a2z/bayesian/ -- it shows you how to create a spam filter using Bayesian methods in Java :)

Chris Dennett
+4  A: 

Probably the easiest way is to leverage an existing API for that. Akismet has bindings for Java, and it's what WordPress uses on its blogs by default. Note that Akismet is a hosted service rather than a library: it's free for personal use, and the client plugins are open source.

Cesar
+3  A: 

You could pipe it through SpamAssassin and see what the return value is.
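As a rough sketch of what that could look like from Java: the code below pipes a message to an external command and treats exit code 1 as "spam". It assumes SpamAssassin's `spamc` client is installed and a `spamd` daemon is running, and it relies on `spamc -c` exiting with 1 for spam and 0 otherwise. The class and method names are hypothetical, and `Redirect.DISCARD` needs Java 9 or later.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class SpamAssassinCheck {
    /** Assumed default: spamc on the PATH, talking to a running spamd. */
    static final List<String> DEFAULT_COMMAND = Arrays.asList("spamc", "-c");

    /**
     * Pipes the message to the given command's stdin and returns true
     * when the process exits with code 1 (what "spamc -c" uses for spam).
     */
    public static boolean isSpam(String message, List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Discard the child's output so it cannot block on a full pipe.
        builder.redirectOutput(ProcessBuilder.Redirect.DISCARD);
        builder.redirectError(ProcessBuilder.Redirect.DISCARD);
        Process process = builder.start();
        try (OutputStream stdin = process.getOutputStream()) {
            stdin.write(message.getBytes(StandardCharsets.UTF_8));
        }
        return process.waitFor() == 1;
    }

    public static void main(String[] args) throws Exception {
        // SpamAssassin expects an email message, so include minimal headers.
        String message = "Subject: test\n\nBuy cheap pills now!\n";
        System.out.println(isSpam(message, DEFAULT_COMMAND));
    }
}
```

One caveat: `spamc -c` also exits with 0 when processing fails, so a `false` result here does not distinguish "not spam" from "the check did not run".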

Here's a wacky idea: send the text as an email to a Gmail account. Then use IMAP to see whether it ended up in the Inbox or the Spam folder.

Barry Brown
+1 because it's a very interesting possible use case for Gmail.
Cesar
Before using Gmail as a spam filter for an application, it would be prudent to **carefully** read the "terms of service" for Gmail.
Stephen C
+1 SpamAssassin looks promising... not Java, but I may be able to make some use of it.
Doug