I'm creating a website where users can write articles and comment on them. I want to automatically check whether each new article or comment is spam.
What are good libraries for doing this?
I looked at Bayesian classifier libraries, but it seems that I would have to gather a large number of samples and label each one as spam or not spam myself...
I'm hoping for something that can tell me right out of the box.
UPDATE: If nothing like this exists, does anyone know of a downloadable dataset with a large number of messages already labeled as spam or not spam that could be fed into a Bayesian classifier?
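For context on why the labeled data matters, here is a minimal from-scratch sketch of a multinomial naive Bayes spam filter with add-one smoothing. It is not the API of any particular library. The class name `NaiveBayesSpamFilter`, the `train`/`spam_probability` methods, the "spam"/"ham" labels and the four training samples are all hypothetical and made up for illustration. The point is that every call to `train` needs a human-assigned label, and a real filter would need thousands of such samples.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Hypothetical minimal multinomial naive Bayes filter with add-one smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        # Every training sample needs a human-assigned label ("spam" or "ham").
        if label not in self.LABELS:
            raise ValueError(f"unknown label: {label!r}")
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        # Needs at least one training sample per label, otherwise a prior is zero.
        if any(self.doc_counts[label] == 0 for label in self.LABELS):
            raise ValueError("train on at least one spam and one ham sample first")
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in self.LABELS:
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior P(label).
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                if word not in vocab:
                    continue  # words never seen in training are ignored
                # Log likelihood P(word | label) with add-one smoothing.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # Normalize the two log scores into P(spam | text), avoiding underflow.
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)


if __name__ == "__main__":
    # Toy, made-up training samples; a real filter needs thousands of labeled ones.
    spam_filter = NaiveBayesSpamFilter()
    spam_filter.train("buy cheap pills now", "spam")
    spam_filter.train("win free money now", "spam")
    spam_filter.train("great article about python decorators", "ham")
    spam_filter.train("thanks for the helpful comment", "ham")
    print(round(spam_filter.spam_probability("cheap pills free money"), 3))  # prints 0.956
```

With only four toy samples the probabilities mean little. The classifier is only as good as the labeled corpus behind it, which is exactly the data-gathering problem described above.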