spam-prevention

When the bots attack!

What are some popular spam prevention methods besides CAPTCHA? ...

Prevent site data from being crawled and ripped

I'm looking into building a content site with possibly thousands of different entries, accessible by index and by search. What are the measures I can take to prevent malicious crawlers from ripping off all the data from my site? I'm less worried about SEO, although I wouldn't want to block legitimate crawlers altogether. For example,...

What can be done to prevent spam in forum-like apps?

Are there ways other than CAPTCHAs for web apps like pastie.org or p.ramaze.net? For my taste, CAPTCHAs take too long for a small paste. ...

What is the best method to keep bots from spamming your blog?

Hey all! I have a problem with my blog. I get visits from kind bots who leave "nice" comments on my blog posts :( I'm wondering if there is a smarter way to keep them out, besides using the captcha modules. My problem with the captcha modules is that I think they are annoying to the user :( I don't know if it's any help to anyone but m...

Best language choice for a spam detection service

I have around 20 or so active blogs that get quite a bit of spam. As I hate CAPTCHA, the alternative is very smart spam filtering. I want to build a simple REST-API-like spam checking service which I would use in all my blogs. That way I can consolidate IP blocks and offload spam detection to third parties such as Akismet, Mollom, Defensio ...

How can I find the Largest Common Substring between two strings in PHP?

Is there a fast algorithm for finding the Largest Common Substring in two strings, or is it an NP-complete problem? In PHP, I can find a needle in a haystack: <?php if (strstr("there is a needle in a haystack", "needle")) { echo "found<br>\n"; } ?> I guess I could do this in a loop over one of the strings but that would be very ex...
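On the complexity question: the longest common substring of two strings can be found in polynomial time, so it is not NP-complete. A minimal sketch of the standard dynamic-programming approach is shown here in Python rather than PHP to keep it short. The function name `longest_common_substring` is illustrative and not from the question. It runs in time proportional to the product of the two string lengths; suffix-based structures can do better, but are more work to implement.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return one longest substring that occurs in both a and b."""
    best_len = 0   # length of the best match found so far
    best_end = 0   # index in a just past the end of that match
    # prev[j] = length of the common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


if __name__ == "__main__":
    print(longest_common_substring("there is a needle in a haystack", "needle"))
```

Only two rows of the table are kept at a time, so memory use grows with the length of the second string only.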

How can I use PHP to obfuscate email addresses so they are not easily harvested by spammers?

I'm programming in PHP and would like to create web pages which have email addresses that are easily read by humans but not easily harvested by spammers. The email addresses are coming from user input, and I think I can identify an address by using a regular expression, but I'm not clear exactly how I should replace the email addresses ...
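One common approach to the replacement step is to rewrite each matched address as HTML numeric character entities, so a browser still renders it normally while the raw page source no longer contains the literal address. The following is a rough sketch in Python, not PHP, under stated assumptions: the pattern `EMAIL_RE` is a deliberately simple approximation of an email address, and the function name `obfuscate_emails` is hypothetical. Entity encoding is likely to deter only naive harvesters, since entities are trivial to decode.

```python
import html
import re

# Simplified email pattern (an assumption for illustration; real addresses
# can be more varied than this matches).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def obfuscate_emails(text: str) -> str:
    """Replace every matched address with HTML numeric character entities."""
    def encode(match: re.Match) -> str:
        # Encode each character as &#NNN; so browsers display it unchanged.
        return "".join(f"&#{ord(ch)};" for ch in match.group(0))
    return EMAIL_RE.sub(encode, text)


if __name__ == "__main__":
    encoded = obfuscate_emails("mail a@b.co now")
    print(encoded)
    # Decoding the entities recovers the original text.
    print(html.unescape(encoded))
```

The same idea carries over to PHP with a regular-expression replace and a per-character encoding callback.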

What is the best way to do basic View tracking on a web page?

I have a web-facing, anonymously accessible blog directory and blogs, and I would like to track the number of views each of the blog posts receives. I want to keep this as simple as possible; accuracy need only be an approximation. This is not for analytics (we have Google for that) and I don't want to do any log analysis to pull out th...

Are there any alternatives to recaptcha.net, for stopping spam?

A member of my company who outranks me refuses to use recaptcha.net on his website to thwart spam on a public form. He thinks it would be difficult for anyone coming to our site to enter their information since the Turing Tests are "so darn hard to read". Is there an alternative to using this method? That doesn't co...

Blacklists, Whitelists, Spam Folders and Email

I want to get on the whitelists for my email system. Any recommendations on whom I should contact about doing this? Do I contact the big email providers directly (Yahoo, Gmail, Microsoft Hotmail/MSN, AOL)? Also, besides DomainKeys, DKIM and SPF records, what else is a good way to protect yourself from getting on blacklists and going into sp...

Upgrading Mailman with PGP authentication to prevent spam

We recently had a spam attack on our mailing list which overcame all Mailman anti-spam measures by disguising the spam as sent from members of the list. To fight such spam, better authentication is needed, and PGP-signed emails seem to be one of the solutions. My question is, how to upgrade Mailman to deal with PGP-signed email...

What's the best open-source Java Bayesian spam filter library?

In other answers on Stack Overflow it's been suggested that Weka is good, but there are others (Classifier4j, jBNC, Naiban). Does anyone have actual experience with these? ...

How do I protect my forum against spam?

I have a forum on a website I administer, which gets a daily dose of porn spam. Currently I delete the spam and block the IP. But this does not work very well. The list of blocked IPs is growing quickly, but so is the number of spam posts in the forum. The forum is entirely my own code. It is built in PHP and MySQL. What are some concrete...

Best way to send email from my web app so it looks like it came from my user's account

I'm working on a web application. A user will create an email message that will be sent to another person. I would like the e-mail that gets sent to appear to come from the name and e-mail address of the user on my system. And if they reply to the e-mail then it should go directly to the sender's email address. However I am worried abou...

Bayesian spam filtering library for Python

I am looking for a Python library which does Bayesian Spam Filtering. I looked at SpamBayes and OpenBayes, but both seem to be unmaintained (I might be wrong). Can anyone suggest a good Python (or Clojure, Common Lisp, even Ruby) library which implements Bayesian Spam Filtering? Thanks in advance. Clarification: I am actually looking ...

What's the most straightforward way to delete emails marked as spam by SpamAssassin?

I'm on Ubuntu Intrepid, using Postfix and SpamAssassin. I've seen approaches using procmail (like the one suggested @ Apache), but I'm looking for a solution that does not use procmail. This is a programming question because the correct answer will be some form of code that accomplishes the task at hand (my response to the negative vote...

OpenID a lucrative target for spammers?

Due to the nature of OpenID, wouldn't it be a lucrative target for spammers? For starters, you could create an OpenID account on any site and use it on any other site, which would mean that I could log into a forum and write a few thousand posts if the forum assumes that logged-in users can be trusted. Do you agree OpenID is lucrative fo...

Good non-intrusive anti-spam email obfuscator?

I'm trying to come up with a JavaScript email obfuscator to reduce the chance for spam in emails listed on a web site. Right now I've got a JavaScript based obfuscator that uses a combination of HTML encoding & JavaScript to convert an obfuscated email into a normal email transparently. What I do is this: Format the "mailto:" part of ...

Detecting a (naughty or nice) URL or link in a text string

How can I detect (with regular expressions or heuristics) a web site link in a string of text such as a comment? The purpose is to prevent spam. HTML is stripped so I need to detect invitations to copy-and-paste. It should not be economical for a spammer to post links because most users could not successfully get to the page. I would...

What is the best way to programmatically detect porn images?

Akismet does an amazing job at detecting spam comments. But comments are not the only form of spam these days. What if I wanted something like Akismet to automatically detect porn images on a social networking site which allows users to upload their pics, avatars, etc? There are already a few image-based search engines as well as face r...