Folks,

I have an HTML form which sends the form data as an email. Unfortunately, spam bots have been filling in the form and sending web site links and email addresses in the message part of the form.

Is there a way I can delete the web site links and email addresses when the "Submit" button is pressed, before the message gets sent on as an email? I use PHP to do the actual sending of the form data as an email message.

Thanks and regards,

Tony

A: 

How about using regular expressions to replace anything that matches a web link or email address with an empty string?

You can find tons of regular expression examples on the web; just google it.

oykuo
+6  A: 

Have you tried adding a hidden form field? Hide it via CSS (display: none) but keep it in the HTML like a regular field, and call it email or something common. If that field has data when the form is submitted, then it must be a bot.
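
A minimal sketch of the server-side half of this trick. The function name isHoneypotTripped and the field name website are made up for illustration; use whatever name you gave the hidden field.

```php
<?php
// Hypothetical helper: returns true when the hidden "honeypot" field was filled in.
// $post is the submitted form data (normally $_POST); $honeypotField is the
// assumed name of the field that is hidden with CSS (display: none).
function isHoneypotTripped(array $post, $honeypotField = 'website')
{
    // Humans never see the field, so it should be absent or empty.
    if (!isset($post[$honeypotField])) {
        return false;
    }

    $value = $post[$honeypotField];

    // An array here means someone tampered with the request; treat it as a bot too.
    return is_array($value) || trim((string) $value) !== '';
}
```

You would call it before sending the mail, for example if (isHoneypotTripped($_POST)) { exit; }, so the bot gets no hint that it was caught.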

Ólafur Waage
clever! (15 chars..)
nickf
A: 

You can do that with JavaScript. The problem is that JavaScript needs to be enabled for that to happen, so you can imagine the odds that the spambots will be cooperative when it comes to this.

You're going to have to filter it out on the server. Personally I would simply reject outright any message containing links or email addresses. Spit back an error to the user but don't even save it.

For some tips on detecting spam in PHP read Spam-free accessible forms.

You may also want to consider the use of CAPTCHA.

cletus
A: 

You would have to iterate over every field in the $_POST array (at least the ones you don't want to have emails or links in) and check it against a couple of regexes.

The suggestion to use CAPTCHA is also a good one.

Anyway, here's a crappy implementation of the checking:

class ValidationHelper
{
 // regex taken from http://code.google.com/p/prado3/source/browse/branches/3.2/framework/Web/UI/WebControls/TEmailAddressValidator.php?spec=svn2583&r=2583
 const EMAIL_REGEX = "#\\w+([-+.]\\w+)*@\\w+([-.]\\w+)*\\.\\w+([-.]\\w+)*#";

 // hacked up regex that I just cooked up - could be hugely improved i'm sure.
 // the trailing "i" makes it case-insensitive so HTTP:// is caught as well.
 const LINK_REGEX = "#(h\s*t\s*t\s*p\s*s?|f\s*t\s*p)\s*:\s*/\s*/#i";

 public static function containsEmail($value)
 {
  if (preg_match(self::EMAIL_REGEX, $value))
   return true;

  return false;
 }

 public static function containsLink($value)
 {
  if (preg_match(self::LINK_REGEX, $value))
   return true;

  return false;
 }
}

$errors = array();
foreach ($_POST as $key=>$value) {
 // presumably you want at least one email field, yeah?
 if ($key != 'email') {
  // perhaps you should be running strip_tags over everything if you don't want html and such...
  // see http://php.net/strip_tags for more info. without it (or something similar), there's nothing 
  // to stop people from putting <script type="text/javascript" src="http://notyourdomain.com/~1337skriptkiddy/haxxors.js"></script>
  // into your form. even if you might not necessarily ever be displaying this in a scenario
  // where it can cause trouble, it's never a bad idea to stop this stuff *before* it gets into your db
  $_POST[$key] = $value = strip_tags($value);
  // $fieldNames is assumed to be defined elsewhere, mapping each field key to a display label
  if (ValidationHelper::containsEmail($value) || ValidationHelper::containsLink($value))
   $errors[] = 'Please ensure the value you entered for '.$fieldNames[$key].' does not contain any links or email addresses';
 }
}

if (!empty($errors)) {
 // failed - show errors.
}
else {
 // success!
}
Shabbyrobe
Why iterate? Just concatenate them with a space in between, and run the regex on the result...
Eli
Because I'm prescribing that strip_tags be run on each value to sanitise it before it is checked. Presumably something will need to be done with the data afterwards anyway (like saving it to a db), which means that the sanitisation should be done field by field. I suppose you could concatenate to do the regex and still iterate to do the sanitisation, but that seems like a micro-optimisation to me.
Shabbyrobe
A: 

Well, if you really want to fight spam, go for these steps:

  1. Put a CAPTCHA in the form so that non-humans cannot even submit the form. A very popular CAPTCHA implementation is reCAPTCHA.

  2. Do a strip_tags on the fields so that even if someone puts HTML links in by hand, the tags will be removed. (Plain-text URLs are not tags, so strip_tags leaves them in place.)

  3. Do a regular expression check for email addresses and remove them as well. Pick a good regular expression from the web that matches most email formats.

Hope this helps. Cheers!
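
Steps 2 and 3 above could be sketched roughly like this. The function name stripTagsAndEmails is hypothetical, and the email pattern is a deliberately simple assumption that only catches the common name@host.tld shape.

```php
<?php
// Hypothetical helper sketching steps 2 and 3: strip HTML tags, then blank out
// anything that looks like an email address.
function stripTagsAndEmails($text)
{
    // Step 2: strip_tags() removes markup such as <a href="...">, but keeps
    // the text between the tags and any plain-text URL.
    $text = strip_tags($text);

    // Step 3: simple email pattern (an assumption, not a full RFC 5322 matcher).
    return preg_replace('/[\w.+-]+@[\w-]+(\.[\w-]+)+/', '', $text);
}
```

Note that the removed address leaves its surrounding spaces behind, so you may want to collapse whitespace afterwards.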

A: 

Presumably, you don't want to accept any kind of HTML, just plain text? In this case strip_tags is your friend. strip_tags also allows you to specify some tags that are acceptable.
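
As a rough illustration of both forms of the call (the choice of <b> and <i> as acceptable tags is just an example):

```php
<?php
$html = '<p>Hello <b>world</b></p>';

// With no second argument, every tag is removed.
$plain = strip_tags($html);

// The optional second argument lists the tags to keep.
$some = strip_tags($html, '<b><i>');
```

Here $plain ends up as "Hello world" and $some as "Hello <b>world</b>". Bear in mind that strip_tags keeps the attributes of any tag you allow, so allowing tags is not safe for untrusted input unless you clean the attributes too.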

I also heartily recommend incorporating a header-injection defence script.
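
The script referred to above isn't reproduced here, but the core idea of such a defence can be sketched in a few lines. The function name hasHeaderInjection is hypothetical; the check itself rests on the fact that header injection needs a line break inside a header value.

```php
<?php
// Hypothetical helper: true when a value is unsafe to place in a mail header.
// Header injection works by smuggling a line break into a header value
// (for example "x@example.com\r\nBcc: ..."), so any CR or LF, raw or
// URL-encoded, is treated as an attack.
function hasHeaderInjection($value)
{
    return preg_match('/[\r\n]|%0a|%0d/i', $value) === 1;
}
```

Run it over anything that ends up in the headers or subject passed to mail(), such as the sender's name and email address, and refuse to send if it returns true. The message body itself legitimately contains line breaks, so don't apply it there.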

da5id
A: 

Folks,

I tried using strip_tags which certainly removes tags but doesn't remove "mailto:" nor "http://" text, so the links are still links.

Is there an easy PHP command or routine that can scan a string and just replace "mailto:" and/or "http://" with a harmless empty string "" in those portions of the string?

Tried googling too but most of the stuff I found was about trimming white space etc.

Sorry about this, I'm kinda new to PHP.

Thanks and Regards,

Tony

That should work (did not test it):
$input = preg_replace(array("/mailto:[^ ]*/i","/http:\/\/[^ ]*/i"),"",$input);
St. John Johnson
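
A slightly expanded sketch of the same preg_replace idea, wrapped in a function so it can be reused. The name removeLinks is made up, and the https, ftp and bare www. patterns are additions that go beyond what was asked.

```php
<?php
// Hypothetical helper: removes mailto: links and http/https/ftp URLs,
// each up to the next whitespace character.
function removeLinks($input)
{
    $patterns = array(
        '/mailto:\S*/i',            // mailto:someone@example.com
        '#(?:https?|ftp)://\S*#i',  // http://, https:// and ftp:// URLs
        '/\bwww\.\S+/i',            // bare www.example.com (an added assumption)
    );

    // Passing an array of patterns applies each one in turn.
    return preg_replace($patterns, '', $input);
}
```

This removes the whole link rather than just the "http://" prefix, because stripping only the prefix still leaves a domain name the recipient can copy and paste.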
A: 

This is very simplistic, but you could build on it I'm sure ...

Try adding a picture of a cat or dog, then asking visitors to enter the three-letter name of the animal shown, or something similar. Do a validation check, then go from there: a cheap and easy CAPTCHA. This way only human input goes out.
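
The validation check could be as small as this. It is only a sketch: the function name isAnimalAnswerCorrect is hypothetical, and it assumes you remembered which animal was shown on the server side (for example in the session) when the form was rendered.

```php
<?php
// Hypothetical check for the picture question. $expected is the animal that
// was shown; $submitted is what the visitor typed into the form.
function isAnimalAnswerCorrect($submitted, $expected)
{
    // Ignore surrounding whitespace and letter case, so " Cat " still counts.
    return strtolower(trim($submitted)) === strtolower($expected);
}
```

Picking the picture at random from more than two animals makes blind guessing less likely to succeed.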

Bot would have a 50/50 guess chance on that.
Ólafur Waage
How so ... since when do bots interpret images?