How do you prevent emails being gathered from HTML pages by email spiders? Does mailto: linking them increase the likelihood of them being picked up? I think I read somewhere about a built-in Ruby function that confuses email spiders by decimal-encoding the email address - can anyone link me to some documentation or tell me how effective this is?

Note: People must log in before they can view the email addresses, I just want to know about extra security measures to stop a registered user from harvesting emails (as there is a rather large userbase).

+1  A: 

I tend to avoid a mailto as it makes it too easy for people to harvest email addresses.

If you are going to have contact pages on your website, then just use a form; when it is submitted, your server-side code looks up the appropriate email address and sends the message.

If you need to make other people reachable, identify them by number, name, or username rather than by their address.
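A minimal sketch of the form-plus-identifier idea, written as plain JavaScript for a Node-style server. The `directory` map, the `resolveRecipient` and `buildOutgoingMessage` helpers, and the sample usernames are all hypothetical; a real application would look the address up in its own user store and hand the result to whatever mailer it already uses.

```javascript
// Hypothetical in-memory directory; in practice this would be a database lookup.
const directory = new Map([
  ["bob", "bob@example.com"],
  ["alice", "alice@example.com"],
]);

// Resolve a public identifier to a private address, on the server only.
function resolveRecipient(username) {
  const address = directory.get(username);
  if (address === undefined) {
    throw new Error("Unknown recipient: " + username);
  }
  return address;
}

// Build the message the server would pass to its mailer.
// The form only ever submits the username, never the address.
function buildOutgoingMessage(form) {
  return {
    to: resolveRecipient(form.recipient),
    subject: String(form.subject || "").slice(0, 200),
    body: String(form.body || ""),
  };
}

const message = buildOutgoingMessage({
  recipient: "bob",
  subject: "Hello",
  body: "Sent through the contact form.",
});
console.log(message.to);
```

Because the page and the form carry only the username, a logged-in user scraping the HTML never sees an address at all.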

If the email address simply sits in a span, it is likely to be picked up even if you try to hide it: these programs can be quite sophisticated, since finding email addresses is all they do.

As with most secrets, if you don't want others to get them, don't put them on the page.

James Black
The emails are on the page, in `<span>`s. Other than removing the email addresses, is there anything I can do to decrease the probability of them being picked up by email spiders?
Josh
+3  A: 

Most email spiders don't have JavaScript interpreters, so if you really need the mailto:, you can inject it with JavaScript. Just make sure the address is obscured in the JavaScript somehow, e.g.

// myLink is assumed to be a reference to an existing <a> element
myLink.href='mai'+'lto:'+'bob'
           +'@'
           +'example.com';

If you need to display the email address on the page, a common solution is to generate an image of it using something like PHP's GD library (although the JavaScript injection should work fine for this too).

The idea is to remove the email addresses from the HTML and inject them with JavaScript. That way the email address doesn't appear in its original form in any of the HTTP traffic, which is what the spider is looking at.
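One way to keep the address out of the raw HTML is to ship it as decimal character codes and rebuild it in the browser. The sketch below is only an illustration: the `encodeAddress` and `decodeAddress` helpers and the `data-contact` attribute are made-up names, and it only helps against spiders that do not execute JavaScript or decode such lists themselves.

```javascript
// Server side (assumed): turn the address into decimal character codes
// so the literal string never appears in the page source.
function encodeAddress(address) {
  return Array.from(address, (ch) => ch.codePointAt(0)).join(",");
}

// Client side: rebuild the address from the comma-separated codes.
function decodeAddress(encoded) {
  return String.fromCodePoint(...encoded.split(",").map(Number));
}

// Browser-only hook: fill in every <a data-contact="..."> on the page.
// Guarded so the file also runs outside a browser.
if (typeof document !== "undefined") {
  document.querySelectorAll("a[data-contact]").forEach((link) => {
    const address = decodeAddress(link.getAttribute("data-contact"));
    link.href = "mai" + "lto:" + address;
    link.textContent = address;
  });
}

const encoded = encodeAddress("bob@example.com");
console.log(encoded);
console.log(decodeAddress(encoded));
```

The markup would then carry something like `<a data-contact="98,111,98,...">contact</a>`, with the readable address appearing only after the script has run.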

no
Thanks, that's quite a clever way of doing it.
Josh
Then you have to require JavaScript to use this page, which can cost you users who have it disabled.
James Black
+1  A: 

I usually split them up into separate parts and then recombine them using JavaScript. The final JavaScript does a document.write to write out the HTML.

For example:

var mail = "mailto";
var namepart = "test.user";
var domainpart = "example";
var tld = "com"; 
var address = namepart + "@" + domainpart + "." + tld;
document.write('<a href="' + mail + ":" + address + '">' + address + "</a>");
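The same recombination can be wrapped in a small function that returns the markup as a string, which makes it usable without document.write. This is a sketch under assumptions: `escapeHtml`, `buildMailLink`, and the `contact-slot` element id are hypothetical names chosen for illustration.

```javascript
// Escape characters that are special inside HTML text and attribute values.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Recombine the parts and return the link markup as a string.
function buildMailLink(namepart, domainpart, tld) {
  const scheme = "mai" + "lto";
  const address = namepart + "@" + domainpart + "." + tld;
  const safe = escapeHtml(address);
  return '<a href="' + scheme + ":" + safe + '">' + safe + "</a>";
}

// Browser-only: put the link into a placeholder element if one exists.
if (typeof document !== "undefined") {
  const slot = document.getElementById("contact-slot");
  if (slot) {
    slot.innerHTML = buildMailLink("test.user", "example", "com");
  }
}

console.log(buildMailLink("test.user", "example", "com"));
```

Filling a placeholder element after the page has loaded avoids the case where document.write, called late, replaces the whole document.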
Scott