Yes, I realize this question was asked and answered, but I have specific questions about this that I feel were not clear on that thread and I'd prefer not to get lost in the shuffle on another thread as well.

Previous threads said that rendering the email address as an image, the way Facebook does, is overkill and makes for an unprofessional user experience on business/professional websites. The general consensus seems to be to use a JavaScript document.write solution with HTML entities, or some other method that breaks up the string and/or makes it unreadable to a simple bot. The application I'm building doesn't even need the "mailto:" functionality; I just need to display the email address. Also, this is a business web application, so it needs to look and act as professional as possible. Here are my questions:

  1. If I go the document.write route and pass the HTML entity version of each character, are there no web crawlers sophisticated enough to execute the JavaScript and pull the rendered text anyway? Or is this considered best practice and completely (or almost completely) spammer-proof?

  2. What's so unprofessional about the image solution? If Facebook is one of the most heavily trafficked applications in the world and not at all run by amateurs, why is their method completely dismissed in the other thread about this subject?

  3. If your answer (as in the other thread) is to not bother myself with this issue and let the users' spam filters do all the work, please explain why you feel this way. We are displaying our users' email addresses that they have given us, and I feel responsible to protect them as much as I can. If you feel this is unnecessary, please explain why.

Thanks.

+3  A: 

There are quite a few reasons JavaScript is a good solution for now (though that may change as the landscape evolves).

  • JavaScript obfuscation is the better mousetrap for now.
  • You just need to outrun the others. As long as there is low-hanging fruit, spammers will go for that. So unless everyone starts moving to JavaScript, you're okay, for now at least.
  • Most spammers use HTTP-based scripts which GET the page and parse it using regex. Using a JavaScript engine to parse is certainly possible but will slow things down.
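
The last point can be illustrated with a minimal sketch. The harvester regex, the sample address, and the helper names (`harvest`, `toEntities`) are all assumptions made for illustration, not anything taken from a real bot. It shows that a plain regex over the raw HTML finds an address written in the clear but finds nothing when each character is emitted as an HTML entity through document.write.

```javascript
// Hypothetical harvester: a simple regex run over the raw HTML of a page.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function harvest(html) {
  // match() returns null when nothing is found, so fall back to [].
  return html.match(EMAIL_RE) || [];
}

// Encode every character as a decimal HTML entity, e.g. "@" becomes "&#64;".
function toEntities(text) {
  return Array.from(text, ch => `&#${ch.codePointAt(0)};`).join('');
}

const address = 'jane.doe@example.com'; // made-up sample address

// Page 1: address written in the clear.
const plainPage = `<p>Contact: ${address}</p>`;

// Page 2: address only appears as entities inside a document.write call.
const obfuscatedPage =
  `<script>document.write('${toEntities(address)}');</script>`;

const foundPlain = harvest(plainPage);
const foundObfuscated = harvest(obfuscatedPage);

console.log(foundPlain);      // the address is found
console.log(foundObfuscated); // nothing matches: there is no literal "@"
```

A bot that decodes entities or runs the script would still recover the address, so this only raises the cost for the simplest scrapers.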

Regarding the Facebook solution, I don't consider it unprofessional, but I can clearly see why purists may disagree.

  • It breaks accessibility standards (the address cannot be parsed by browsers or voice readers, and cannot be clicked).
  • It breaks the semantic construct (it's an image, not a mailto link anymore).
  • It breaks the presentational layer. If you increase the browser's default font size or use high-contrast custom CSS, it won't apply to the email.
aleemb
+2  A: 
  1. It is not spammer proof. If someone looks at the code for your site and determines the pattern that you are using for your email addresses, then specific code can be written to try and decipher that.

  2. I don't know that I would say it is unprofessional, but it prevents copy-and-paste functionality, which is quite a big deal. With images, you simply don't get that functionality. What if you want to copy a relatively complex email address to your address book in Outlook? You have to resort to typing it out which is prone to error.

  3. Moving the responsibility to the users' spam filters is really a poor response. While I believe that users should be diligent in guarding against spam, that doesn't absolve the person publishing the address from responsibility.

To that end, doing this in an absolutely secure manner is nearly impossible. The only way to do it would be to have a shared secret which the code uses to decipher the encoded email address. The problem is that because the JavaScript is interpreted on the client side, there isn't anything that you can keep secret from scrapers.

Encoders for email addresses nowadays generally work because most email harvesting bots aren't going to concern themselves with coding specifically for every site. They are going to try to have a minimal algorithm which will get maximum results (the payoff isn't worth it otherwise). Because of this, simple encoders will defeat most bots. But if someone REALLY wants to get at the emails on your site, then they can, and probably easily, since the code that writes the addresses is publicly available.
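
To make the "shared secret" problem concrete, here is a minimal sketch under stated assumptions. The XOR-and-hex scheme, the key value, and the `encode`/`decode` names are all hypothetical, chosen for illustration, and it assumes an ASCII-only address. The page has to ship both the decoder and the key to the browser, so a scraper written for this one site can call the same decoder.

```javascript
// Hypothetical server-side step: XOR each character code with a key
// and emit two hex digits per character (assumes ASCII-only input).
function encode(address, key) {
  return Array.from(address, ch =>
    (ch.charCodeAt(0) ^ key).toString(16).padStart(2, '0')
  ).join('');
}

// Client-side step: the page must ship this function AND the key,
// otherwise the browser could not display the address.
function decode(hex, key) {
  let out = '';
  for (let i = 0; i < hex.length; i += 2) {
    out += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16) ^ key);
  }
  return out;
}

const key = 0x2a;                       // illustrative key, visible in page source
const address = 'jane.doe@example.com'; // made-up sample address

const encoded = encode(address, key);   // what appears in the HTML source
const decoded = decode(encoded, key);   // what the browser (or a scraper) recovers

console.log(encoded);
console.log(decoded === address);
```

The encoded string contains no "@" and so slips past a generic regex harvester, but nothing stops a targeted scraper from reading the key and decoder out of the page and running the same two lines.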

Taking all this into consideration, it makes sense that Facebook went the image route. Because they can alter the image to make OCR all but impossible, they can virtually guarantee that email addresses won't be harvested. Given that they are probably one of the largest email address repositories in the world, it could be argued that they carry a heavier burden than any of us, and while inconvenient, are forced down that route to ensure security and privacy for their vast user base.

casperOne
Great points. This site is nothing like FB, so I don't need to be as secure. I agree about the OCR; they probably alter the image slightly each time, which would make OCR nearly impossible. I'll probably do a JS solution that uses both entities and an autogenerated cipher unique to each request.
Rich
+1  A: 

Here is a nice blog post comparing a few methods, with benchmarks.

http://techblog.tilllate.com/2008/07/20/ten-methods-to-obfuscate-e-mail-addresses-compared/

felideon