Many users and forum programs, in an attempt to make automatic e-mail address harvesting harder, conceal addresses via obfuscation: @ is replaced with "at" and . is replaced with "dot", so

 team@stackoverflow.com

now becomes

team at stackoverflow dot com

I'm not an expert in regular expressions and I'm really curious - does such obfuscation really make automatic harvesting harder? Is it really much harder to automatically identify such obfuscated addresses?

+7  A: 

When I see this type of obfuscation I also immediately think of regular expressions. It's a piece of cake to harvest emails "obfuscated" in this manner.
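To show how little work "piece of cake" means here, below is a minimal Python sketch of a harvester for the plain "at"/"dot" spelling. The pattern and the harvest() helper are hypothetical illustrations, not code from any real harvester. Expect false positives on ordinary sentences (for example "look at this dot com").

```python
import re

# Hypothetical harvester pattern: "name at domain dot tld", allowing several
# "dot" segments such as "mail dot example dot co dot uk".
OBFUSCATED = re.compile(
    r"\b([\w.+-]+)\s+at\s+([\w-]+(?:\s+dot\s+[\w-]+)+)",
    re.IGNORECASE,
)
DOT = re.compile(r"\s+dot\s+", re.IGNORECASE)

def harvest(text: str) -> list[str]:
    """Return every de-obfuscated address found in text."""
    # Rebuild each address by turning every " dot " back into a period.
    return [f"{local}@{DOT.sub('.', domain)}"
            for local, domain in OBFUSCATED.findall(text)]

print(harvest("Contact: team at stackoverflow dot com"))  # ['team@stackoverflow.com']
```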

I once came up with the idea of publishing my email address like this:

You can mail me here:

// Assemble the address piece by piece with a StringBuilder.
var myEmail = new System.Text.StringBuilder()
    .Append("myname")
    .Append("@")
    .Append("domain")
    .Append(".")
    .Append("com")
    .ToString();

Anyone who can't work it out has failed my basic intelligence test.

Developer Art
It's easy, but it takes more effort, and it also requires the harvester to be tailor-made for each site (or to carry a list of regular expressions, which could become very long given the number of variations on the theme I've seen).
Matthew Scharley
@Matthew Scharley: Not that long, really. Just a bunch of commonly used separator sequences, themselves separated by white space. Even a dozen combinations is bound to yield a lot.
Developer Art
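Developer Art's point can be sketched in Python by generating one pattern from a short list of separator tokens, each required to be surrounded by white space. The token lists and function names are hypothetical; a real harvester would simply keep extending the lists.

```python
import re

# Hypothetical token lists; each separator must be surrounded by whitespace.
AT_TOKENS = ["at", "(at)", "[at]", "{at}", "<at>", "@"]
DOT_TOKENS = ["dot", "(dot)", "[dot]", "{dot}", "<dot>", "."]

def alternation(tokens: list[str]) -> str:
    """Join literal tokens into a single regex alternation."""
    return "|".join(re.escape(token) for token in tokens)

AT = rf"\s+(?:{alternation(AT_TOKENS)})\s+"
DOT = rf"\s+(?:{alternation(DOT_TOKENS)})\s+"
PATTERN = re.compile(rf"([\w.+-]+){AT}([\w-]+(?:{DOT}[\w-]+)+)", re.IGNORECASE)
DOT_RE = re.compile(DOT, re.IGNORECASE)

def harvest(text: str) -> list[str]:
    """Return every de-obfuscated address found in text."""
    return [f"{local}@{DOT_RE.sub('.', domain)}"
            for local, domain in PATTERN.findall(text)]

print(harvest("team [at] stackoverflow [dot] com, jo (AT) example {dot} org"))
# ['team@stackoverflow.com', 'jo@example.org']
```

Six "at" spellings times six "dot" spellings already covers 36 combinations with a dozen list entries.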
+4  A: 

It will be difficult for the spammers as well as your users to identify the email address.

A nice article from Wikipedia on e-mail obfuscation, or address munging.

One common way of hiding email addresses from bots and spammers is to create an image containing the address. Facebook does this, for instance. Using images for email addresses is inherently bad for accessibility, because screen readers will not be able to read them. But even setting that aside, there are several free character recognition programs that do a pretty good job of decoding such email images.

From here

rahul
JavaScript rewrites can go a long way toward fixing this for the 98% of your user base who don't have JavaScript disabled.
Matthew Scharley
Yeah, but scrapers won't have JS enabled, so the obfuscation has to be part of the rendered page sent from the server.
Jeff Atwood
+1  A: 

It does make it harder, but there are so many really smart scrapers that it probably doesn't help much, since the big spammers use high-quality spam tools.

Mark Harrison
Or worst case - paying someone like 10 cents an hour to manually copy them off of websites. :)
gnarf
@gnarf Yes, but compared to the cost of running a bot, that is still extremely expensive.
Jiaaro
+1  A: 

How do you fight spammers? Make the email address less recognizable to something without a brain (i.e., a computer).

Non-English speakers are your friends: if your user base is a non-English-speaking community, switch to obfuscating in another language. team_małpa_stackoverflow_kropka_com and team_Affenschwanz_stackoverflow_Punkt_com are perfectly recognizable email addresses for Polish- and German-speaking communities respectively (małpa, literally "monkey", is the Polish name for @ and kropka means "dot"; Affenschwanz, "monkey's tail", is a German nickname for @ and Punkt means "dot"). Some email harvesters know Polish or German, but chances are most harvesters understand only English.
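Producing these localized spellings is mechanical. A small Python sketch follows; obfuscate() is a hypothetical helper, and the words passed in are the Polish and German ones from the examples above.

```python
def obfuscate(address: str, at_word: str, dot_word: str, sep: str = "_") -> str:
    """Spell out "@" and "." with the given words, joined by sep."""
    local, domain = address.rsplit("@", 1)
    # Put the localized word for "dot" between every pair of domain labels.
    spelled_domain = f"{sep}{dot_word}{sep}".join(domain.split("."))
    return f"{local}{sep}{at_word}{sep}{spelled_domain}"

print(obfuscate("team@stackoverflow.com", "małpa", "kropka"))
# team_małpa_stackoverflow_kropka_com
print(obfuscate("team@stackoverflow.com", "Affenschwanz", "Punkt"))
# team_Affenschwanz_stackoverflow_Punkt_com
```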

If you must stick to English, switch to descriptive phrases, such as: “To send us a message, write team in the address field, then the AT symbol, then the name of our site!”

smok1
+4  A: 

I'm not sure whether it really helps with spam, but I've learned to love escape-encode obfuscation for mailto: links. An example tag:

<a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;%74%65%61%6D%40%73%74%61%63%6B%6F%76%65%72%66%6C%6F%77%2E%63%6F%6D">&#116;&#101;&#97;&#109;&#64;&#115;&#116;&#97;&#99;&#107;&#111;&#118;&#101;&#114;&#102;&#108;&#111;&#119;&#46;&#99;&#111;&#109;</a>

This renders as a link that mails team@stackoverflow.com.
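A tag like this is easy to generate. The Python sketch below (escape_encode_link() is a hypothetical helper) percent-encodes every byte of the address in the href and writes the link text as decimal HTML character references. It leaves the mailto: scheme itself unescaped, since a URL scheme cannot contain percent-escapes.

```python
def escape_encode_link(address: str) -> str:
    """Build a mailto: link with the address fully escaped.

    The href percent-encodes every UTF-8 byte of the address, and the visible
    text uses decimal HTML character references, so a naive regex over the
    raw HTML never sees a literal "@".
    """
    href_addr = "".join(f"%{byte:02X}" for byte in address.encode("utf-8"))
    text = "".join(f"&#{ord(ch)};" for ch in address)
    return f'<a href="mailto:{href_addr}">{text}</a>'

print(escape_encode_link("team@stackoverflow.com"))
```

Note that Python's own html.unescape() and urllib.parse.unquote() reverse these encodings in one call each, so this only stops harvesters that don't bother to decode.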

gnarf
+19  A: 

Definitely!

A while ago I read an article showing how effective the various methods are and how they compare with one another. Storing the address reversed in the markup and letting CSS reverse it back for display seems to be fairly decent protection at the moment.

The following code sample:

<style type="text/css">
   span.codedirection { unicode-bidi:bidi-override; direction: rtl; }
</style>

<p><span class="codedirection">moc.etalllit@7raboofnavlis</span></p>

Will display the address in normal reading order, so it is at least readable.
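The server-side half of this trick can be sketched in Python: store the address reversed inside a span that uses the codedirection class from the CSS above. The rtl_span() helper is hypothetical, and it assumes that stylesheet is present on the page.

```python
import html

def rtl_span(address: str) -> str:
    """Emit the address reversed; the codedirection CSS flips it back on screen.

    Assumes the page defines:
    span.codedirection { unicode-bidi: bidi-override; direction: rtl; }
    """
    return f'<span class="codedirection">{html.escape(address[::-1])}</span>'

print(rtl_span("silvanfoobar7@tilllate.com"))
# <span class="codedirection">moc.etalllit@7raboofnavlis</span>
```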

That said, it is almost an arms race. But as long as you're ahead of the curve, harvesting your address will take more effort than harvesting ordinary, unobfuscated ones.

davewasthere
Excellent find!
gnarf
+1 That's a good idea! Thanks.
Nirmal
+6  A: 

Obfuscation techniques fall into the same category as captchas: they are not reliable and tend to hurt regular users more than bots.

JavaScript obfuscation seems to be widely praised, but it is no silver bullet: it is not that hard today to automate a browser for email sniffing. If it can be displayed in a browser, it can be harvested. You could even imagine a bot that takes screenshots of a browser window and uses OCR to extract addresses, beating your million-dollar obfuscation technique.

Depending on where and why you want to obfuscate emails, these techniques could be useful:

  • Restrict email visibility: you may hide emails on your website/forum from anonymous users and from new users (with little to no activity or posts to date), or even hide them completely and replace email contact between members with a built-in private messaging feature.

  • Use a dedicated, spam-filtered email address: you will get spammed, but the spam will be limited to this particular address. This is a good trade-off when you need to expose an email address to every user.

  • Use a contact form: bots are pretty good at filling in forms; in fact, they are too good at it. They tend to fill in every field, including a hidden one that human visitors never see, so hidden-field techniques can filter most of the spam coming through your contact form.
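A minimal Python sketch of the hidden-field check on the server side. The field name "website" and hiding it with inline CSS are assumptions for illustration; the answer doesn't say which hidden-field technique it has in mind.

```python
# Hypothetical honeypot field name. The form would include something like
# <input name="website" style="display:none">, which humans never see.
HONEYPOT_FIELD = "website"

def looks_like_bot(form: dict[str, str]) -> bool:
    """Flag a submission whose hidden honeypot field is non-empty."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())

# A human leaves the hidden field empty; a form-filling bot usually doesn't.
print(looks_like_bot({"email": "jo@example.org", "website": ""}))  # False
print(looks_like_bot({"email": "x@example.com", "website": "http://spam.example"}))  # True
```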

Altherac
+1  A: 

To provide a literal answer, yes, harvesting obfuscated addresses is harder than harvesting standardized addresses. The real question is whether the extra effort will be put in by harvesters and if the (major? minor?) barrier to the harvesters is worth the possible problems for your users.

If you are going to scramble addresses or otherwise transform them away from the standard form, you should avoid being consistent in how you do so, at least on the same site.

For example, if every email address on a large community site is reversed in the markup and rendered properly with CSS, or token-replaced (@ becomes 'at'), or any other predictable method, the harvesters will just write a thin adapter for your site.

Think of it this way: if it only takes you one line of code to "scramble" them sitewide, it will only take the harvester one line of code to "unscramble" them for your site. Roughly speaking.
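One rough way to avoid a single sitewide rule, sketched in Python: pick an obfuscation scheme at random for each address. The three schemes and all function names are hypothetical, and a determined harvester can still chain together every decoder the site uses, so this only raises the cost.

```python
import random
from typing import Callable

def spelled(addr: str) -> str:
    """team@example.org -> team at example dot org"""
    local, domain = addr.rsplit("@", 1)
    return f"{local} at {domain.replace('.', ' dot ')}"

def bracketed(addr: str) -> str:
    """team@example.org -> team [at] example [dot] org"""
    local, domain = addr.rsplit("@", 1)
    return f"{local} [at] {domain.replace('.', ' [dot] ')}"

def entities(addr: str) -> str:
    """Encode every character as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in addr)

SCHEMES: list[Callable[[str], str]] = [spelled, bracketed, entities]

def scramble_all(addresses: list[str], rng: random.Random) -> list[str]:
    # Choose a scheme independently per address, so no single
    # "unscramble" rule covers every address on the page.
    return [rng.choice(SCHEMES)(addr) for addr in addresses]

print(scramble_all(["team@stackoverflow.com", "jo@example.org"], random.Random()))
```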

In my opinion, spam has become such a problem and so many DBs have been turned over that we're beyond hiding our addresses. Instead, consider looking at Defensio, Akismet, etc., to help classify and block spam.

Toby Joe Boudreaux
+2  A: 

It's analogous to putting a "protected by ADT" sticker on your front door.

Will that prevent a talented burglar from entering your house? Of course not.

Will it make the house next door with an unlocked door and an iPod in the window a more compelling target? Pretty likely.

A simple unobfuscated email scraper is going to get TONS of emails as it is. Maybe a very simple regex to pick up very common obfuscation methods is worth the effort. Past that, you're spending a lot of time trying to decipher an increasingly small percentage of emails.

All that to say, having some clever obfuscation is probably worth it.

For the record, my email address has been on my public resume in plain text for years now, because I use Gmail, which has a spam filter that works.

Triptych