views:

162

answers:

3

Hi,

I'm always surprised by the high quality of Gmail's spam filter. Over the last year, it filtered 99.95% of the spam and blocked only one legitimate mail by mistake. By comparison, every other mail service I have used makes at least one mistake for every 50 mails.

How does Gmail reach this level of quality internally? Is it based on customer feedback (i.e. if N customers mark a mail as spam, it is classified as spam for every other customer)? Or is there some trick? Maybe a basic filter algorithm catches the most obvious spam, and the difficult cases are analyzed by real humans?

+6  A: 

This is the million-dollar question, and if it could be answered on Stack Overflow, then everyone's spam filter would be as effective.

Fosco
It's not so obvious. Like I said, maybe Google hires humans to filter the difficult cases, or the filter is based on user feedback. In that case, yes, anyone who can hire people to do this or rely on a community that large would be able to build an effective spam filter.
MainMa
+1  A: 

I don't know exactly how Google does spam filtering (I assume it's a trade secret). If you are interested in how spam filtering works in general, I would recommend looking at Bayesian spam filtering (http://en.wikipedia.org/wiki/Bayesian_spam_filtering). It's a fairly easy method to understand.
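
To give a feel for the Bayesian approach, here is a minimal sketch of a naive Bayes classifier with add-one (Laplace) smoothing. It is only an illustration of the general technique, not how Gmail or any real filter is implemented. The class name `NaiveBayesSpamFilter`, its methods, and the whitespace tokenizer are all assumptions made up for this example.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Toy naive Bayes spam classifier (hypothetical, for illustration only)."""

    def __init__(self):
        # Per-class word frequencies and message counts.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase + whitespace split is good enough for a demo.
        return text.lower().split()

    def train(self, text, label):
        # label must be "spam" or "ham".
        self.message_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        if min(self.message_counts.values()) == 0:
            raise ValueError("train on at least one spam and one ham message first")

        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Start from the log prior P(label).
            score = math.log(self.message_counts[label] / total_messages)
            total_words = sum(self.word_counts[label].values())
            for word in self.tokenize(text):
                # Add-one smoothing so unseen words never give probability zero.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocabulary)))
            log_scores[label] = score

        # Normalize the two log scores into P(spam | text).
        highest = max(log_scores.values())
        weights = {k: math.exp(v - highest) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])
```

You would call `train(text, "spam")` or `train(text, "ham")` on labeled mail, then treat a message as spam when `spam_probability(text)` exceeds some threshold such as 0.5. Working in log space avoids numeric underflow when messages are long.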

WebMonster
+2  A: 

Briefly, it is based on community feedback. Here is a quote from the official explanation:

Gmail users play an important role in keeping spammy messages out of millions of inboxes. When the Gmail community votes with their clicks to report a particular email as spam, our system quickly learns to start blocking similar messages. The more spam the community marks, the smarter our system becomes.

You can read a bit more about it on their Spam Explained page.
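
As a rough sketch of the "N users report it, so it is blocked for everyone" idea raised in the question, something like the following could work. This is purely hypothetical: the quote only says the system learns from reports to block "similar" messages, and the class `CommunityReportFilter`, its threshold, and the exact-match fingerprint are assumptions for illustration. A real system would need fuzzy matching and protection against abusive reporting.

```python
from collections import defaultdict


class CommunityReportFilter:
    """Toy community-feedback filter (hypothetical, for illustration only)."""

    def __init__(self, threshold=3):
        # Number of distinct reporters needed before a message is blocked.
        self.threshold = threshold
        # Maps a message fingerprint to the set of users who reported it.
        self.reporters = defaultdict(set)

    @staticmethod
    def fingerprint(text):
        # Assumption: normalize case and whitespace, then match exactly.
        return " ".join(text.lower().split())

    def report_spam(self, user_id, text):
        # A set means repeated reports from one user count only once.
        self.reporters[self.fingerprint(text)].add(user_id)

    def is_spam(self, text):
        # .get() avoids creating an entry for messages nobody has reported.
        reporters = self.reporters.get(self.fingerprint(text), ())
        return len(reporters) >= self.threshold
```

Counting distinct reporters rather than raw clicks is a deliberate choice here, so that a single user cannot get a message blocked for everyone.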

Artur Zielazny