Following on from this question I asked previously about Google App Engine: if I have access to all the information in a standard email (not just the From, To, Subject, and Body fields, but also all the headers and MIME information), how can I verify that two incoming emails with the same From address are actually from the same sender?

What I've considered thus far:

  • Check the IP address of the email's sending server
  • Check the DNS records of the email's sending server
  • Verify the sending agent of the email (e.g. web interface, Outlook, Thunderbird)
  • Check the reply-to field
  • Etc.
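
A minimal sketch of pulling those signals out of a raw message with Python's standard `email` package. The sample message, the `sender_signals` name, and the returned field names are all made up for illustration. It assumes the topmost Received header was added by your own receiving server, which is normally the case but depends on your mail setup.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Hypothetical sample message; the hosts and IPs are documentation-only values.
RAW = "\n".join([
    "Received: from mail.example.org (mail.example.org [192.0.2.10])"
    " by mx.myapp.example; Mon, 1 Feb 2010 10:00:00 +0000",
    "From: Alice <alice@example.org>",
    "Reply-To: alice@example.org",
    "User-Agent: Thunderbird 2.0.0.23",
    "Subject: hello",
    "",
    "body text",
])

# Matches an IPv4 address written in square brackets, as relays usually log it.
IPV4_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def sender_signals(raw):
    """Collect a few comparable signals from one raw email."""
    msg = message_from_string(raw)
    received = msg.get_all("Received") or []
    # Only the topmost Received header is inspected; lower ones can be forged.
    match = IPV4_IN_BRACKETS.search(received[0]) if received else None
    return {
        "from": parseaddr(msg.get("From", ""))[1].lower(),
        "reply_to": parseaddr(msg.get("Reply-To", ""))[1].lower(),
        "agent": msg.get("User-Agent") or msg.get("X-Mailer") or "",
        "relay_ip": match.group(1) if match else None,
    }


signals = sender_signals(RAW)
```

Two messages from the same From address can then be compared field by field. None of these fields is proof on its own, since everything except the relay IP is under the sender's control.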

I realize this is a complicated question (I'm sure companies like Posterous have spent tons of time on this problem). I'm just looking for a few criteria to get started with. Thanks!

Update:

The answers so far are really helping, but to give them more to work with: the context of my project is that I would be receiving tons and tons of email as a web app from my users. They would use their email as the primary way of inputting data into my system. This is why I made the Posterous analogy. The use case is very similar.

+3  A: 

You're right that all of the headers together, plus 'known good' email to compare against, can help identify likely spoofed emails.

What you're developing would probably be at best a heuristic rather than an algorithm.

I'd consider weighting the fields, including the time of day and how close it is to the time of day of 'known good' emails ...

Also check whether the 'known good' emails are structured differently from the suspect one, e.g. inline images, HTML, shortened URLs, etc.
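
As a rough sketch of such a heuristic, the snippet below scores a suspect message against a 'known good' profile. The field names, the weights, and the idea of treating the hour of day as a circular distance are all assumptions chosen for illustration, not tuned values.

```python
# Hypothetical weights: how much each matching field counts toward the score.
WEIGHTS = {"relay_ip": 4, "agent": 3, "has_html": 2, "hour": 1}


def hour_closeness(a, b):
    """Return 1.0 for the same hour of day, falling to 0.0 at 12 hours apart."""
    diff = abs(a - b) % 24
    return 1 - min(diff, 24 - diff) / 12


def similarity(known_good, suspect):
    """Weighted similarity in [0, 1] between two feature dictionaries."""
    score = 0.0
    for field, weight in WEIGHTS.items():
        if field == "hour":
            score += weight * hour_closeness(known_good[field], suspect[field])
        elif known_good[field] == suspect[field]:
            score += weight
    return score / sum(WEIGHTS.values())


known_good = {"relay_ip": "192.0.2.10", "agent": "Thunderbird", "has_html": False, "hour": 9}
suspect = {"relay_ip": "198.51.100.7", "agent": "Outlook", "has_html": False, "hour": 9}
suspect_score = similarity(known_good, suspect)
```

A low score only means "look more closely"; a legitimate sender who switches mail client or network would also score low.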

John Weldon
+1  A: 

Why not run the emails through SpamAssassin or some similar filter that will attach a Bayes score? You can then just read that score. It will save you reinventing the wheel.

You could compute a Bayes score for the email against a database of all previous emails from the individual.

There is also the option of checking the Sender Policy Framework and DomainKeys, which SpamAssassin can do for you.
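
A minimal sketch of reading that result back, assuming SpamAssassin has already processed the message and added its usual `X-Spam-Status` header. The sample header and the `spam_status` helper are made up for illustration; the exact header contents depend on how SpamAssassin is configured.

```python
import re
from email import message_from_string

# Hypothetical message as it might look after SpamAssassin has tagged it.
RAW = "\n".join([
    "X-Spam-Status: No, score=-1.2 required=5.0 tests=BAYES_00,SPF_PASS autolearn=ham",
    "From: Alice <alice@example.org>",
    "Subject: hello",
    "",
    "body text",
])

SCORE = re.compile(r"score=(-?\d+(?:\.\d+)?)")
TESTS = re.compile(r"tests=(\S+)")


def spam_status(raw):
    """Parse the X-Spam-Status header, or return None if it is absent."""
    status = message_from_string(raw).get("X-Spam-Status")
    if status is None:
        return None
    # Undo header folding so a test list split across lines is read as one token.
    status = re.sub(r",\s+", ",", " ".join(status.split()))
    score = SCORE.search(status)
    tests = TESTS.search(status)
    return {
        "flagged": status.lower().startswith("yes"),
        "score": float(score.group(1)) if score else None,
        "tests": tests.group(1).split(",") if tests else [],
    }


status = spam_status(RAW)
```

The rule names in `tests` show which checks fired, so the Bayes and SPF results can be inspected separately from the overall score.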

Phil Hannent
+2  A: 

Just to complement my brother's earlier post:

Not knowing the context in which you want to analyse this, and speaking very generally, I would suggest that your first port of call is SPF or DomainKeys, to limit the possibility of accepting email from a rogue source. I would also recommend using only one SMTP server with SSL security. I do this, and while travelling worldwide I have rarely been in a situation where I couldn't send mail; in those cases the only thing that did work was webmail (no safe local SMTP).

In addition: if you are verifying that mail is really coming from yourself, then you could also use PGP tools to sign your mail upon sending and then filter out any mail that doesn't have a valid signature. Enigmail in Thunderbird is a good tool for automatic signing, and there are plugins for Outlook as well.

After that, if you really want to do a more forensic job on an email, you could use a Bayesian filter such as SpamBayes to score the email against a database of previous emails. You would build up a database of tokens from the non-unique data (excluding entries such as "To:") and then score the email for the probability that it resembles the previous emails. In theory, any genuine mail should score very highly.

Obviously I don't know your situation, but I think that there are many techniques but sometimes it is easier to go to the root of the issue than try and fix it down the line.

Update

Based on the context supplied:

I would consider using "address extensions", where your user sends mail to an address that contains a reference, in the form [email protected]. Gmail and many other servers deliver mail addressed with a +extension@ through to the correct [email protected] without any hijinks. You could get the user to deliver mail with a unique ID as the extension; that way you would know it had come from them, and they would feel more special. Obviously someone could steal their unique code by sniffing their outgoing mail or your incoming mail, but that is always possible, and anyone who can do that can probably inject mail as well.
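
A small sketch of how the receiving side might handle such addresses. The `post+...@myapp.example` address, the token table, and the helper names are all hypothetical; a real system would keep the tokens in its user database.

```python
import secrets
from email.utils import parseaddr


def new_extension():
    """Generate a random per-user code to use as the +extension."""
    return secrets.token_hex(8)


def split_extension(address):
    """Split 'Name <base+ext@domain>' into (base, ext, domain); ext may be None."""
    addr = parseaddr(address)[1]
    local, _, domain = addr.rpartition("@")
    base, plus, ext = local.partition("+")
    return base, (ext if plus else None), domain


def identify_user(to_header, tokens):
    """Look up which user owns the extension in the To address, if any."""
    _, extension, _ = split_extension(to_header)
    return tokens.get(extension)


# Hypothetical token table mapping extension codes to user names.
tokens = {"k3x9q2": "alice"}
user = identify_user("Post <post+k3x9q2@myapp.example>", tokens)
```

The extension acts as a shared secret, so it is worth letting users regenerate it if it leaks.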

If you really just want to go down the analysis route, then I would suggest using the reverse of a SpamAssassin per-user Bayes match: compare every mail to a database of mails from a sender (instead of the traditional matching of mails 'to' an account). Remember that once your database is polluted with a false positive, you will have to remove it or risk the integrity of the matching for that sender.
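
To illustrate the idea only, here is a toy per-sender naive Bayes model with add-one smoothing. It is not SpamAssassin's implementation; the class name, the two labels, and the tokenizer are assumptions. `untrain` shows how a polluting message could be removed again.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class SenderModel:
    """Token counts for one sender versus everyone else."""

    def __init__(self):
        self.counts = {"sender": Counter(), "other": Counter()}

    def train(self, label, text):
        self.counts[label].update(tokenize(text))

    def untrain(self, label, text):
        # Remove a message that was trained by mistake (a false positive).
        self.counts[label].subtract(tokenize(text))
        self.counts[label] = +self.counts[label]  # drop zero and negative counts

    def log_odds(self, text):
        """Positive means the text looks more like this sender than like others."""
        sender, other = self.counts["sender"], self.counts["other"]
        vocab = len(set(sender) | set(other)) or 1
        n_sender, n_other = sum(sender.values()), sum(other.values())
        total = 0.0
        for tok in tokenize(text):
            p_sender = (sender[tok] + 1) / (n_sender + vocab)
            p_other = (other[tok] + 1) / (n_other + vocab)
            total += math.log(p_sender / p_other)
        return total


model = SenderModel()
model.train("sender", "weekly sales report attached figures")
model.train("other", "cheap pills click here now")
```

With this toy data, `model.log_odds("sales report figures")` is positive and `model.log_odds("click here cheap pills")` is negative. A real deployment would need far more training mail per sender before the scores mean much.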

Bob Hannent
A: 

Probably not practical but something that would work:

When an incoming mail arrives, have a "reply to sender" function and simply ask if they sent it. This could be in the form of a confirmation link that is automatically generated or something.
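
One possible shape for such a link is sketched below: an HMAC over the message ID and sender, so the link cannot be guessed or reused for another message. The secret key, the base URL, and the parameter names are placeholders invented for this example.

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Placeholder values; a real deployment would load the key from configuration.
SECRET_KEY = b"replace-with-a-real-secret"
BASE_URL = "https://myapp.example/confirm"


def make_token(message_id, sender):
    """Sign the (message, sender) pair so the token is bound to this one email."""
    payload = f"{message_id}|{sender.lower()}".encode("utf-8")
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def confirmation_url(message_id, sender):
    """Build the link to include in the 'did you send this?' reply."""
    query = urlencode({
        "msg": message_id,
        "from": sender,
        "token": make_token(message_id, sender),
    })
    return f"{BASE_URL}?{query}"


def is_valid(message_id, sender, token):
    """Check a token from a clicked link using a constant-time comparison."""
    expected = make_token(message_id, sender)
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))


token = make_token("<1@example.org>", "alice@example.org")
url = confirmation_url("<1@example.org>", "alice@example.org")
```

This only proves that whoever clicked can read the mailbox of the From address, which is usually the property you care about.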

But since I don't know the specifics of the project this may not be practical... like if you had to do this multiple times for each user, no one would put up with it.

T Pops
+2  A: 

Maybe look into using Sender Policy Framework. It might not be exactly what you are looking for but it might help.

Briefly, the design intent of the SPF record is to allow a receiving MTA (Message Transfer Agent) to interrogate the Name Server of the domain which appears in the email (the sender) and determine if the originating IP of the mail (the source) is authorized to send mail for the sender's domain.

Quoted from Wikipedia:

Sender Policy Framework (SPF), as defined in RFC 4408, is an e-mail validation system designed to prevent e-mail spam by addressing a common vulnerability, source address spoofing. SPF allows e-mail administrators the ability to specify which Internet hosts are allowed to send e-mail claiming to originate from that domain by creating a specific DNS SPF record in the public DNS record. Mail exchangers then use the DNS record to verify the sender's identity against the list published by the e-mail administrator.
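
To make the mechanism concrete, here is a deliberately simplified evaluator for an SPF record that has already been fetched from DNS. It only understands `ip4:`, `ip6:` and `all`, and gives up with "unsupported" on anything else (`include:`, `a`, `mx`, `redirect=` and so on), so it is a teaching sketch rather than a compliant checker. The function name and sample records are invented for illustration.

```python
import ipaddress

# SPF qualifiers and the result each one produces when its mechanism matches.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(record, client_ip):
    """Evaluate a small subset of SPF for one connecting IP address."""
    ip = ipaddress.ip_address(client_ip)
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"
    for term in terms[1:]:
        qualifier = "+"  # the default qualifier when none is written
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name == "all":
            return QUALIFIERS[qualifier]
        if name in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return QUALIFIERS[qualifier]
        else:
            # Mechanisms that need further DNS lookups are outside this sketch.
            return "unsupported"
    return "neutral"


result = check_spf("v=spf1 ip4:192.0.2.0/24 -all", "192.0.2.10")
```

Note that SPF authorizes the sending server for a domain, not the individual user, so a pass does not by itself show that two mails came from the same person.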

Peter D