I have a text file of e-mails like this:

10:[email protected];[email protected]
12:[email protected]; "George <[email protected]>" 
43:[email protected].;[email protected]
...

I want to check whether the list contains only well-formed entries. Do you know of any tool or web service that can check it and give me a list of the invalid addresses?

Update: Dear all, thank you for your input. I was really looking for a basic syntax check, so I will stay with Rafe's idea (I will do it with Java). Thank you.

+3  A: 

Read this so that you do it the RFC-compliant way:

http://www.eph.co.uk/resources/email-address-length-faq/

Sean A.O. Harney
Meeting the full requirements of the RFC is a bear, so it depends on what the business requirements are. I remember porting the email address validation algorithm from the Majordomo listserv software to another language once. It was a pain.
Rafe
I use a .name address and I wish more people followed the RFC. I have had my email address rejected by countless sites, even by businesses like Macy's.
Elijah
Another thing a lot of people forget is that the username portion of an email address is technically case-sensitive. You would be hard-pressed to find a service that enforces this, though. I know Gmail doesn't, at least.
Sean A.O. Harney
+3  A: 

Probably the simplest way to validate an email address is to send a message to it. As Sean points out, this can leave you open to DoS attacks, but from your description it seems you have a text file rather than a web page, so this shouldn't be a problem.

Regular expressions are not a good tool for matching emails: there are a lot of valid addresses that naive matching will reject. Check out this comparison of attempts to validate emails with regex for details.

If you have to check them offline, I would split each address into its parts (i.e. the part before the @ and the part after it). You could then create a custom validator (or regex) for each part.
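
A minimal sketch of that split-and-validate idea, in Python purely for illustration. The names `split_address` and `is_plausible` and the rules themselves are assumptions made for this example: they cover only unquoted local parts and dotted host names, so valid but unusual forms (quoted local parts, address literals, single-label hosts) are rejected. A single trailing dot on the domain is accepted because it only marks the name as fully qualified.

```python
import re

# Assumption: a deliberately simple rule set for illustration, not full RFC 5322.
# Local part: dot-separated runs of the unquoted characters allowed in an address.
LOCAL_PART = re.compile(
    r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
)
# Domain label: letters, digits and hyphens, 1 to 63 characters, no hyphen at the edges.
LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def split_address(address):
    """Split at the last '@'; return (local, domain), or None if there is no '@'."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return None
    return local, domain


def is_plausible(address):
    """Check each half of the address against its own rule."""
    parts = split_address(address.strip())
    if parts is None:
        return False
    local, domain = parts
    if not local or len(local) > 64 or not LOCAL_PART.fullmatch(local):
        return False
    # A single trailing dot only marks a fully qualified name, so drop it.
    if domain.endswith("."):
        domain = domain[:-1]
    labels = domain.split(".")
    if len(domain) > 255 or len(labels) < 2:
        return False
    return all(LABEL.fullmatch(label) for label in labels)
```

Splitting at the last `@` keeps the two checks independent, so either half can be tightened or loosened later without touching the other.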

Rich Seller
That is one way to do it, but your page could then be used to automate spam DoS attacks on people's inboxes. Many servers will accept a message for any recipient at any of their local domains, so a successful SMTP RCPT TO: is not enough to confirm that the address actually exists. You would have to wait for the bounce-back, if it ever comes.
Sean A.O. Harney
@Sean, that's a good point. My main concern is that people default to using regex for the validation, tend to do it naively, and end up excluding a raft of valid email addresses.
Rich Seller
+3  A: 

Email validation is not as simple as a regular expression

First, I would read the article "I Knew How To Validate An Email Address Until I Read The RFC".

Back in the days of yore, you could just connect to the user's mail server and use the VRFY command and verify that an email address was valid, but spammers abused that privilege and we all lost out.

Now, I would recommend a three part approach:

  1. Verify the syntactic validity. You can use the monster regex from the Perl Mail module to check that the email address is well formed. Then blacklist localhost domains and IPs as part of your check.

  2. Verify that the domain is live. Do a DNS validation check on the domain. You could take this one step further and use an SMTP check to make sure that you can connect to a valid mail server for the domain. However, there may be some false negatives due to virtual hosting schemes.

  3. Send an actual email, but include a single image that links to a script on your server. When the email is read with images enabled, your server is notified that the image was downloaded and hence that the address is alive and valid. However, nowadays many email clients do not load images by default for this very reason, so it won't be 100% effective.
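
The localhost blacklist from step 1 and the DNS check from step 2 could look roughly like the Python sketch below. Everything named here (`BLOCKED_NAMES`, `BLOCKED_SUFFIXES`, `is_blocked_domain`, `domain_resolves`) is hypothetical, and the list of blocked names is an assumption to adapt to your own policy. The DNS part only asks whether the name resolves to an address at all; it does not look up MX records, which would need a DNS library.

```python
import ipaddress
import socket

# Assumption: example policy only; extend or shrink these to suit your needs.
BLOCKED_NAMES = {"localhost", "localhost.localdomain"}
BLOCKED_SUFFIXES = (".localhost", ".local")


def is_blocked_domain(domain):
    """Return True for localhost-style names and non-public address literals."""
    host = domain.strip().rstrip(".").lower()
    if host in BLOCKED_NAMES or host.endswith(BLOCKED_SUFFIXES):
        return True
    # Address literals look like user@[127.0.0.1] or user@[IPv6:::1].
    if host.startswith("[") and host.endswith("]"):
        literal = host[1:-1]
        if literal.startswith("ipv6:"):
            literal = literal[len("ipv6:"):]
        try:
            ip = ipaddress.ip_address(literal)
        except ValueError:
            return True  # malformed literal: treat it as unusable
        return (ip.is_loopback or ip.is_private
                or ip.is_link_local or ip.is_unspecified)
    return False


def domain_resolves(domain, resolver=socket.getaddrinfo):
    """Return True if the resolver finds any address for the domain.

    The resolver is a parameter so the check can be exercised without
    network access; by default it performs a real lookup.
    """
    try:
        resolver(domain, 25)
    except (socket.gaierror, UnicodeError):
        return False
    return True
```

A domain that passes both checks is still not proof that the mailbox exists, which is why the third step is needed.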

Resources

  1. Validating Email Addresses in ASP (online)
  2. Validating Email Addresses in PHP (code examples)
  3. This commercial product does bulk email verification ← This is probably what you are looking for
  4. SO Question: How to check if an email address exists without sending an email
Elijah
I tried that regex and it doesn't cut it because it only validates the address itself, not the address plus phrase, which is what's in the input script. (Also it barfs on domain names that end in ., which is incorrect.)
Rafe
@Rafe: Thank you for that. I use that regex in my current project. Are domains that end in a . only valid on intranets? I don't think there are any on the internet as a whole, right?
Elijah
You can append . to the end of any domain name to indicate that the domain name is fully qualified. (That's how you specify domains in DNS config files, for example.) Technically the trailing dot is a reference to the DNS root.
Rafe
+2  A: 

I wrote a simple Perl script that uses the Email::Address module to validate these addresses:

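#!/usr/bin/env perl

use strict;
use warnings;
use Email::Address;

# Print every entry in which Email::Address finds no address.
while (my $line = <>) {
    chomp $line;

    foreach my $address (split /;/, $line) {
        # parse() returns the list of addresses it recognised in the string.
        my @parsed = Email::Address->parse($address);
        print "$address\n" unless @parsed;
    }
}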

You'll just need to install the module. Its home page is:

http://emailproject.perl.org/wiki/Email::Address
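
The sample lines in the question also carry a numeric prefix and, in places, a quoted display name around the address. A rough Python sketch of splitting such a record with only the standard library follows. The function name `split_record` is made up for this example, and treating the text before the first colon as a record id is an assumption based on those sample lines. It only extracts the bare addresses; whether each one is valid is left to whichever checker you choose.

```python
from email.utils import parseaddr


def split_record(line):
    """Split 'id:addr;addr' into (id, [bare addresses]).

    Assumption: the text before the first ':' is a record id, as in the
    sample lines in the question. Lines without ':' get an id of None.
    """
    record_id, sep, rest = line.strip().partition(":")
    if not sep:
        record_id, rest = None, line.strip()
    addresses = []
    for entry in rest.split(";"):
        # Drop surrounding whitespace and the quotes wrapped around an entry.
        entry = entry.strip().strip('"').strip()
        if not entry:
            continue
        # parseaddr separates an optional display name from the address.
        _display_name, address = parseaddr(entry)
        # Keep the raw entry when nothing was recognised, so that a later
        # validator can still report it.
        addresses.append(address or entry)
    return record_id, addresses
```

For instance, `split_record('12:carol@example.com; "George <george@example.com>"')` yields the id `'12'` and the two bare addresses.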

Rafe
A: 

This problem is harder than it appears. When faced with it, I stole the code from the mf.c module in the NMH sources. I then imported the address parser into Lua so I could handle email addresses from scripts.

Using somebody else's code saved me a world of pain.

Norman Ramsey