I can't believe I couldn't find this on Google, but I actually looked. Is there a function in PHP that can normalize an email address? E.g., if case isn't significant, then [email protected] -> [email protected]. I don't know the rules for when email addresses should be considered "the same", so I don't want to implement this myself.

+3  A: 

Use strtolower() to make the server portion lowercase. (Updated due to previous answer)

// Split at "@"; only the host (domain) part is lowercased.
$parts = explode("@", $email);
$host = strtolower($parts[1]);
$email = $parts[0]."@".$host;

If you want to standardize the format as well, you probably want to look into filter_var(), which can sanitize/validate email addresses, along with several other formats.

First, FILTER_SANITIZE_EMAIL will make sure that there are no illegal characters in it.

$email_sanitized = filter_var('[email protected]', FILTER_SANITIZE_EMAIL);

Then, FILTER_VALIDATE_EMAIL will make sure it is in a valid email format (filter_var() returns false if it is not).

$email = filter_var($email_sanitized, FILTER_VALIDATE_EMAIL);
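
The two filters above can be chained in one place. This is only a minimal sketch: the helper name cleanEmail is hypothetical (it is not part of PHP), and returning null for invalid input is an assumption about what the caller wants. Note that sanitizing can silently change the input (for example by dropping spaces), so the result may differ from what the user typed.

```php
<?php
// Hypothetical helper: sanitize, then validate; returns null if invalid.
function cleanEmail(string $raw): ?string
{
    // Remove characters that FILTER_SANITIZE_EMAIL does not allow in an address.
    $sanitized = filter_var(trim($raw), FILTER_SANITIZE_EMAIL);

    // FILTER_VALIDATE_EMAIL returns the address on success, false on failure.
    $valid = filter_var($sanitized, FILTER_VALIDATE_EMAIL);

    return $valid === false ? null : $valid;
}
```

The strict comparison with false matters because filter_var() signals failure with false rather than an exception.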
Chacha102
+1 for filter_var()
Val
PHP's builtin email validation filter just uses a regular expression (a very simple one, too), I really wouldn't trust it. See here for the reasons why e-mails cannot be verified with regex: http://stackoverflow.com/questions/201323/what-is-the-best-regular-expression-for-validating-email-addresses
soulmerge
Very interesting.
Chacha102
A: 

If lowercasing is all you're looking for: strtolower().

deceze
I mentioned case just as an example. As others have pointed out, case can be significant in the local part. Whether it is for a particular address varies...
allyourcode
A: 

Trim out all whitespace, then compare with strtolower. That should be perfectly fine.

Matt Grande
Technically, no, that is not "perfectly fine". Whitespace can occur in valid email addresses, although in practice probably almost no one actually does that.
Andrew Medico
I should have specified... When I said trim, I was referring to whitespace at the start or end of the email, which isn't valid. That being said, is the quoted-string-email-address still used by any hosts? I just tried signing up for one at a few places, and it wasn't accepted as valid, and none of the form validations I found take that into account. So, what I said stands true, that *should* be fine for all but the most extreme cases.
Matt Grande
A: 

EDIT: Based on another answer that states only the domain is case insensitive I've updated the function to only lowercase the domain not the user.

function NormalizeEmail( $email )
{
    list( $user, $domain ) = explode( '@', trim( $email ) );
    return $user . '@' . strtolower( $domain );
}
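
A slightly more defensive variant of the function above, as a sketch only. The name normalizeEmailDomain and the choice to return null for input without a usable "@" are assumptions, not part of the original answer. It splits at the last "@", because a quoted local part may itself contain an "@", while a domain cannot.

```php
<?php
// Hypothetical variant: lowercases only the domain, splitting at the LAST '@'.
function normalizeEmailDomain(string $email): ?string
{
    $email = trim($email);
    $at = strrpos($email, '@');

    // No '@', or nothing before or after it: nothing we can normalize.
    if ($at === false || $at === 0 || $at === strlen($email) - 1) {
        return null;
    }

    $local  = substr($email, 0, $at);   // left untouched, case may matter
    $domain = substr($email, $at + 1);  // case-insensitive, safe to lowercase

    return $local . '@' . strtolower($domain);
}
```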
Kane Wallmann
+7  A: 

Wikipedia has a roundup of what the various RFCs say about how an email address should be formed.

Despite what others have said, email can be case sensitive

The local-part is case sensitive, so "[email protected]" and "[email protected]" may be delivered to different people. This practice is discouraged by RFC 5321. However, only the authoritative mail servers for a domain may make that decision. The only exception is for a local-part value of "postmaster" which is case insensitive, and should be forwarded to the server's administrator.

The local part is referring to the part of the address to the left of the @ sign.

So, as far as your specific concern (case normalization), you could lowercase the server portion (to the right of the @) however you best see fit (split by the @, strToLower the server component, recombine).

Alan Storm
+1 for referencing the RFC. Store and use email addresses in their original form; compare them in lowercase -- because while you don't want to accidentally fail to send someone email because you screwed up the local part, you also don't want to allow registration of two separate accounts for [email protected] and [email protected].
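
A minimal sketch of that "store the original, compare in lowercase" idea. The function name emailComparisonKey and the sample addresses are made up for illustration; the key is meant only for duplicate checks and is never used for sending mail.

```php
<?php
// Hypothetical helper: a key used only for duplicate checks, never for sending.
function emailComparisonKey(string $email): string
{
    return strtolower(trim($email));
}

$stored   = 'John.Doe@Example.com';  // kept exactly as the user typed it
$incoming = 'john.doe@example.COM';  // a new registration attempt

// Compare the keys, not the stored values.
$isDuplicate = emailComparisonKey($stored) === emailComparisonKey($incoming);
```

In a database this would typically mean keeping the original address in one column and the key in a second, uniquely indexed column.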
Frank Farmer
I would like to know what percentage of mail servers actually allow 2 users with the same user name but different case. My guess is close to 0. So do you account for the minute number of mail servers that do, and leave your system open to multiple accounts with the same email, or do you make a stand and shut out those 2 (probably fewer) people who are stupid enough to create multiple users with the same email address? I know what I'll be doing.
Kane Wallmann
Your edge case is someone's existence.
Alan Storm
"I would like to know what percentage of mail servers actually allow 2 users with the same user name but different case" The more likely issue, is a server that accepts ONLY uppercase 'local parts'. I'm pretty sure there was at least one service that did this in the 90s -- for example, [email protected] might have worked, but [email protected] might not have. As I outlined in my first comment, it's simple to account for this, while simultaneously not allowing multiple registrations with the "same" email address. Why risk cutting off a few users, other than out of sheer laziness?
Frank Farmer
+2  A: 

If you want, you can use strtolower(), which could cover most of your emails correctly. But here is some additional info, if you want to do it correctly:

An email address consists of two parts: a local-part (anything before @), and a domain (anything after @). The local-part is meant to be interpreted by the mail server of the domain given in the domain part, so you actually cannot make any assumptions on that (case matters, for example!).

Many mail servers provide the option of adding arbitrary comments to your user name with a plus sign, like the following:

soulmerge+this_mail_is_delivered_to_the_user_soulmerge@example.com

For one mail server [email protected], [email protected] and [email protected] might be the same mailbox, whereas on another they might point to two or three distinct mailboxes, but the fact is: you cannot know. Any translation you make on the whole address might lead to an invalid address.
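
To illustrate the plus-sign convention, here is a sketch that separates the comment from the rest of the local-part without changing the address itself. The function name splitPlusTag and the array keys are hypothetical, and the code assumes the receiving server treats "+" this way, which in general you cannot know.

```php
<?php
// Hypothetical helper: splits "user+tag@domain" into its pieces.
// Assumes the receiving server uses '+' for comments; many do not.
function splitPlusTag(string $email): ?array
{
    $at = strrpos($email, '@');
    if ($at === false || $at === 0) {
        return null;
    }

    $local  = substr($email, 0, $at);
    $domain = substr($email, $at + 1);

    // Split the local-part at the first '+', if there is one.
    $plus = strpos($local, '+');
    $base = $plus === false ? $local : substr($local, 0, $plus);
    $tag  = $plus === false ? null : substr($local, $plus + 1);

    return ['base' => $base, 'tag' => $tag, 'domain' => $domain];
}
```

Treat the result as a hint for spotting likely duplicates, not as a replacement for the address the user gave you.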

soulmerge
I like the comment feature you mentioned. I did not know that!
allyourcode
+1  A: 

Read this before voting down... :)

This is just a complement to the other answers.

In the case of Gmail, I would remove the dots on the left side (the local part).

Gmail allows only one registration for any given username. Once you sign up for a username, nobody else can sign up for the same username, regardless of whether it contains extra periods or capital letters; those usernames belong to you. If you created [email protected], no one can ever register [email protected], or [email protected]. Because Gmail doesn't recognize dots as characters within usernames, you can add or remove the dots from a Gmail address without changing the actual destination address; they'll all go to your inbox, and only yours.

So you can be sure you always end up with the same Gmail address.
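
A sketch of what that could look like. The function name gmailComparisonKey is hypothetical, and the code assumes Gmail keeps ignoring dots and capital letters in usernames, as the quoted help text describes. It only builds a key for detecting duplicates; addresses on other domains keep their local part untouched.

```php
<?php
// Hypothetical helper: builds a duplicate-detection key for Gmail addresses.
// Assumes Gmail keeps ignoring dots and case in the local part.
function gmailComparisonKey(string $email): string
{
    $email = trim($email);
    $at = strrpos($email, '@');
    if ($at === false) {
        return strtolower($email);
    }

    $local  = substr($email, 0, $at);
    $domain = strtolower(substr($email, $at + 1));

    if ($domain === 'gmail.com') {
        // Gmail-specific: drop dots and lowercase the local part.
        $local = strtolower(str_replace('.', '', $local));
    }

    return $local . '@' . $domain;
}
```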

Gabriel Sosa
You shouldn't trust that... I know for a fact that another guy has the exact same gmail username as me except for a dot before the last character (he has it, I don't). During the years it has triggered a few bugs (like me receiving his mail, and probably the other way around) but both our accounts are still alive and kicking. I don't know how it happened but I think I registered first.
Fredrik
I know what you are referring to. many gmail users have reported this problem. regards
Gabriel Sosa
And I don't know if it is a good idea to start adjusting your code for every mail provider out there. Besides, gmail might remove or change that 'feature' in the future.
soulmerge
I think your implementation can be smart and tell the user if the system "detects" that the Gmail address is the same. Anyway, it all depends on how big you think your DB will be and how you will store those emails.
Gabriel Sosa
You could also account for gmail's "+" feature. [email protected] is delivered to [email protected]. But as mentioned above, the wisdom of trying to compensate for gmail's features in your code is questionable.
Frank Farmer