How can I check for duplicate email addresses in PHP, with the possibility of Gmail's automated labeler and punctuation in mind?

For example, I want these addresses to be detected as duplicates:

         [email protected]
        [email protected]
   [email protected]
  [email protected]

Despite what Daniel A. White claims: in Gmail, you can place as many dots as you like anywhere before the '@' (and before the label). [email protected] and [email protected] are in fact the same user.

+2  A: 

Strip the address to the basic form before comparing. Make a function normalise() that will strip the label, then remove all dots. Then you can compare the addresses via:

normalise(address1) == normalise(address2)

If you have to do this very often, store the addresses in their normalised form as well, so you don't have to convert them again on every comparison.
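A minimal sketch of this idea follows. The names `normalise_address()` and `find_duplicates()` are hypothetical, and the sketch assumes every address follows Gmail's rules (dots ignored, "+label" ignored, case ignored), which is not true of every provider.

```php
<?php
// Hypothetical helper: reduce an address to a comparable key.
// Assumes Gmail-style rules: case, dots and "+label" are all ignored.
function normalise_address(string $email): string
{
    $email = strtolower(trim($email));
    $at = strrpos($email, '@');
    if ($at === false) {
        return $email; // not something we can split; leave it alone
    }
    $local  = substr($email, 0, $at);
    $domain = substr($email, $at + 1);

    // Drop the label: everything from the first "+" onwards.
    $plus = strpos($local, '+');
    if ($plus !== false) {
        $local = substr($local, 0, $plus);
    }
    // Remove dots from the local part only, never from the domain.
    $local = str_replace('.', '', $local);

    return $local . '@' . $domain;
}

// Group raw addresses by their normalised key to spot duplicates.
function find_duplicates(array $emails): array
{
    $groups = [];
    foreach ($emails as $email) {
        $groups[normalise_address($email)][] = $email;
    }
    // Keep only the keys that were seen more than once.
    return array_filter($groups, function (array $group): bool {
        return count($group) > 1;
    });
}
```

Storing the normalised key next to the raw address (for instance in its own indexed column) means each address is converted only once, when it is saved.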

viraptor
Beat me to it. :)
musicfreak
+3  A: 
$email_parts    = explode('@', $email);

// check if there is a "+" and return the string before
$before_plus    = strstr($email_parts[0], '+', TRUE);
// strstr() returns FALSE when there is no "+", so compare against FALSE explicitly
$before_at      = ($before_plus !== FALSE) ? $before_plus : $email_parts[0];

// remove "."
$before_at      = str_replace('.', '', $before_at);

$email_clean    = $before_at.'@'.$email_parts[1];
powtac
You likely only want to remove the "."s from GMail addresses. As mentioned in the comments on the OP, most mail providers will recognise "." as a valid character, making the addresses different.
Brenton Alker
You are right if you run this code on addresses other than Gmail ones. Just add a check after line 1 of my script: if (in_array($email_parts[1], array('gmail.com', 'googlemail.com'))) { // run the rest of the code... }
powtac
This, including powtac's addition, would do the trick.
Kriem
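Putting the answer and the domain check from the comments together could look roughly like the sketch below. The function name `normalise_gmail_only()` is made up for illustration; addresses on any other domain are returned unchanged.

```php
<?php
// Hypothetical helper: normalise only addresses on Gmail's domains.
function normalise_gmail_only(string $email): string
{
    $parts = explode('@', $email);
    if (count($parts) !== 2) {
        return $email; // unexpected shape; leave unchanged
    }
    list($local, $domain) = $parts;

    // Domain names are case-insensitive, so compare in lower case.
    if (!in_array(strtolower($domain), ['gmail.com', 'googlemail.com'], true)) {
        return $email;
    }

    // Strip the "+label" suffix, if there is one.
    $before_plus = strstr($local, '+', true);
    if ($before_plus !== false) {
        $local = $before_plus;
    }
    // Gmail ignores dots in the local part.
    $local = str_replace('.', '', $local);

    return $local . '@' . $domain;
}
```

Note that the second argument of `in_array()` has to be an array, and that passing `true` as the third argument makes the comparison strict.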
A: 
function normalize($input) {
     $input = str_replace('.', '', $input);
     $pattern = '/\+(\w+)@/';
     return preg_replace($pattern, '@', $input);
}
Seth
`$input = str_replace('.', '', $input);` seems a bit brutal. It will make "[email protected]" and "[email protected]" the same.
dbr
+2  A: 

Perhaps this would be better titled "How to normalize gmail addresses in PHP, considering ([email protected])"

You have two technical solutions above. I'll go a different route and ask why you're trying to do this. It doesn't feel right to me. Are you trying to prevent someone registering multiple times at your site using different e-mail addresses? This will only prevent a specialized case of that.

I have my own domain, example.com, and any e-mail that goes to any address at that domain goes to my single mailbox. Do you, now, want to put a check to normalize anything at my example.com to a single address on your end?

By the official e-mail address format, those addresses you are trying to match as the same are different.

James Cassell
+1  A: 

Email address parsing is really, really hard to do correctly without breaking things and annoying users.

First, I would question whether you really need to do this. Why do you have multiple email addresses with different sub-addresses?

If you are sure you need to do this, first read RFC 822, then modify this email address parsing regex to extract all parts of the email, and recombine them excluding the label.

Slightly more practically, the Email Address Wikipedia page has a section on this part of the address format, Sub-addressing.

The code powtac posted looks like it should work - as long as you're not using it in an automated manner to delete accounts or anything, it should be fine.

Note that the "automated labeler" isn't a Gmail-specific feature; Gmail simply popularised it. Other mail servers support this feature, some using + as the separator, others using -. If you are going to special-case dots in Gmail addresses, remember to consider the googlemail.com domain as well.
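One way to account for different separators is a small per-domain rule table, sketched below. Everything in the table is an assumption for illustration: `mail.example` is a made-up domain standing in for a server that uses "-", and treating googlemail.com as an alias of gmail.com should be verified before relying on it. The names `example_rules()` and `normalise_with_rules()` are hypothetical.

```php
<?php
// Hypothetical rule table. 'separator' is the sub-address delimiter,
// 'strip_dots' says whether dots in the local part are ignored, and
// 'canonical' is the domain to store for comparison.
function example_rules(): array
{
    return [
        'gmail.com'      => ['separator' => '+', 'strip_dots' => true,  'canonical' => 'gmail.com'],
        'googlemail.com' => ['separator' => '+', 'strip_dots' => true,  'canonical' => 'gmail.com'],
        // Made-up domain: stands in for a server that uses "-" instead.
        'mail.example'   => ['separator' => '-', 'strip_dots' => false, 'canonical' => 'mail.example'],
    ];
}

function normalise_with_rules(string $email, array $rules): string
{
    $at = strrpos($email, '@');
    if ($at === false) {
        return $email; // nothing to split on
    }
    $local  = substr($email, 0, $at);
    $domain = strtolower(substr($email, $at + 1));

    if (!isset($rules[$domain])) {
        // Unknown provider: only the domain is lower-cased.
        return $local . '@' . $domain;
    }
    $rule = $rules[$domain];

    // Cut the local part at the provider's separator, if present.
    $pos = strpos($local, $rule['separator']);
    if ($pos !== false) {
        $local = substr($local, 0, $pos);
    }
    if ($rule['strip_dots']) {
        $local = str_replace('.', '', $local);
    }
    return $local . '@' . $rule['canonical'];
}
```

Providers without an entry in the table are deliberately left almost untouched, since guessing their rules risks merging addresses that belong to different people.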

dbr