tags:

views:

127

answers:

4

Background

Gmail allows '.'s and +filters allowing for an infinite number of email addresses all pointing to the same gmail account.

i.e. the following all point to the same gmail account:

Problem

We have a table on our production environment that hold all registered user's data including their email address.

Currently the gmail email addresses on this table contain a mix of the the above variations.

  • Problem 1 - If the user tries to log back in, after they have created their account ,using their a different variation of his email address than we have on record this user will not be found.
  • Problem 2 - The user is able to create different accounts on out site using the many variations of gmail email address.

Possible solutions:


One proposed solution would be to create a function...

CREATE FUNCTION STANDARDIZE_EMAIL (
    @Email varchar(255)
)
RETURNS varchar(255)
AS
BEGIN
    -- we make the email lowercase since email addresses are
    -- case independent 

    SET @Email = LOWER(@Email)

    -- if it is a gmail email address then we remove periods and filters from the username
    IF RIGHT(RTRIM(@Email), 10) = '@gmail.com'
    BEGIN
        -- remove domain
        SET @Email = REPLACE(@Email, '@gmail.com', '')

        --remove periods from username
        SET @Email = REPLACE(@Email, '.', '')

        -- remove '+' and filter
        IF CHARINDEX('+', @Email) > 0 
            SET @Email = SUBSTRING(@Email, 0, CHARINDEX('+', @Email))

        -- add back the domain
        SET @Email = @Email + '@gmail.com'
    END  

    RETURN (@Email)
END

Example use:

SELECT * FROM table
WHERE STANDARDIZE_EMAIL(Email) = STANDARDIZE_EMAIL(@Email)

inb4: Running a process to standardize all the emails currently on the table is not an option as there could be duplicates, and users would lose the gmail +filter functionality

Is standardizing the email address on every record as we search through too expensive on the db?


Another proposed solution is to add a GmailEmail field to the table, and upon registering the user for the first time save a copy of the standardized version of his gmail email address to this second field which can be used for comparisons when they log back in.

Id rather not have to go to this extent if possible.

Anyone have any ideas? I welcome any and all input.

A: 

You could create a computed column that would standardize the email.

I know you said it's not an option, but you may need to take another look at standardizing the column - until you do, you'll have inconsistent data in your database.

Damien Dennehy
+1  A: 

My recommendation would be to create a "standardized" email field, (which starts as empty) using the function you've provided. Upon a user logging in, the system can:

  • Standardize the email they used to login
  • See if the email exists in the "standardized" field
    • If it does, let them in
    • If not, check to see if the "unstandardized" email exists
      • If it does:
        • Mark that style as their preferred email style
        • Create the "standardized" field
        • Let them in
      • If not:
        • Reject the login

This has the advantage of slowly migrating users to the new system, and (transparently) getting their preferred email to use as their login. Note, of course, that users using multiple accounts under variants of their GMail address will not be able to access their other accounts; since you seem to want to prevent this I'm going to call it a feature.

You might want to add a check at the beginning, so if the standardized field exists and the unstandardized field exists (and they don't match) you could inform the user about what happened and deal with it appropriately.

Chris B.
+2  A: 

Is standardizing the email address on every record as we search through too expensive on the db?

Yes. Even if this table takes less than 8 pages of storage space, Yes, it is too expensive.

Do you need to track the email address in all the myriad forms entered by the users? If so, having the second "clean" column is perhaps ugly, but probably necessary. (Creating a calculated column based on your function would perform as poorly as the original soluation.)

If you do not need to maintain the addresses precisly as entered (each time) by the users, why not:

  • Clean the existing data using your function, e.g. UPDATE... set EmailCol = dbo.STANDARDIZE_EMAIL(EmailCol)
  • Apply the rule to clean email addresses whenever they are added to the database
Philip Kelley
+1: Want to standardize the email only while searching? Don't bother to put an index on the column, it won't be used...
OMG Ponies
Thanks, will be going the second clean column route.
phaed
+1  A: 

I'd like to point out that these characters ! # $ % & ' * + - / = ? ^ _ { | } ``~ are all valid in an email address. You'd be introducing a problem for any systems that actually have mailboxes that don't map "+ suffixes" and periods to the same mailbox.

I think it's reasonable for a user to expect [email protected] and [email protected] to be treated as unique addresses.

BC