views:

156

answers:

5

I'm wondering about best practices for user registration. I'm leaning towards storing site registrations in a separate registrations table and then once the signup has been confirmed by email transferring the data over to the users table.

The benefit to this would be that reads from the users table are not cluttered with never activated registrations. Another benefit is that the email (username) field can remain unique in the users table but if you attempt to register with an email address that you don't own, the owner of that email address will still be able to register with it as the email field will not be unique in the registrations table.

I'm wondering if this is a common practice or if there's any reason this is not a good idea?

+1  A: 

I think that would be over engineering.

Store all of them in one table and schedule an SQL query (everyday) to delete not actived accounts older than 30 days.

dr. evil
it doesn't even have to be every day. You can do it once a month or so. You're not going to see a buildup of unactivated accounts slow down your processing unless you are extremely popular or never clear it out. Do it once a month, or once a year or so.
Malfist
+1  A: 

Activated or Not

There's no real need to store a user in a different table. Just store all the users in one table, with a boolean flag indicating if their activated or not. Every so often run a cron job to check and see if there are accounts that are x days old and not activated and delete them.

Email

So presumably you are activating the user from the e-mail right? You also want to set it up so that if another user entered someone else's e-mail that other person could still register correct? That's actually rather simple to do. In that e-mail that is mistakenly sent to them, just have a link to remove that e-mail from your database because obviously it's not them and they're never going to activate the account.

But what if the other guy just deletes it?

Then when they register tell them you already have their e-mail and offer to send the activation again, this e-mail will also have the option of removing it from the database.

OR

Just remove the older account from the database. Stands to reason if the person is trying to activate a new account and there's an old one that never activated you can just remove it.

Malfist
A: 

I agree with this practice for Separation of Concerns reasons: A "registration process" and "user accounts" are two separate concepts.

The registration table would be used to persist the status of the "registration process". Once that process is complete (once the email is validated), the process's "output" would be a new "user account". If the process fails (for example, it "timeouts" after the user fails to confirm his email for 1 month), there's no impact on the "user account" concept.

As was said in another answer, this can be over engineering in some cases, but it can also be a useful way to enapsulate complexity of these two separate concepts. It can also be seen as contradicting normalization principles, but then I believe you have to choose a tradeoff between encapsulation and normalization depending on your context.

ckarras
What if my user had a green car instead of a red one? Would I need to create a separate table to keep the two concepts of different color cars apart? Because a red car is totally different than a green car.
Malfist
@Malfist: Two cars that are only different by their color would still have the same data and behavior, so there's no need to separate them (that would cause copy paste programming). But in the case of a car, I could have separate concepts: The Car itself, Purchase Contract, Car Registration, Insurance Contract, Repairs. When I'm using the Car, the Repair Process doesn't matter, only the end results of the Repair (the car works properly).Also, it may make sense (in some systems) to have separate concepts for Bus, SUV, which would probably use inheritance for common behavior.
ckarras
Perhaps, but if you're a paint shop, you don't want to do a touch up on the car with the wrong paint, so you're going to treat the cars completely differently based on their color. Following your reasoning, you'd need a different table per car color.
Malfist
We store data in a database not only to show it to the user later, but to act on it. Data is not stored in the database to be grouped on how it is acted upon, but rather how it is grouped together in real life. It's similar in the way we do OOP programming (but not the same).
Malfist
not real life, sorry. Data is grouped in a database on how it relates to itself. I wouldn't want tree bark type in my table of cars, but I would want car color, car type, where the keys are, etc. All of these fields I can act differently on. If I'm painting the car, I don't want to have to look for the red table, I want to be able to query the car table for red. Likewise, I don't want to have to look for paper birch bark in the paper birch bark table, but rather in the bark table, or tree table.
Malfist
In real life, if you want to register for a service (for example a Credit Card), there are two different documents involved: the Signup Form, which initiates the appropriate credit checks and account creation processes; then the resulting Credit Card. The things you can do with both are very different (a Credit Card is not simply an "activated" signup form, and you can't pay something by showing the cashier that form). In this analogy, the Signup Form would be similar to a Registration, while the Credit Card is equivalent to a User Account.
ckarras
Right, so you have a flag stating if the Credit Card has been activated or not, you don't create an entirely new table for an object based on a boolean property that it has.
Malfist
The way you store data in the database also depends on your design point of view: Object-Oriented or Data-Oriented. In a OO point of view, concepts are grouped by the associated behavior (if I want to paint red or green, it's still the "paint" behavior with a different "color" parameter). Ideally, you can bridge between the two point of views using ORM tools (keep code behavior oriented while database is purely data oriented), but this is not always an option on all platforms.
ckarras
Take for example a website that has Sections A, B, C and D. Users are allowed to select which section they want to be taken to once they sign in. John wants to be taken to A when he's logged in, and Mary wants to be taken to D. Bill doesn't have access to C or D so Bill just wants to go to B. All these users you will act differently on. Each have different privileges. However, you wouldn't create a tables for each section, and if you did, how would you determine the access privileges of that user? Create another set of tables for privileges, one per section?
Malfist
In the case of the Credit Card, the Form and the Card itself would have more differences than only a boolean "activated" flag:Form has: Form Number, Requested Credit Limit, Credit History at the time of the request (not filled by the requester, but "appended" once the credit checks are done)Credit card has: Credit Card Number (which is created only once the request is authorized), Actual Credit Limit, Account Balance, Transaction/Payment History, Expiration Date. If the request process changes, it should not affect the way the card is being used, so it makes sense to separate them.
ckarras
In this case, possibly, or you could leave them as nulls. But the point being, the user isn't a credit card and it is simply just an activated flag.
Malfist
+1  A: 

Do yourself a favour and keep it a single table. You're just not going to have enough aborted registrations for performance issues to come into play (especially if you wipe them after 3 weeks).

Your justification about the email addresses is silly. Nobody is likely to get their email blocked by a false registration.

SpliFF
Its not a justification, its a reason and I don't really think its that silly to assume someone may register with [email protected] or something similar...
thief
You can get around that, give me a second and I'll update my answer
Malfist
+1  A: 

There may be one case in which you'd logically split the two tables: If your registration process asks a lot of mandatory user information only after you've confirmed the e-mail address. For example, if your pending registration table would wind up with only a few columns, such as e-mail and activation key whereas your users table has many additional columns, such as username, first name, last name, postal address, etc.

The split might make sense in that case, as you'd be able to declare those columns NOT NULL.

Other than that, though, I have to agree with most of the other answers; it sounds like premature optimization to me. Multiple tables with the same structure is a strong warning sign that ur doin it wrong. Not definite, but a strong warning.

derobert