views:

276

answers:

4

I'm in the process of working on a user profile system for a website and am pondering what would be a better (scalable) approach to take. I've come up with two solutions and am looking for either input or perhaps pointers to something I might have missed.

The following create table statements are not meant to be executable but are merely there to give an idea of the layout of the tables involved.

My initial thought was something like this:

CREATE TABLE user(
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,

    user_email VARCHAR(320),

    user_joined DATATIME,
    user_last_seen DATATIME,

    user_name_first VARCHAR,
    user_name_last VARCHAR,

    user_name_alias VARCHAR,

    user_location_country VARCHAR,
    user_location_region VARCHAR,
    user_location_city VARCHAR

    # ...
);

Obviously this isn't very scalable at all and adding additional properties i annoying. The one advantage is I can quickly search for users matching a specific set of properties. I've done a bit of looking around and this is a pretty common approach (e.g. Wordpress).

My second approach (the one I'm currently playing around with) is much more scalable but I'm a little concerned about performance:

CREATE TABLE user(
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,

    user_email VARCHAR(320)
);

CREATE TABLE user_profile(
    user_id INT UNSIGNED NOT NULL,

    visibility ENUM('PRIVATE', 'PUBLIC'),

    name VARCHAR,
    value VARCHAR
);

Using this approach every use has a set of key value pairs associated with it which makes adding additional properties trivial as well as loading the users profile when they login. However I lose all the type information I had in the first approach (e.g. DATETIME is now stored as a formatted string) so some searches become annoying. This does give me more control over selecting which properties the user wants publicly displayed.

Would a hybrid approach be better allowing me to balance the advantages and disadvantages of both methods? What method does SO use? Is there another approach to this that I haven't thought of or missed?

Extension: With a hybrid approach would it be advantageous to also insert the properties from the user table into the user_profile table to control their visibility to other users or could that possibly be seen as additional overhead?

+2  A: 

I'd use a hybrid approach. Some basic properties such as username, email, lastlogindate etc should be added to your user table. Items of secondary importance can be added as key/value pairs.

This way you can still easily search for the most essential information and keep adding profile items without schema changes.

ifwdev
A: 

I'd go with hybrid solution too, for reasons of performance and design scalability.

I tend to feel that tables like users (I also like plural on table names) need to be broken into a core set of data commonly acted on by other objects, and those extended bits of basically write-only data in the spec like "region", "middleinitial", "shoesize" can be shifted into an extensible and less frequently updated area.

annakata
+2  A: 

The hybrid solution is not a good one. Essentially you are storing additional properties into a property bag table. That is going to make doing reporting and queries complicated in the long run. Also, storing dates, int, decimal, ntext, etc. as varchar is not going to be an acceptable exchange of performance for scalability. How would you create relations off that table, should the need arise?

A better approach is to have a user table for user information. Then as your needs expand create new classes that represent the new features. Those new classes will likely have corresponding tables. That way your "user" class doesn't expand exponentially when properties associated with a user belong in their own space. Yes, in the future you might really have a new property that belongs in the user table. At that point you'll need to go back and adjust your schema and DBAL, but that's the price of code that is easy to understand.

In your example you have address information for the user in the first user table. One thing I do is I know I'll need to store addresses not just for users. So I'll have a separate Address table then include a nullable AddressId in the User table. That way when I have a Stores table, an Events table, I can include AddressId relations there too. A side-effect of that approach is that when I go back and add lat/long to the Address object, everyone in my data model gets those new properties too.

DavGarcia
It strikes me that (and correct me if I'm wrong) that this approach would be subject over time to 'table bloat' with lots of tables getting added to the database to add different functionality?
Kevin Loney
I'm not sure that table bloat is really a problem. If you have an application with 1000 feature points but only 5 tables, I'd be very skeptical about the normalization of the tables. Martin Fowler discusses the table module design pattern here: http://martinfowler.com/eaaCatalog/tableModule.html
DavGarcia
A: 

Why not have an XML field to store the additional information that isn't essential.

This can be configured in the config file and you could even take it a step further and generate the UI controls from the config.

XML is the wrong tool for this, it would result in incredibly ridiculous queries if I need to do anything with that unessential information. I would have to parse the XML for each result before I could do anything which would kill performance.
Kevin Loney
Microsoft use the same approach for their membership provider. Have a look at it. There is also similar solutions of the provider as the guys have explained above. You are probaly right about performance so it is recommended that you only store information in the XML you wouldn't want to query.