views:

133

answers:

10

Hi,

I'm designing a website using PHP and MySQL currently and as the site proceeds I find myself adding more and more columns to the users table to store various variables.

Which got me thinking, is there a better way to store this information? Just to clarify, the information is global, can be affected by other users so cookies won't work, also I'd lose the information if they clear their cookies.

The second part of my question is, if it does turn out that storing it in a database is the best way, would it be less expensive to have a large number of columns or rather to combine related columns into delimited varchar columns and then explode them in PHP?

Thanks!

A: 

The database a perfectly fine place to store such data, as long as they're variables and not, say, huge image files. The database has all the optimizations and specifications for storing and retrieving large amounts of data. Anything you set up on file system level will always be beaten by what the database already has in terms of speed and functionality.

would it be less expensive to have a large number of columns or rather to combine related columns into delimited varchar columns and then explode them in PHP?

It's not really that much of a performance than a maintenance question IMO - it's not fun to manage hundreds of columns. Storing such data - perhaps as serialized objects - in a TEXT field is a viable option - as long as it's 100% sure you will never have to make any queries on that data.

But why not use a normalized user_variables table like so:

id  | user_id | variable_name | variable_value

?

It is a bit more complex to query, but provides for a very clean table structure all round. You can easily add arbitrary user variables that way.

If you are doing a lot of queries like SELECT FROM USERS WHERE variable257 = 'green' you may have to stick to have specific columns.

Pekka
+2  A: 

In my experience, I'd rather get the database right than start adding comma separated fields holding multiple items. Having to sift through multiple comma separated fields is only going to hurt your program's efficiency and the readability of your code.

Also, if your table is growing to much, then perhaps you need to look into splitting it into multiple tables joined by foreign dependencies?

Aaron
A: 

I wouldn't go for the option of grouping columns together and exploding them. It's untidy work and very unmanageable. Instead maybe try spreading those columns over a few tables and using InnoDb's transaction feature.

If you still dislike the idea of frequently updating the database, and if this method complies with what you're trying to achieve, you can use APC's caching function to store (cache) information "globally" on the server.

sombe
A: 

MongoDB (and its NoSQL cousins) are great for stuff like this.

Chris
+2  A: 

I'd create a user_meta table, with three columns: user_id, key, value.

ceejayoz
While I like this approach for its simplicity, as I said before, I'm not sure it'll work for cases (which are most of them) where I need the data from a large number of columns at once (which is most of the time). Or am I mistaken?
Alex
Properly indexed, a query like `SELECT key, value FROM user_meta WHERE user_id=1 AND key IN ('homepage', 'background_color', 'arm_length', 'bacon_lover');` should run pretty well. Once it starts to slow down, you could consider partitioning the database, where users 1-100,000 are on one DB, 100,001-200,000 are on another, and so on.
ceejayoz
Somewhere, E.F. Codd is rolling over in his grave.
Breton
It's a model that works well for WordPress.com - a pretty huge database.
ceejayoz
A: 

The database is definitely the best place to store the data. (I'm assuming you were thinking of storing it in flat files otherwise) You'd definitely get better performance and security from using a DB over storing in files.

With regards to the storing your data in multiple columns or delimiting them... It's a personal choice but you should consider a few things

  1. If you're going to delimit the items, you need to think of what you're going to delimit them with (something that's not likely to crop up within the text your delimiting)
  2. I often find that it helps to try and visualise whether another programmer of your level would be able to understand what you've done with little help.
  3. Yes, as Pekka said, if you want to perform queries on the data stored you should stick with the seperate columns
  4. You may also get a slight performance boost from not retrieving and parsing ALL your data every time if you just want a couple of fields of information

I'd suggest going with the seperate columns as it offers you the option of much greater flexibility in the future. And there's nothing worse than having to drastically change your data structure and migrate information down the track!

Ganesh Shankar
A: 

I would recommend setting up a memcached server (see http://memcached.org/). It has proven to be viable with lots of the big sites. PHP has two extensions that integrate a client into your runtime (see http://php.net/manual/en/book.memcached.php).

Give it a try, you won't regret it.

EDIT
Sure, this will only be an option for data that's frequently used and would otherwise have to be loaded from your database again and again. Keep in mind though that you will still have to save your data to some kind of persistent storage.

aefxx
Thanks, I'll look into this as well.
Alex
A: 

A document-oriented database might be what you need.

If you want to stick to a relational database, don't take the naïve approach of just creating a table with oh so many fields:

CREATE TABLE SomeEntity (
    ENTITY_ID    CHAR(10)    NOT NULL,
    PROPERTY_1   VARCHAR(50),
    PROPERTY_2   VARCHAR(50),
    PROPERTY_3   VARCHAR(50),
    ...
    PROPERTY_915 VARCHAR(50),
    PRIMARY KEY  (ENTITY_ID)
);

Instead define a Attribute table:

CREATE TABLE Attribute (
    ATTRIBUTE_ID  CHAR(10) NOT NULL,
    DESCRIPTION   VARCHAR(30),
    /* optionally */
    DEFAULT_VALUE /* whatever type you want */,
    /* end_optionally */
    PRIMARY KEY   (ATTRIBUTE_ID)
);

Then define your SomeEntity table, which only includes the essential attributes (for example, required fields in a registration form):

CREATE TABLE SomeEntity (
    ENTITY_ID   CHAR(10) NOT NULL
    ESSENTIAL_1 VARCHAR(30),
    ESSENTIAL_2 VARCHAR(30),
    ESSENTIAL_3 VARCHAR(30),
    PRIMARY KEY (ENTITY_ID)
);

And then define a table for those attributes that you might or might not want to store.

CREATE TABLE EntityAttribute (
    ATTRIBUTE_ID    CHAR(10) NOT NULL,
    ENTITY_ID       CHAR(10) NOT NULL,
    ATTRIBUTE_VALUE /* the same type as SomeEntity.DEFAULT_VALUE;
                       if you didn't create that field, then any type */,
    PRIMARY KEY     (ATTRIBUTE_ID, ENTITY_ID)
);

Evidently, in your case, that SomeEntity is the user.

Eduardo León
A: 

Instead of MySQL you might consider using a triplestore, or a key-value store that way you get the benifits of having all the multithreading multiuser, performance and caching voodoo, figured out, without all the trouble of trying to figure out ahead of time what kind of values you really want to store.

Downsides: it's a bit more costly to figure out the average salary of all the people in idaho who also own hats.

Breton
A: 

depends on what kind of user info you are storing. if its session pertinent data, use php sessions in coordination with session event handlers to store your session data in a single data field in the db.

jellyfishtree