views:

105

answers:

7

Hi, I'm planing to build some database project.

One of the tables have a lot of attributes.

My question is: What is better, to divide the the class into 2 separate tables or put all of them into one table. below is an example

create table User { id, name, surname,... show_name, show_photos, ...)

or

create table User { id, name, surname,... )
create table UserPrivacy {usr_id,  show_name, show_photos, ...)

The performance i suppose is similar due to i can use index.

A: 

I would suggest something differnet. It seems likely that in the future you will be asked for 'yet another attribute' to manage. Rather than add a column, you could just add a row to an attributes table:

TABLE Attribute
(
    ID
    Name
)
TABLE User
(
    ID
    ...
)
TABLE UserAttributes
(
    UserID FK Users.ID
    Attribute FK Attributes.ID
    Value...
)

Good comments from everyone. I should have been clearer in my response.

We do this quite a bit to handle special-cases where customers ask us to tailor our site for them in some way. We never 'pivot' the NVP's into columns in a query - we're always querying "should I do this here?" by looking for a specific attribute listed for a customer. If it is there, that's a 'true'. So rather than having these be a ton of boolean-columns, most of which would be false or NULL for most customers, AND the tendency for these features to grow in number, this works well for us.

n8wrl
This is a suggestion that will not scale well. EAV tables are known tobe extremely poor performers and in most cases should be avoided.
HLGEM
I fight EAVs like I'm St. George they're a dragon. But N8 is correct, if the ONLY query you want to run is like his, "Should I do this for this user, you're fine. It's a unique index access for one row... Should N8 ever try to find ALL of the customer who have two or more privileges he can go make dinner.
Stephanie Page
A: 

You should consider splitting the table if all of the privacy attributes are nullable and will most probably have values of NULL.

This will help you to keep the main table smaller.

If the privacy attributes will mostly be filled, there is no point in splitting the table, as it will require extra JOINs to fetch the data.

Quassnoi
+2  A: 

I'd say the 2 separate tables especially if you are using ORM. In most cases its best to have each table correspond to a particular object and have its field or "attributes" be things that are required to describe that object.

You don't need 'show_photos' to describe a User but you do need it to describe UserPrivacy.

KThompson
show_photos and show_name are attributes of the User. With ORM, generally all of the attributes will get loaded whether you use them or not. It's possible you might want to load the User without loading their privacy settings. That would be a reason to split them. Though you'd still add the method to the User class like getUserPrivacy(). Still, you're complicating things for performance.
Marcus Adams
You can have all the items in one table, but use views to stop the ORM 'seeing' all the columns at once
Gary
You could yes but doesn't that contradict a little bit? I would assume you would be using ORM in the first place to be organized among other things. Why not extend that to your database?
KThompson
A: 
Keith Williams
Down voted as these do not scale well
Gary
Please do not go this direction. It doesn't scale, it's impossible to query well, it can't be indexed at all, it's a bad bad idea. Just google EAV. It's such a common mal-design pattern that it got its own name.
Stephanie Page
This mal-design pattern is so bad, it has its own name - EAV. Google all the horror stories or look for other posts/comments on SO by me.
Stephanie Page
+4  A: 

It's best to put all the attributes in the same table.

If you start storing attribute names in a table, you're storing meta data in your database, which breaks first normal form.

Besides, keeping them all in the same table simplifies your queries.

Would you rather have:

SELECT show_photos FROM User WHERE user_id = 1

Or

SELECT up.show_photos FROM User u
LEFT JOIN UserPrivacy up USING(user_id)
WHERE u.user_id = 1

Joins are okay, but keep them for associating separate entities and 1->N relationships.

There is a limit to the number of columns, and only if you think you might hit that limit would you do anything else.

There are legitimate reasons for storing name value pairs in a separate table, but fear of adding columns isn't one of them. For example, creating a name value table might, in some circumstances, make it easier for you to query a list of attributes. However, most database engines, including PDO in PHP include reflection methods whereby you can easily get a list of columns for a table (attributes for an entity).

Also, please note that your id field on User should be user_id, not just id, unless you're using Ruby, which forces just id. 'user_id' is preferred because with just id, your joins look like this:

ON u.id = up.user_id

Which seems odd, and the preferred way is this:

ON u.user_id = up.user_id

or more simply:

USING(user_id)

Don't be afraid to 'add yet another attribute'. It's normal, and it's okay.

Marcus Adams
No down-vote Marcus, but I disagree. Adding a column to a table in a production DB is a fairly big deal. As for ID's, well that's been debated quite a bit here on the 'overflow ;)
n8wrl
@n8wrl, Thanks for the input. Depends on the system if it's a big deal. If the system is too busy, you can wait for the next maintenance window, or off peak times. It's preferred to take the one time hit, which scales better than the extra code and work required to manage the other system.
Marcus Adams
n8wrl, adding a column to a large prod table is FAR preferable to using an EAV table. EAV tables are more work to maintain (gee I have no idea how may joins I'll have to do to get the data).and query and are performance killers.
HLGEM
A: 

Since this appears to be a one to one relationship, I would normally keep it all in one table unless:

You would be near the limit of the number of bytes that can be stored in a row - then you should split it out.

Or if you will normally be querying the main table separately and won't need those fields much of the time.

HLGEM
A: 

If some columns is (not identifiable or dependent on the primary key) or (values from a definite/fixed set is being used repeatedly) of the Table make a Different Table for those columns and maintain a one to one relationship.