views:

53

answers:

4

Hi all,

Let's say I need to store t-shirts and jeans in my products table. They are sharing columns like cost, price, quantity but what about waist(only apply to jeans) and size(only apply to tees). Should I put them all in one table or separate tables? If I put them together, I will have a lot of null values, but if I separate them, what's the best way to link them?

Thank you very much.

+2  A: 

You could have a products_attributes table, where waist may be 5 and size may be 8 (depending on how it is set up). This also allows you to easily add new attributes in the future (or even have the ability for the end user to add new attributes).

Then you could have columns like

| attribute_id | value | product_id |
=====================================
| 5            | "30cm"| 28         |
=====================================
alex
Beat me by 1 sec! Damn quickly proof reading!! :)
PostMan
@PostMan Haha, that happens to me all the time!
alex
But then the all the values will be stored in string. Don't you think it's not efficient in terms of searching and sorting?
Dreteh
@user429147 That was just an example. You can store them as integers.
alex
The problem with this solution is your assuming all product attributes follow a particular data type (value column). what if one attribute is size (fine, store as int), but then what about something like sizeabbreviation (nvarchar, eg XXL). he would need to either add more columns (and possibly break other refering tables/keys), or have the column NVARCHAR(MAX) or something, then store all data types and cast them, not ideal. i could forsee problems.
RPM1984
@RPM1984 Yep, you are right. Jeans and shirts (and similar apparel) should be OK though. You could also map an integer to colour.
alex
or you could store it as a string, and parse it in code?
PostMan
@PostMan - come on dude. =) Why start out making life difficult?
RPM1984
@RPM1984 Hey, if it was hard to code, it should be hard to understand! :)
PostMan
@RPM1984 Casting a string to int than find the largest number is very inefficient. Don't you think so?
Dreteh
@user429147 That is what he was getting at.
alex
@RPM1984 - yes, i do think it is, hence my comments, and my alternative answer =)
RPM1984
Haha, my bad. My comment was meant for PostMan.
Dreteh
But what if the requirements change, and you now only have M / L / XL? Or a color attribute, or even a logo attribute. Not everything applies to numbers. Just trying to plan ahead
PostMan
+1  A: 

You could add a product attribute table, which would be one product to many attributes.

Each attribute would have a type and value, this would solve your problems :)

PostMan
see my below comment to @alex's answer. it would work, but would only work if all 'attributes' are same data type.
RPM1984
+2  A: 

I'd go with the following:

Product

ProductID INT IDENTITY,
Cost DECIMAL(4,2),
Price DECIMAL(4,2),
Quantity INT

Jean

ProductID INT,
Waist INT

Shirt

ProductID INT,
Size INT

You can then make ProductID on Jean/Shirt tables a Foreign Key to the ProductID column on Product.

This way, you are extending the core Product attributes to suit more specific Products.

What's more, you can add more specific Products (new tables), without affecting the existing Products or Product table schema.

We are currently implementing a similar structure, so that our Application ORM can support "inheritance" for the entities.

To get Jeans, your query would be:

SELECT Product.ProductID, Product.Cost, Product.Price, Product.Quantity, Jean.Waist
FROM Product Product
INNER JOIN Jean Jean
ON Product.ProductID = Jean.ProductID

Of course, if you're looking for a simple change, the below answers will be fine.

But this is 'future-proofing' your database for future Product additions.

HTH

RPM1984
This is what I have in my mind too. Just want to make sure this is the best solution. Thank you.
Dreteh
This is the standard answer for "generalization specialization relational modeling". It looks good to me.
Walter Mitty
+1  A: 

There are several ways to approach this.

You've described one, single-table inheritence (where you have a bunch of null columns for irrelevant attributes).

RPM1984's answer suggests class-table inhertience, where shared data goes in a main table, and each "type" gets its own table for the extra attributes.

Or, you could have Entity/Attribute/Value, as suggested by alex and postman.

Each method has pros and cons. The biggest issue is that it's hard for RDMBSes to enforce referential integrity on any of them.

It's worth it to think very hard about your problem before making a decision. EAV provides the most flexibility (you can assign arbitrary attributes, and don't ever have to change your schema), but you end up doing a lot of processing in your application layer. And, you can't easily get a single result row with all the attributes.

If I'm pretty sure that I know every "type" that I need to model, and that I'm not likely to need to create a lot more in the future, I'd go with class-table inheritance.

With class-table inheritance, you can filter by type using a simple inner join (select * from products inner join pants) will select only pants, for example.

timdev
Thanks for the detailed explanation. I prefer class-table inheritance too.
Dreteh