views:

63

answers:

3

A while back I posted a different question regarding column order. While this question does not relate to column order, I was suggested to make my table differently from how I was making it.

Lets say I am selling 100 products. Some of these products are compatible with each other, some are not. Some have not been tested yet (I did not mention this part in my last question).

Would I be better off making a mySQL table like:

NAME         PRODUCT1     PRODUCT2     PRODUCT3     PRODUCT4 ....
product1     yes          no           maybe        yes
product2     maybe        yes          no           no
product3     maybe        yes          no           no
product4     maybe        yes          no           no
...

or making the table like:

FIRST       SECOND      COMPATIBLE?
Product1    Product1    Yes
Product1    Product2    Yes
Product1    Product3    No
Product1    Product4    Maybe
Product2    Product1    Maybe
Product2    Product2    Maybe
Product2    Product3    No
Product2    Product4    Maybe
Product3    Product1    Yes
Product3    Product2    Yes
Product3    Product3    No
Product3    Product4    Yes
Product4    Product1    Yes
Product4    Product2    No
Product4    Product3    No
Product4    Product4    Maybe

I was told that the second method would be better, but I failed to mention that there was also the "maybe" option (and not purely yes/no), meaning the third column would have to be added to the second table.

As an inexperienced mySQL'er, I ask, which table would be more efficient, more maintainable, and which would you recommend?

Thanks

+3  A: 

The second option is still better (even with the third column, which is no problem), because it allows you to easily add new product types without modifying the tables. (The technical term for this is that the schema is better"normalized"). This is much more maintainable. In addition, it means you can join across this table much more easily, or do queries to answer "which is the lowest-cost product which product 1 is compatible with" which would be very hard to do with the first table.

njk
I hope you don't mind another question. I already have the second table laid out but not filled yet. What I mean by this is: every product is compatible with itself, so I have it filled out for every product which is compatible with itself. I do not have any compatibilities while mixing products yet.Is there any way to quickly fill out every product in SECOND for every non-unique FIRST?I hope you understand my poor wording
thomas
@thomas: that would probably be best done by writing code that does inserts while reading the list of existing products from... something?
FrustratedWithFormsDesigner
@thomas, glad I could help... Please ask your second question as another question on StackOverflow, rather than in the comments :) You can post the link here once you have!
njk
You might consider representing Compatible with a floating point value in the range 1.0 to 0.0. Then you'd be able to handle more gradations of "maybe" if you needed to in a nice 1990s fuzzy-logic kind of way. Other than that, this is the way to go. We do a lot of this kind of thing in my shop, and it performs well and isn't that hard to maintain.
Ollie Jones
+2  A: 

Stick with the second option. If you ever add or remove products you only affect some rows. If you use the first option and add/remove products, you are changing the structure of the table.

FrustratedWithFormsDesigner
A: 

There is a third option - a "semi-normalized" form. In this case, you'd have a structure something like this:

Product  DefinitelyCompatible  MaybeCompatible
----------------------------------------------
 1        '2, 3, 4'             '5'
 2        '1, 4'                '3'
 3        '1'                   '3, 4'
 4        '1, 2'                '3'
 5        ''                    '1'

Where the first field is the product ID, and the second and third fields are strings that contain lists of IDs. Using this structure, you can use the FIND_IN_SET() MySQL function to search through the lists, or if you want to look up a given product's compatibilities, you only need fetch one row and then split the strings yourself.

This allows you to have only as many rows as you have products (in a Nx3 table), while also minimizing the number of rows you need to alter to add a new product (or update an existing one).

A further note on performance - assuming that the relationships are symmetric (if a -compat-> b, then b -compat-> a), you don't even need to use FIND_IN_SET() - you can just fetch whichever object (a or b) you need to get the lists for (if you need everything that's compatible with 23, then you fetch 23's row; if you want to see if 4 is compatible with 5, you can fetch either one's row and see if the other appears in it). FIND_IN_SET() would then only be necessary if the check is a portion of a subquery, rather than in your code.

Amber
This is most definitely not the best way to achieve this association, especially in a product database like this one. Perhaps in data warehouse scenarios this solution would be work, but njk below points out but one of many problems with this design.
WCWedin
This might work OK, but I'd be concerned that the `DefinitelyCompatible` and `MaybeCompatible` might not index well. Whether or not that matters depends on the queries. Also, I'm not sure how portable this is to other platforms that don't have a `FIND_IN_SET()` function.
FrustratedWithFormsDesigner
Actually, now that I think about it, it might never be necessary to index `DefinitelyCompatible` and `MaybeCompatible`. Join queries might be tricky though.
FrustratedWithFormsDesigner
that's true, I wasn't considering queries
thomas
Yes, a fully normalized form is almost always the best for joins. Which form is best for a given application depends on the context; some applications are far more likely to require joins than others. Not all logic is best done in the DB.
Amber
Storing delimited data in a database is almost always a bad idea, especially for something as simple as this. What's the advantage? Fewer rows?
njk
Sometimes, it's more efficient to just grab all of the items than it is to ask the DB engine to do a join every time. If the data model you're working with is one where you're almost always going to be asking only "what things are associated with item X", and not doing the fancier queries that normalization assists with, delimited data can be far more efficient. Sure, when you have a database engine hammer, everything looks like a join nail, but that's not always true.
Amber