views:

657

answers:

8

Hi. I have some products that belongs to the some category.

Each category can have different properties.

For example,

  • category cars has properties color, power, ...
  • category pets have properties weight, age, ...

Number of categories is about 10-15. Number of properties in each category is 3-15. Number of products is very big.

Main requirement for this app is very good search. We will select category, and enter criteria for each property in this category.

Have to design database for this scenario. (SQL Server 2005)

+7  A: 

If the user of the application has to select a category before they can search, I would separate your products into different database tables by category. This solution is also indicated by the fact that the categories themselves have so little in common. Breaking it down by category will also make each search much faster, since time won't be wasted searching through cars when your user is looking for a pet.

Once you have the products split up in to categories, it should be easy to create the tables using the common properties of the products in each category. The user interface of your application should be dynamic (I'm thinking of a web form), in that the properties the user can choose from should change when the user selects a category.

Please note that if you have products that you want listed in multiple categories, this solution will result in duplicate data in your tables. There is a trade-off between speed and normalization when designing a database. If you don't have products that fit in multiple categories, then I think this will be the fastest solution (in terms of search speed).

Bill the Lizard
Ouch, I have to disagree on this one. Anytime you need to aggregate product data including category assignment info the queries will require joins of all 10-15 tables, plus all other supporting tables. Given, this issue could be minimized with a separate schema for reporting but Tomalak's answer is more scalable and could easily support aggregation for reporting.
Cory House
The OP stated that "Main requirement for this app is very good search." If you optimize for back-end reporting, isn't that ignoring this requirement?
Bill the Lizard
+1  A: 

You might want to consider an Entity-Attribute-Value type of arrangement, where you can "tag" each product with arbitrary name/value pairs of attributes.

Paul Dixon
+1  A: 

You can try this. I'm not too sure of the actual details of your question, maybe someone can help you translate a little better.

5 tables. 3 for storing the data, 2 for storing the mappings between data.

tProduct 
  productID
  <other product details>

tCategory
  categoryID
  <other category details>

tProperty
  propertyID
  <other property details>

tProductXCategory
  productyID
  categoryID

tCategoryXProperty
  categoryID
  propertyID

Your queries will need to join the data using the mapping tables, but this will allow you to have different many to many relationships between category, properties, and products.

Use stored procedures or parameterized queries to get better performance out of your searches.

StingyJack
Why would you need tProductXCategory and tCategoryXProperty?
xsl
In tProductXCategory, propertyID should be productID.
Ates Goral
thanks Ates... I am not the best typer. @XSL, the X tables are the relational tables. Thats how you associate a category with a property, or a product to a category.
StingyJack
@Ates: I was referring to the previous version of the post where both tables would contain the same columns.
xsl
A: 

If you want to be flexible on your categories and properties, you should create following tables:

  • product: ProductID
  • category: CategoryID, ProductID
  • property: PropertyID, CategoryID

when you want to share a category over mroe than one product, you have to create a link table for the n:m join:

  • productCategoryPointer: ProdCatID, ProductID, CategoryID.

You will have to to some joins in your queries, but with the right indexes, you shoulb be able to query your data fast.

Jan
+10  A: 

The classic design approach would be (the star denotes the primary key column):

Product
  ProductId*
  CategoryId: FK to Category.CategroyId
  Name

Category
  CategoryId*
  Name

Property
  PropertyId*
  Name
  Type

CategoryProperty
  CategoryId*: FK to Category.CategoryId
  PropertyId*: FK to Property.PropertyId

ProductProperty
  ProductId*: FK to Product.ProductId
  PropertyId*: FK to Property.PropertyId
  ValueAsString

If you can live with the fact that every property value would go to the DB as a string and type conversion info is stored in the Property table, this layout would be enough.

The query would go something like this:

SELECT
   Product.ProductId,
   Product.Name AS ProductName,
   Category.CategoryId,
   Category.Name AS CategoryName,
   Property.PropertyId,
   Property.Name AS PropertyName,
   Property.Type AS PropertyType,
   ProductProperty.ValueAsString
FROM
   Product 
   INNER JOIN Category         ON Category.CategoryId = Product.CategoryId
   INENR JOIN CategoryProperty ON CategoryProperty.CategoryId = Category.CategoryId
   INNER JOIN Property         ON Property.PropertyId = CategoryProperty.PropertyId
   INNER JOIN ProductProperty  ON ProductProperty.PropertyId = Property.PropertyId
                                  AND ProductProperty.ProductId = Product.ProductId
WHERE
   Product.ProductId = 1

The more WHERE conditions you supply (conjunctively, e.g. using AND), the faster the query will be. If you have properly indexed your tables, that is.

As it is, the solution is not ideal for a full text indexing situation. An additional table that stores all the text associated with a ProductId in a more denormalized way could help here. This table would need updating through triggers that listen for changes in the ProductProperty table.

Tomalak
A: 

Hi,

You could try something more object oriented.

1. Define a base table for Products

Products(ProductID, CategoryID, <any other common properties>)

2. Define a table Categories

Categories(CategoryID, Name, Description, ..)

From here you have a lot of options and almost all of them will break the normalization of your database.

Solution A.

Will be a maintaince nightmare if you need to add new products

A1. Define a separate table for each of the categories

Cars(CarID, ProductID, ..) Pets(PetID, ProductID, ..)

A2. Join the tables based on the relationships in order to use the data

SELECT <fields> FROM Cars INNER JOIN Products ON Cars.ProductID = Products.ProductID

Solution B.

Maintainance nightmare for different types of properties (i.e. int, varchar, etc)

B1. Define a table for Properties

CategoryProperty (CPID, Name, Type)

B2. Define a table to hold the associations between Categories and the Properties

PropertyAssociation (CPID, PropertyID)

B12. Define a table to hold the properties (Alternative for B1 and B2)

Properties(CategoryID, PropertyID, Name, Type)

B3. For each type of property (int, double, varchar, etc.) add a value table

PropertyValueInt(ProductID, CPID, PropertyID, Value) - for int PropertyValueString(ProductID, CPID, PropertyID, Value) - for strings PropertyValueMoney(ProductID, CPID, PropertyID, Value) - for money

B4. Join all the tables to retreive the desired property.

By using this approach, you will not have to manage all the properties in separate table, but the value types of them. Basically all the tables involved will be lookup tables. The disadvantage, is that, in order to retreive each value, you have to "Case" for every value type.

Take in mind these articles (here and here) when choosing any of these approaches. This forum post is also interesting and somehow related to the subject, even though it is about localization.

You could also use Tomalak's answer and add strong typing if you feel the need.

Bogdan Maxim
Correction: Option A will be a maintenance nightmare if you add a lot of new product categories. If you add new products that belong to existing categories, there's no problem.And the phrase "maintenance nightmare" is overstating the case in any event.
Walter Mitty
I must've missed a word. Oh well.. By the time I've written the answer the question was already closed.
Bogdan Maxim
A: 

hi there I recently had to do this and I m using NHibernate where i have three entities

Product Category Option OptionCategory

A product has 1* Categories

A product has 1* Option

An Option has 1 OptionCategory

once this is set up you can use Nhibernate caching

Cheers

Miau
+1  A: 

Most people are advising to use variations of the Entity-Attribute-Value (EAV) design. This design is overkill for your situation, and it introduces a whole bunch of problems, for example:

  • You can't define data type for an attribute; you can enter "banana" for an integer attribute
  • You can't declare an attribute as mandatory (i.e. NOT NULL in a conventional table)
  • You can't declare a foreign key constraint on an attribute

If you have a small number of categories, it's better to use solution A in Bogdan Maxim's answer. That is, define one table Products with attributes common to all categories, and one additional table for each category, to store the category-specific attributes.

Only if you have an infinite number of categories or if you must potentially support a different set of attributes per row in Products is EAV a good solution. But then you're not using a relational database at all, since EAV violates several rules of normalization.

If you really need that much flexibility, you'd be better off storing your data in XML. In fact, you might look into RDF and semantic web frameworks like Sesame.

Bill Karwin
Of course the EAV design is an overkill for the most of the situations. But when you do need more complexity it can be a time saver.
Bogdan Maxim
And of course, all the solutions for this problem break normalization
Bogdan Maxim
No, having a separate table per category does not affect normalization.
Bill Karwin