views:

370

answers:

4

Short question: How should product categories that appear under multiple categories be managed? Is it a bad practice to do so at all?

Background info: We have a product database with categories likes this:

Products

  -Arts and Crafts Supplies
    -Glue
    -Paper Clips
    -Construction Paper


  -Office Supplies
    -Glue
    -Paper Clips

Note that glue and paper clips are assigned to both categories. And although they appear in two different spots in this category tree, they have the same category ID in the database. Why? Two reasons:

  1. Categories are assigned attributes - for example, a paper clip could have a weight, a material, a color, etc.
  2. Products assigned to the glue category are displayed under arts and crafts and Office Supplies. Which is to be expected - they're the same actual category ID in the database.

This allows us to manage a single category and it's attributes and assigned products, but place it at multiple places within the category tree.

We are using the nested set model, so the db structure we use to support this is:

Category
----------
CategoryID
CategoryName


CategoryTree
------------
CategoryTreeID
CategoryID
Lft
Rgt

So there's a 1:M between Category and CategoryTree because there can be multiple instances of a given category within the category tree.

Is there a simpler way to model this that would allow a product category to display under multiple categories?

+1  A: 

It may well be necessary for a category to have multiple parents. However, no matter what parent you found a category under, its subcategories should remain the same.

I've seen real systems that implemented precisely this logic and worked fine.

edit

To answer your question, I don't think the model I'm suggesting is as restrictive as you imagine. Basically, a given branch of the tree may be found under more than one parent branch, but wherever it is found, it has the same children. Nothing about this prevents you from cherry-picking some children of one branch and also making them children of another.

So, for example, you could include the glues category under both office supplies and hobby supplies, and if you added "Crazy Glue (Suppository Edition)" under glues, it would show up in both. If you have items that might be grouped together logically but need to be separated by their use, you can still do that. You might put mucilage and paste under the category of hobby adhesives, which goes under the hobby root, but not under the office root. Or you could do that and simultaneously have a combined category that's used internally by your buyers. What you can't do is forget to include that new type of glue in all of the relevant categories once you've added it wherever it belongs in your business model ontology.

In short, you lose very little with this restriction, but gain a bit of structure to help avoid the problem of having to manage each item individually.

edit

Assuming I've made a convincing case for the model itself, there's still the issue of implementation. There are lots of options, but here's one way to go:

There is a CatalogItem table containing a synthetic primary key, the label, optional description/detail text, and an optional SKU (or equivalent). You then have a many-to-many CatalogItemJoin with child and parent ID's, both sides constrained to CatalogItemTable.

An item that appears as a parent is a category, so it should not have a SKU. An item that appears only as a child is a product, so it should have a SKU. It's fine for any item to have more than one parent; that just means that it's in multiple categories. Likewise, there's no problem with multiple children per parent; that would be the typical case of a category with a few products in it. However, given a category's ID, its children will be the same regardless of what parent category led you there. The other constraint is that you'll want to avoid loops.

Steven Sudit
Thanks Steven - Can you clarify why subcategories should remain the same? In my example the structure doesn't require it. And it wouldn't make sense to put construction paper under office supplies. So the solution would be to assign products to a spot in the category tree rather than the category ID, which makes management more work because I'd have to assign a product to both instances of the glue category in the tree, but does provide more flexibility.
Cory House
Cory, I added a section to try to answer your question. Does it?
Steven Sudit
+2  A: 

The most famous example of this is Google Mail, where the classification is done this way. Google is famous for the usability of their products ...

I believe other words are preferable to the "parent" word, that actually suggest only XToOne relationship...

Maybe you could say that a Product as many Categories, so the relationship would be ManyToMany. And only the display would starts with Categories to reach the Products...


This would highlight a problem : if you don't limit the number of categories, and you display the categories with sub-categories and so on, you could end up with:

  • a huge categories and product list, with many many duplications
  • a big depth (probably unreadable)

The interesting part is highlighting the problem, then to imagine a solution that is fine for the end-user.

KLE
+3  A: 

I don't see anything wrong with this as long as it is true that all Glue is appropriate for both Office Supplies and craft supplies.

Jacob G
What if it's not? Like a kids glue stick. It's only arts and crafts. So you're really stuck creating a separate category, agreed? Any suggestion? I'm having a hard time picturing a situation where every product in the category would be apply in both spots. I'm not sure one even exists.
Cory House
With the model you have, then yes you'd need to create a separate category. Your two options, I suppose, are to either separate the display categorization from the taxonmic categorization (with another attribute) or to have your category trees be distinct paths from root to leaf and then associate the product to those trees. (I haven't totally thought through that though to know how scalable it is)
Jacob G
Jacob, I should have read your answer before replying to Cory, as you cover much of the same ground, and often more economically.
Steven Sudit
+2  A: 

What you have is a good way, though why not simplify the 2nd table like so:

Category

ID Name

SubCategory

ID CategoryID SubCategoryID

Though for the future I would beware of sharing child categories between the two root categories. Sometimes it is better to create a unique categorization of products for consistency, which is easier to manage for you and potentially easier to navigate for the customer. Otherwise, you have the issue that if you're on the Glue page coming from office supplies, then do you show the other path as well? If not, you will have two identical pages, except for the path, which is an issue for SEO. If you do, then the user may get confused.

eulerfx
Excellent point eulerfx. We've had just that problem, which is one of the many reasons I'm asking this.
Cory House
@eulerfx, could you please say more about how having two pages with identical contents except for a different parent in the hierarchy could cause SEO problems?
Steven Sudit
I just noticed your subcategory table suggestion is describing the adjacency model. While I agree it's a simpler struture to understand initially and makes inserts and updates easy, the performance of the nested set model is far superior on selects - which on a product category structure, is 99%+ of our traffic. Plus, querying a nested set is far simpler. The adjacency model requires a join for every level in the tree which gets unwieldy on large trees.
Cory House
MS SQL Server has CTE, which allows efficient recursion, and I know Oracle has something at least as powerful, so don't write off a good solution based on possible performance problems. I would also add that displaying a catalog is a prototypical case where caching the web pages themselves can be necessary on a high load site.
Steven Sudit
Good points Steven, though the app in question is on MySQL and I'm aware of no equivalent recursive functionality. And honestly, it's the simplicity and scalability of querying a nested set that I find most attractive over the adjacency model.
Cory House
Cory, I hadn't realized that MySQL had a feature gap here, but take a look at this: http://jgeewax.wordpress.com/2006/07/17/hierarchical-data-in-a-relational-database/
Steven Sudit
It could be an SEO issue because, for instance, googlebot will discover this similar content on both pages and will likely decrease the ranking of those pages. It says so in their docs online. Also, if you have a webmasters account on google, it will show messages when googlebot discovers similar content.
eulerfx
I would think that you'd lose ranking on each of the duplicates, but I don't know that you'd lose any overall. There's no reason two copies should be allowed to count for more than one.
Steven Sudit