ansaurus

Question

Hierarchical Data Structure Design (Nested Sets)

Answer 1

A:

You might be able to solve the customer groups problem with roles and treeIds, but you'll have to provide us with the query.

Mladen Prajdic 2008-12-10 11:24:57

Answer 2

A:

Might it be possible to calculate the ProductCount and SubSectionCount after the load each day?
If the data is changing only once a day surely it's worthwhile to calculate these figures then, even if some denormalization is required.

Bravax 2008-12-10 15:18:04

Yes, we already pre-calculate these daily. It's not so much the counting of the products which is the issue, it's showing the actual list of products in the selected section which is slow.

James 2008-12-10 15:23:26

Do you update statistics after reloading your data? If your indexes are fine (tuned for read-only use), could it be that you're returning too much data? That's an area I might look at next. TBH, helping further is going to be quite difficult without seeing the schema and/or stored procedures.

Bravax 2008-12-10 15:32:39

Answer 3

+2 A:

Use a closure table. If your basic structure is a parent-child table with the fields ID and ParentID, then the structure for a closure table is ID and DescendantID. In other words, a closure table is an ancestor-descendant table, where each possible ancestor is associated with all of its descendants. You may include a LevelsBetween field if you need it. Closure table implementations usually include self-referencing records, i.e. ID 1 is an ancestor of descendant ID 1 with a LevelsBetween of zero.

Example: Parent/Child
ParentID - ID
1 - 2
1 - 3
3 - 4
3 - 5
4 - 6

Ancestor/Descendant
ID - DescendantID - LevelsBetween
1 - 1 - 0
1 - 2 - 1
1 - 3 - 1
1 - 4 - 2
1 - 5 - 2
1 - 6 - 3
2 - 2 - 0
3 - 3 - 0
3 - 4 - 1
3 - 5 - 1
3 - 6 - 2
4 - 4 - 0
4 - 6 - 1
5 - 5 - 0
6 - 6 - 0

The table is intended to eliminate recursive joins. You push the load of the recursive join into an ETL cycle that you do when you load the data once a day. That shifts it away from the query.
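As a rough sketch of that ETL step, the ancestor/descendant rows can be derived from the parent/child pairs once per load. The function name `build_closure` and the tuple layout are illustrative assumptions, not part of the original schema, and the code assumes the hierarchy is a tree with no cycles.

```python
from collections import defaultdict


def build_closure(edges):
    """Return sorted (ancestor, descendant, levels_between) rows.

    edges: iterable of (parent_id, child_id) pairs describing a tree.
    Assumes no cycles; a cycle would make the walk below loop forever.
    """
    children = defaultdict(list)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        nodes.update((parent, child))

    rows = []
    for ancestor in nodes:
        # Walk down from each node; depth 0 is the self-referencing row.
        stack = [(ancestor, 0)]
        while stack:
            node, depth = stack.pop()
            rows.append((ancestor, node, depth))
            for child in children[node]:
                stack.append((child, depth + 1))
    return sorted(rows)


# Parent/child pairs from the example above: (ParentID, ID).
edges = [(1, 2), (1, 3), (3, 4), (3, 5), (4, 6)]
closure = build_closure(edges)
for row in closure:
    print(row)
```

In a real load this would more likely be done inside the database during the daily import; the point is only that the recursive walk happens once per load instead of once per query.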

Also, it allows variable-level hierarchies. You won't be stuck at 4.

Finally, it allows you to slot products in non-leaf nodes. A lot of catalogs create "Miscellaneous" buckets at higher levels of the hierarchy to create a leaf-node to attach products to. You don't need to do that since intermediate nodes are included in the closure.

As far as indexing goes, I would do a clustered index on ID/DescendantID.

Now for your query performance. The closure table takes a chunk out of the cost, but not all of it. You mentioned a "Top 10". This implies ranking over a set of facts that you haven't mentioned. We need details to help tune those. Plus, the closure table only gets you the sections, not the products. At the very least, you should have an index on your CatalogueProduct that orders by SectionID/ProductID. I would force Section to Product joins to be loop joins based on the cardinality you provided. A report on a catalog section would go to the closure table to get descendants (using a clustered index seek). That list of descendants would then be used to get products from CatalogueProduct using the index by looped index seeks. Then, with those products, you would get the facts necessary to do the ranking.
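The lookup path described above might look roughly like the following sketch. It uses an in-memory SQLite database purely so that it runs anywhere; the table name `SectionClosure`, the index name, and the sample product rows are assumptions made for illustration. SQLite's `WITHOUT ROWID` primary key stands in for a clustered index on ID/DescendantID, and join hints such as forced loop joins are specific to the real database engine and are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SectionClosure (
    ID            INTEGER NOT NULL,
    DescendantID  INTEGER NOT NULL,
    LevelsBetween INTEGER NOT NULL,
    PRIMARY KEY (ID, DescendantID)
) WITHOUT ROWID;  -- rows stored in key order, similar to a clustered index

CREATE TABLE CatalogueProduct (
    SectionID INTEGER NOT NULL,
    ProductID INTEGER NOT NULL
);
CREATE INDEX IX_CatalogueProduct_Section
    ON CatalogueProduct (SectionID, ProductID);
""")

# Closure rows for the example hierarchy (1 -> 2, 3; 3 -> 4, 5; 4 -> 6).
closure_rows = [
    (1, 1, 0), (1, 2, 1), (1, 3, 1), (1, 4, 2), (1, 5, 2), (1, 6, 3),
    (2, 2, 0),
    (3, 3, 0), (3, 4, 1), (3, 5, 1), (3, 6, 2),
    (4, 4, 0), (4, 6, 1),
    (5, 5, 0),
    (6, 6, 0),
]
conn.executemany("INSERT INTO SectionClosure VALUES (?, ?, ?)", closure_rows)

# Hypothetical products; product 100 sits on non-leaf section 3.
product_rows = [(3, 100), (5, 101), (6, 102), (2, 103)]
conn.executemany("INSERT INTO CatalogueProduct VALUES (?, ?)", product_rows)


def products_in_section(conn, section_id):
    """Return ProductIDs attached to a section or any of its descendants."""
    cursor = conn.execute(
        """
        SELECT cp.ProductID
        FROM SectionClosure AS sc
        JOIN CatalogueProduct AS cp ON cp.SectionID = sc.DescendantID
        WHERE sc.ID = ?
        ORDER BY cp.ProductID
        """,
        (section_id,),
    )
    return [row[0] for row in cursor]


print(products_in_section(conn, 3))
```

Because the closure includes the self-referencing row for section 3, the product attached directly to that non-leaf section is returned along with those in its descendants, without any recursive join. Any ranking for a "Top 10" would be applied on top of this product list.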

entaroadun 2008-12-10 16:55:27

Excellent, that's exactly what I needed and has really improved performance. Thanks

James 2008-12-11 11:28:18
