Hi,
I'm working on a design for a hierarchical database structure which models a catalogue containing products (this is similar to this question). The database platform is SQL Server 2005 and the catalogue is quite large (750,000 products, 8,500 catalogue sections over 4 levels) but is relatively static (reloaded once a day) and so we are only concerned about READ performance.
The general structure of the catalogue hierarchy is:-
- Level 1 Section
- Level 2 Section
- Level 3 Section
- Level 4 Section (products are linked to here)
- Level 3 Section
- Level 2 Section
We are using the Nested Sets pattern for storing the hierarchy levels and storing the products which exist at that level in a separate linked table. So the simplified database structure would be
CREATE TABLE CatalogueSection
(
SectionID INTEGER,
ParentID INTEGER,
LeftExtent INTEGER,
RightExtent INTEGER
)
CREATE TABLE CatalogueProduct
(
ProductID INTEGER,
SectionID INTEGER
)
We do have an added complication in that we have about 1000 separate customer groups which may or may not see all products in the catalogue. Because of this we need to maintain a separate "copy" of the catalogue hierarchy for each customer group so that when they browse the catalogue, they only see their products and they also don't see any sections which are empty.
To facilitate this we maintain a table of the number of products at each level of the hierarchy "rolled up" from the section below. So, even though products are only directly linked to the lowest level of the hierarchy, they are counted all the way up the tree. The structure of this table is
CREATE TABLE CatalogueSectionCount
(
SectionID INTEGER,
CustomerGroupID INTEGER,
SubSectionCount INTEGER,
ProductCount INTEGER
)
So, onto the problem Performance is very poor at the top levels of the hierarchy. The general query to show the "top 10" products in the selected catalogue section (and all child sections) is taking somewhere in the region of 1 minute to complete. At lower sections in the hierarchy it is faster but still not good enough.
I've put indexes (including covering indexes where applicable) on all key tables, run it through the query analyzer, index tuning wizard etc but still cannot get it to perform fast enough.
I'm wondering whether the design is fundamentally flawed or whether it's because we have such a large dataset? We have a reasonable development server (3.8GHZ Xeon, 4GB RAM) but it's just not working :)
Thanks for any help
James