views:

1042

answers:

9

I have a products table that contains a FK for a category, the Categories table is created in a way that each category can have a parent category, example:

Computers
    Processors
        Intel
            Pentium
            Core 2 Duo
        AMD
            Athlon

I need to make a select query that if the selected category is Processors, it will return products that is in Intel, Pentium, Core 2 Duo, Amd, etc...

I thought about creating some sort of "cache" that will store all the categories in the hierarchy for every category in the db and include the "IN" in the where clause. Is this the best solution?

A: 

I have done similar things in the past, first querying for the category ids, then querying for the products "IN" those categories. Getting the categories is the hard bit, and you have a few options:

  • If the level of nesting of categories is known or you can find an upper bound: Build a horrible-looking SELECT with lots of JOINs. This is fast, but ugly and you need to set a limit on the levels of the hierarchy.
  • If you have a relatively small number of total categories, query them all (just ids, parents), collect the ids of the ones you care about, and do a SELECT....IN for the products. This was the appropriate option for me.
  • Query up/down the hierarchy using a series of SELECTs. Simple, but relatively slow.
  • I believe recent versions of SQLServer have some support for recursive queries, but haven't used them myself.

Stored procedures can help if you don't want to do this app-side.

Draemon
A: 

What you want to find is the transitive closure of the category "parent" relation. I suppose there's no limitation to the category hierarchy depth, so you can't formulate a single SQL query which finds all categories. What I would do (in pseudocode) is this:

categoriesSet = empty set
while new.size > 0:
  new = select * from categories where parent in categoriesSet
  categoriesSet = categoriesSet+new

So just keep on querying for children until no more are found. This behaves well in terms of speed unless you have a degenerated hierarchy (say, 1000 categories, each a child of another), or a large number of total categories. In the second case, you could always work with temporary tables to keep data transfer between your app and the database small.

Simon
A: 

Maybe something like:

select *
from products
where products.category_id IN
  (select c2.category_id 
   from categories c1 inner join categories c2 on c1.category_id = c2.parent_id
   where c1.category = 'Processors'
   group by c2.category_id)

[EDIT] If the category depth is greater than one this would form your innermost query. I suspect that you could design a stored procedure that would drill down in the table until the ids returned by the inner query did not have children -- probably better to have an attribute that marks a category as a terminal node in the hierarchy -- then perform the outer query on those ids.

tvanfosson
right, but there is no limit in the category depth in my db schema... maybe the "cache" solution is the best option for now.. thanks anyway!
Bruno
A: 
CREATE TABLE #categories (id INT NOT NULL, parentId INT, [name] NVARCHAR(100))
INSERT INTO #categories
    SELECT 1, NULL, 'Computers'
    UNION
SELECT 2, 1, 'Processors'
    UNION
SELECT 3, 2, 'Intel'
    UNION
SELECT 4, 2, 'AMD'
    UNION
SELECT 5, 3, 'Pentium'
    UNION
SELECT 6, 3, 'Core 2 Duo'
    UNION
SELECT 7, 4, 'Athlon'
SELECT * 
    FROM #categories
DECLARE @id INT
    SET @id = 2
            ; WITH r(id, parentid, [name]) AS (
    SELECT id, parentid, [name] 
        FROM #categories c 
        WHERE id = @id
        UNION ALL
    SELECT c.id, c.parentid, c.[name] 
        FROM #categories c  JOIN r ON c.parentid=r.id
    )
SELECT * 
    FROM products 
    WHERE p.productd IN
(SELECT id 
    FROM r)
DROP TABLE #categories

The last part of the example isn't actually working if you're running it straight like this. Just remove the select from the products and substitute with a simple SELECT * FROM r

Jonas Lincoln
+2  A: 

Looks like a job for a Common Table Expression.. something along the lines of:

with catCTE (catid, parentid)
as
(
select cat.catid, cat.catparentid from cat where cat.name = 'Processors'
UNION ALL
select cat.catid, cat.catparentid from cat inner join catCTE on cat.catparentid=catcte.catid
)
select distinct * from catCTE

That should select the category whose name is 'Processors' and any of it's descendents, should be able to use that in an IN clause to pull back the products.

Steven Robbins
+2  A: 

The best solution for this is at the database design stage. Your categories table needs to be a Nested Set. This article is not mysql specific (despite the url), and gives a great overview of the different methods of storing a hierarchy in a database table.

Executive Summary:

Nested Sets

  • Selects are easy for any depth
  • Inserts and deletes are hard

Standard parent_id based hierarchy

  • Selects are based on inner joins (so get hairy fast)
  • Inserts and deletes are easy

So based on your example, if your hierarchy table was a nested set your query would look something like this:

SELECT * FROM products 
   INNER JOIN categories ON categories.id = products.category_id 
WHERE categories.lft > 2 and categories.rgt < 11

the 2 and 11 are the lft and right respectively of the Processors record.

MDCore
Excellent summary description of +'s and -'s of each approach.
Cory House
Thank you for this. I'd never have come up with the Nested Sets idea. It's so crazy it just might work!
Steve
A: 

This should recurse down all the 'child' catagories starting from a given catagory.

DECLARE @startingCatagoryId int
DECLARE @current int
SET @startingCatagoryId = 13813 -- or whatever the CatagoryId is for 'Processors'

CREATE TABLE #CatagoriesToFindChildrenFor
(CatagoryId int)

CREATE TABLE #CatagoryTree
(CatagoryId int)

INSERT INTO #CatagoriesToFindChildrenFor VALUES (@startingCatagoryId)

WHILE (SELECT count(*) FROM #CatagoriesToFindChildrenFor) > 0
BEGIN
    SET @current = (SELECT TOP 1 * FROM #CatagoriesToFindChildrenFor)

    INSERT INTO #CatagoriesToFindChildrenFor
    SELECT ID FROM Catagory WHERE ParentCatagoryId = @current AND Deleted = 0

    INSERT INTO #CatagoryTree VALUES (@current)
    DELETE #CatagoriesToFindChildrenFor WHERE CatagoryId = @current
END

SELECT * FROM #CatagoryTree ORDER BY CatagoryId

DROP TABLE #CatagoriesToFindChildrenFor
DROP TABLE #CatagoryTree
A: 

i like to use a stack temp table for hierarchal data. here's a rough example -

-- create a categories table and fill it with 10 rows (with random parentIds)
CREATE TABLE Categories ( Id uniqueidentifier, ParentId uniqueidentifier )
GO

INSERT
INTO   Categories
SELECT NEWID(),
       NULL 
GO

INSERT
INTO   Categories
SELECT   TOP(1)NEWID(),
         Id
FROM     Categories
ORDER BY Id
GO 9


DECLARE  @lvl INT,            -- holds onto the level as we move throught the hierarchy
         @Id Uniqueidentifier -- the id of the current item in the stack

SET @lvl = 1

CREATE TABLE #stack (item UNIQUEIDENTIFIER, [lvl] INT)
-- we fill fill this table with the ids we want
CREATE TABLE #tmpCategories (Id UNIQUEIDENTIFIER)

-- for this example we’ll just select all the ids 
-- if we want all the children of a specific parent we would include it’s id in
-- this where clause
INSERT INTO #stack SELECT Id, @lvl FROM Categories WHERE ParentId IS NULL

WHILE @lvl > 0
BEGIN -- begin 1

      IF EXISTS ( SELECT * FROM #stack WHERE lvl = @lvl )
      BEGIN -- begin 2

      SELECT @Id = [item]
      FROM #stack
      WHERE lvl = @lvl

      INSERT INTO #tmpCategories
      SELECT @Id

      DELETE FROM #stack
      WHERE lvl = @lvl
      AND item = @Id

      INSERT INTO #stack
      SELECT Id, @lvl + 1
      FROM   Categories
      WHERE  ParentId = @Id

      IF @@ROWCOUNT > 0
      BEGIN -- begin 3
         SELECT @lvl = @lvl + 1
      END -- end 3
   END -- end 2
   ELSE
   SELECT @lvl = @lvl - 1

END -- end 1

DROP TABLE #stack

SELECT * FROM #tmpCategories
DROP TABLE #tmpCategories
DROP TABLE Categories

there is a good explanation here link text

cheeves
A: 

My answer to another question from a couple days ago applies here... recursion in SQL

There are some methods in the book which I've linked which should cover your situation nicely.

Tom H.