views:

274

answers:

5

I'm looking at building a facility which allows querying for data with hierarchical filtering. I have a few ideas how I'm going to go about it but was wondering if there are any recommendations or suggestions that might be more efficient.

As an example imagine that a user is searching for a job. The job areas would be as follows.

1: Scotland
2: --- West Central
3: ------ Glasgow
4: ------ Etc
5: --- North East
6: ------ Ayrshire
7: ------ Etc

A user can search specific (i.e. Glasgow) or in a larger area (i.e. Scotland).

The two approaches I am considering are:

  1. keep a note of children in the database for each record (i.e. cat 1 would have 2, 3, 4 in its children field) and query against that record with a SELECT * FROM Jobs WHERE Category IN Areas.childrenField.
  2. Use a recursive function to find all results who have a relation to the selected area.

The problems I see from both are:

  1. Holding this data in the db will mean having to keep track of all changes to structure.
  2. Recursion is slow and inefficent.

Any ideas, suggestion or recommendations on the best approach? I'm using C# ASP.NET with MSSQL 2005 DB.

+1  A: 

You should use nested sets. Here's an implementation in MySQL. http://dev.mysql.com/tech-resources/articles/hierarchical-data.html

Martin
He's using MSSQL, not MySQL.
FrustratedWithFormsDesigner
This answer is about "nested sets" as a technique for dealing with hierarchical data. He just linked to a good explanation of it that happens to be on a MySQL site. There is nothing involved in it that is MySQL-specific.
patmortech
+2  A: 

You can use Common Table Expressions to do recursive queries. I find this technique very powerful, easy to read and easy to maintain.

Jelle
+2  A: 

Here is an approach i have seen used:

Create a varchar(max) field called hierarchyid. Generate base ids for all root objects. For each child object generate an id and prepend it with the parent(s) ids.

Example Table

ID(PK) HierarchyID Area
1       sl           Scotland 
2       slwc        West Central
3       slwcgg       Glasgow 

Example Query

SELECT * FROM Areas Where HierarchyID LIKE 'sl%'
unclepaul84
Interesting technique!
FrustratedWithFormsDesigner
In SQL Server 2008, they've introduced a data type, HierarchyID, to handle this approach: http://msdn.microsoft.com/en-us/magazine/cc794278.aspx
Tuzo
This approach looks like a slightly different implementation of my option 1 idea. Nice and simple, however it would mean every time an update to the category happens the HeiratchyID would need to be reassessed.
WDuffy
@Tuzo interesting link, thanks :)
WDuffy
@WDuffy yes, you would need to make sure your insert and updates calculate the hierarchyID properly. Instead of using codes like in the above example, I've used the IDs. e.g. instead of 'slwcgg', I would use '/1/2/3/'.
Tuzo
+1  A: 

How about this?

Table =>

Id ParentId Name

Nice simple table?

Then how about some nice complicated piece pf SQL to go with that? (CTEs rock I think)

public object FetchCategoryTree() { var sql = @"SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

                    WITH AreaTree (ID, Name, ParentID, OrgLevel, SortKey) AS
                     (
                      -- Create the anchor query. This establishes the starting
                      -- point
                      SELECT
                            a.ID, 
                            cast('---- ' + a.Name as varchar(255)),
                            a.ParentID, 
                            cast('----' as varchar(55)),
                            CAST(a.ID AS VARBINARY(900))                                
                       FROM dbo.Area a
                       WHERE a.ParentID is null
                      UNION ALL
                      -- Create the recursive query. This query will be executed
                      -- until it returns no more rows
                      SELECT
                            a.ID,       
                            cast('----' + b.OrgLevel + '  ' + a.Name as varchar(255)), 
                            a.ParentID,
                            cast(b.OrgLevel+ '----' as varchar(55)),                        
                            CAST(b.SortKey + CAST (a.ID AS BINARY(4)) AS VARBINARY(900))                                
                        FROM dbo.Area a
                                INNER JOIN AreaTree b ON a.ParentID = b.ID
                     )
                    SELECT * FROM AreaTree
                    ORDER BY SortKey";

        return FetchObject(sql);
    }

Now this does some SQL magic that am not too sure of. However in layman's terms it basically takes the first part as the root query. Then it goes back to the table and executes the second part using the first part's answer through a join, and continues doing to still it can't find any more matches, basically a big loop. It's pretty quick as well.

You will get the out a bunch of rows with a sort key attached. Once you order the query by the sort key you will get the answer like :

 ---- parent 1
 -------- child 1
 -------- child 2
 ------------ child 2.1
 ---- parent 2
 -------- etc

Might be what you are looking for?

Colin Wiseman
A: 

I use Joe Celko's tree model for the sales tax hierarchy (state/county/city/misc) in our application and it works well.

Your "find jobs at this area or below" query would look something like this:

SELECT * FROM Jobs WHERE Jobs.AreaID IN
(SELECT P1.AreaID
FROM Areas AS P1, Areas AS P2
WHERE P1.lft BETWEEN P2.lft AND P2.rgt
AND P2.Areas.AreaID = @selectedAreaID)

Celko Tree in SQL article