views:

158

answers:

8

Hello all.

I have a tree structure that can be n-levels deep, without restriction. That means that each node can have another n nodes.

What is the best way to retrieve a tree like that without issuing thousands of queries to the database?

I looked at a few other models, like flat table model, Preorder Tree Traversal Algorithm, and so.

Do you guys have any tips or suggestions of how to implement a efficient tree model? My objective in the real end is to have one or two queries that would spit the whole tree for me.

With enough processing i can display the tree in dot net, but that would be in client machine, so, not much of a big deal.

Oh just dropping some more info here:

environment: Oracle or PostgreSQL, C# and Winforms.

Thanks for the attention

+2  A: 

There is no efficient tree model.

SQL 2008 - hierarchyid. There is a new data type for handling hiearchies, but it gets large over time.

Or: usual. Parent field in table, pointing to the ID of the parent table.

For queries...

  • You can do that a little more optimal with better SQL (one statement PER LEVEL, plus use of a temp table). This is the really tricky part here.
  • What we did in a CMS where we had the same was that every node knew its parent, the top node (for multiple graphs) and the next higher container node. Some nodes were marked as containers - they did contain sub-items, but these belonged logically to anothe containe (like: website, with folders, then document with resources like images).
TomTom
A: 

Assuming your only concern is selects and not inserts/updates/deletes, this depends on needs, such as:

  • do you need to know what level each node is at?
  • do you need to know if each node has any children while rendering it?
  • do you need to sort siblings by name?

However, if there really are minimal changes to the tree being done, then it almost doesn't matter what scheme you use, as you can do all the work in the application layer and cache the output.

Edit:

When order matters, I usually go for the materialized path method, and add an additional column SortPath. This way you can get your results sorted by sibling, which is a denormalization that makes rendering the tree in HTML extremely easy, as you can write out the entire tree (or any portion) in exactly the order you receive the records using a single query. This is optimal for speed, and is the easiest way to sort more than one level at a time.

E.g.,

CREATE TABLE [dbo].[MatPath](
    [ID] [int] NULL,
    [Name] [varchar](50) NULL,
    [Path] [varchar](max) NULL,
    [SortPath] [varchar](max) NULL
) 

insert into MatPath (ID, Name, Path, SortPath) values (1, 'Animal', '1', 'Animal-1')
insert into MatPath (ID, Name, Path, SortPath) values (2, 'Dog', '1.2', 'Animal-1|Dog-2')
insert into MatPath (ID, Name, Path, SortPath) values (3, 'Horse', '1.3', 'Animal-1|Horse-3')
insert into MatPath (ID, Name, Path, SortPath) values (4, 'Beagle', '1.2.4', 'Animal-1|Dog-2|Beagle-4')
insert into MatPath (ID, Name, Path, SortPath) values (5, 'Abyssinian', '1.3.5', 'Animal-1|Horse-3|Abyssinian-5')
insert into MatPath (ID, Name, Path, SortPath) values (6, 'Collie', '1.2.6', 'Animal-1|Dog-2|Collie-6')

select * 
from MatPath
order by SortPath

Output:

ID     Name            Path        SortPath
------ --------------- ----------- --------------------------------
1      Animal          1           Animal-1
2      Dog             1.2         Animal-1|Dog-2
4      Beagle          1.2.4       Animal-1|Dog-2|Beagle-4
6      Collie          1.2.6       Animal-1|Dog-2|Collie-6
3      Horse           1.3         Animal-1|Horse-3
5      Abyssinian      1.3.5       Animal-1|Horse-3|Abyssinian-5

(6 row(s) affected)

You can determine the level of each node by counting pipes (or periods) in the application layer, or in SQL using whatever built-int or custom functions that can count occurrences of a string.

Also, you'll notice I append the ID to the Name when building SortPath. This is to ensure that two sibling nodes with the same name will always get returned in the same order.

RedFilter
hello orbman. i do need to know the level of each :( and i need to sort them by name. i dont care about rendering order. I will build the tree in memory and then ask to be rendered. :D
George
A: 

Nested Sets - slow for updating but fast for querying.

AakashM
+1  A: 

We used a model with great success where we for each node stored a string containing each node id from top to that node, separated by '.'. Retrieving a tree becomes super efficient, only sort on the string.

This model have a limitation of course as to how deep the hierarchy can go. But this is primarily limited by the max size of the id string column. In SQL Server using the varchar(max) the limit is 2^31-1 bytes which makes for pretty deep hierarchies.

Peter Lillevold
interesting i will look into it.
George
A: 

You could number the nodes from 1..M as you build the tree, and store children references in terms of these indexes, Now just write the tree. You can retrieve all the nodes in a a single read. The numbering scheme contains the tree information,

Ira Baxter
+1  A: 

To me this depends on the size of your tree and the type of a query you want to run.

From what I can tell you have two data items. MyId and MyParent. If you created a simply table with just those two things you could ask and answer who are my children and who are my parents.

If you wanted to build a complete tree for intensive analysis, I would query all data and them build the tree myself. The database acting only as information storage at this point.

If my tree was large or took more than a half a second to create I would maintain it on my server and make it accessible with ajax calls to the client. This works best if the tree is static for all clients.

If the tree is dynamic I would insist that the client waits while the data is retrieved from the server and the tree is built locally. This will increase usage speed.

If you provided more information on what you ment when you said "split the tree" I might be able to offer a better opinion. But in general I have not found an ideal tree in a database. However OLAP might have such a tool that I don't know about.

madmik3
hello madmik3. i dont need to split the tree, just "spit" to the user.i'm trying a sort of flat table tree model, where you store level and rank and i'll see if from there i can build the tree as i want.thanks for the help. any other suggestions welcome!
George
A: 

Joe Celko has a solution for this in his SQL for Smarties book (chapter 29). Inserts, updates, and deletes are a little complicated, but selects are very fast. I have been using it for a few years now and it works very well.

Ray
I will look into that chapter. Thanks!
George
+2  A: 

I noticed that you listed your DBs as Oracle or PostgreSQL. If you can stick to Oracle, you can use the start with and connect by commands to easily do what you need. There are plenty of options that will also gracefully handle if there are loops, or if you need to filter out a branch based on some criteria. I've personally used this in a production system that had both heavy reads and writes without any performance issues.

A quick search shows that you can do something similar (although more complex sql wise) with postgreSQL using the with recursive command. I've not personally used this method, so I can't give you any more information then that.

Chad_K
Hello Chad_K. I have considered using Oracle´s and Postgre´s recursive functions, but since they are not standard i prefer not to use them. thanks for the reminder thou :P
George
@George: Oracle 11g supports ANSI standard recursion using the WITH and CYCLE keywords. Start here: http://download.oracle.com/docs/cd/E11882_01/server.112/e10881/chapter1.htm#insertedID3
Adam Musch