I need to analyze 1 TB+ of web access logs, in particular statistics on the requested URLs and on subsets of those URLs (child branches). If possible, I want queries over small slices of the data (e.g. 10 million requests) to be fast.
For example, given an access log with the following URLs being requested:
/ocp/about_us.html
/ocp/security/ed-209/patches/urgent.html
/ocp/security/rc/
/ocp/food/
/weyland-yutani/products/
I want to do queries such as:
- Count the number of requests for everything 'below' /ocp.
- Same as above, but only count requests for child nodes under /ocp/security
- Return the top 5 most frequently requested URLs.
- Same as above, except grouped at an arbitrary depth (see the sketch after this list). For example, at depth 2 the example data would return:
2: /ocp/security/
1: /ocp/
1: /ocp/food/
1: /weyland-yutani/products/
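To pin down the semantics, here is a minimal pure-Python sketch of those four queries over the sample paths above. The file-vs-directory rule in `dir_segments` is my assumption, inferred from the depth-2 output (where /ocp/about_us.html is grouped under /ocp/):

```python
from collections import Counter

# The sample requests from the example above.
requests = [
    "/ocp/about_us.html",
    "/ocp/security/ed-209/patches/urgent.html",
    "/ocp/security/rc/",
    "/ocp/food/",
    "/weyland-yutani/products/",
]

def segments(url):
    """Non-empty path elements of a URL path."""
    return [s for s in url.split("/") if s]

def dir_segments(url):
    """Directory elements only; a trailing element with no '/' after it
    is treated as a file name and dropped (assumption, see above)."""
    parts = segments(url)
    return parts if url.endswith("/") else parts[:-1]

def group_key(url, depth):
    """Truncate a URL to its first `depth` directory elements."""
    return "/" + "/".join(dir_segments(url)[:depth]) + "/"

# 1. Count everything below /ocp.
print(sum(1 for u in requests if segments(u)[:1] == ["ocp"]))              # 4

# 2. Count only child nodes under /ocp/security.
print(sum(1 for u in requests if segments(u)[:2] == ["ocp", "security"]))  # 2

# 3. Top 5 most frequently requested URLs.
print(Counter(requests).most_common(5))

# 4. Grouped at depth 2, matching the output above:
#    [('/ocp/security/', 2), ('/ocp/', 1), ('/ocp/food/', 1),
#     ('/weyland-yutani/products/', 1)]
print(Counter(group_key(u, 2) for u in requests).most_common())
```

Obviously this in-memory version does not scale to 1 TB; it is only meant to make the query semantics concrete.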
I think the ideal approach is probably a column DB with the URLs tokenized so that there is one column per path element. However, I would really like to do this with open-source software if possible. HBase is a possibility, but its query performance seems too slow to be useful for real-time queries (and I don't really want to be in the business of re-implementing SQL).
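To make that layout concrete, here is a sketch of the tokenize-into-columns idea using SQLite via Python's sqlite3, chosen only because it is in the standard library; the `hits` table, its `seg1`..`seg3` columns, and the `tokenize` helper are all hypothetical names, and at 1 TB+ the same schema would sit on a real columnar engine instead. The point is that subtree counts become equality filters (indexable, no LIKE scans) and depth grouping becomes a plain GROUP BY over segment columns:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE hits (
        url  TEXT,   -- full request path
        seg1 TEXT,   -- first path element, e.g. 'ocp'
        seg2 TEXT,   -- second path element, or NULL
        seg3 TEXT    -- ... one column per depth you care about
    )
""")

def tokenize(url, width=3):
    """Split a path into elements, padded/truncated to `width` columns.
    (File-name vs. directory handling is omitted here for brevity.)"""
    parts = [s for s in url.split("/") if s]
    parts += [None] * (width - len(parts))
    return (url, *parts[:width])

urls = [
    "/ocp/about_us.html",
    "/ocp/security/ed-209/patches/urgent.html",
    "/ocp/security/rc/",
    "/ocp/food/",
    "/weyland-yutani/products/",
]
con.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)",
                (tokenize(u) for u in urls))

# Everything below /ocp: one equality test on the first segment column.
print(con.execute("SELECT COUNT(*) FROM hits WHERE seg1 = 'ocp'").fetchone())

# Only child nodes under /ocp/security.
print(con.execute(
    "SELECT COUNT(*) FROM hits WHERE seg1 = 'ocp' AND seg2 = 'security'"
).fetchone())

# Top 5 most frequently requested URLs.
print(con.execute(
    "SELECT url, COUNT(*) AS n FROM hits GROUP BY url ORDER BY n DESC LIMIT 5"
).fetchall())

# Grouped at depth 2: just group by the first two segment columns.
print(con.execute(
    "SELECT seg1, seg2, COUNT(*) AS n FROM hits GROUP BY seg1, seg2 ORDER BY n DESC"
).fetchall())
```

With a composite index on (seg1, seg2, ...), the subtree queries touch only the matching rows, which is what should keep them fast over small slices of the data.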
I'm aware there are commercial apps for this type of analytics, but for various reasons I want to implement this myself.