views:

2454

answers:

4

What would be the best way to implement a customizable (meaning, a tree structure with an unknown number of level) tree data structure in a database?

I've done this once before using a table with a foreign key to itself.

What other implementations could you see, and does this implementation make sense?

+5  A: 

Have a look at Managing Hierarchical Data in MySQL. It discusses two approaches for storing and managing hierarchical (tree-like) data in a relational database.

The first approach is the adjacency list model, which is what you essentially describe: having a foreign key that refers to the table itself. While this approach is simple, it can be very inefficient for certain queries, like building the whole tree.

The second approach discussed in the article is the nested set model. This approach is far more efficient and flexible. Refer to the article for detailed explanation and example queries.

Ayman Hourieh
+1  A: 

Having a table with a foreign key to itself does make sense to me.

You can then use a common table expression in SQL or the connect by prior statement in Oracle to build your tree.

Aaron Daniels
I have a log table, with a LogID identity column, and a ParentLogID column with a FK that points back to LogID column. When the first log row in a transaction is written, I grab SCOPE_IDENTITY(). All other log records get written with this value in the ParentLogID column. This is really useful for grouping rows that belong together. It is the only real way to see what happened, without this, it would be a huge mess of log rows from multiple transactions all mixed together.
KM
@KM - He said "does make sense" not "doesn't make sense"
John Rasch
+11  A: 

You mention the most commonly implemented, which is Adjacency List: http://www.sqlsummit.com/AdjacencyList.htm

There are other models as well, including materialized path and nested sets: http://communities.bmc.com/communities/docs/DOC-9902

Joe Celko has written a book on this subject, which is a good reference from a general SQL perspective (it is mentioned in the nested set article link above).

Also, Itzik Ben-Gann has a good overview of the most common options in his book "Inside Microsoft SQL Server 2005: T-SQL Querying".

The main things to consider when choosing a model are:

1) Frequency of structure change - how frequently does the actual structure of the tree change. Some models provide better structure update characteristics. It is important to separate structure changes from other data changes however. For example, you may want to model a company's organizational chart. Some people will model this as an adjacency list, using the employee ID to link an employee to their supervisor. This is usually a sub-optimal approach. An approach that often works better is to model the org structure separate from employees themselves, and maintain the employee as an attribute of the structure. This way, when an employee leaves the company, the organizational structure itself does not need to be changes, just the association with the employee that left.

2) Is the tree write-heavy or read-heavy - some structures work very well when reading the structure, but incur additional overhead when writing to the structure.

3) What types of information do you need to obtain from the structure - some structures excel at providing certain kinds of information about the structure. Examples include finding a node and all its children, finding a node and all its parents, finding the count of child nodes meeting certain conditions, etc. You need to know what information will be needed from the structure to determine the structure that will best fit your needs.

JeremyDWill
Hi, I am facing this exact same problem stated in the question and would like to ask you a question about the topics above. Considering a structure as in number one topic (organizational structured table (not employee structured) with ParentId referenced in the same table), I need to set who is the boss of a certain area. I will assign all the employees of that specific area directly to it. Where would you put the boss of that specific area? Inside the same area or one gorup above? My approach is to reference him/her to the group above, that gives me a better structure I think. Thanks.
Marcos Buarque
+1  A: 

I've used the following implementation on SQL SERVER 2005. Check here

emzero