views:

547

answers:

4

Hello all,

I feel that this is likely a common problem, but from my google searching I can't find a solution quite as specific to my problem.

I have a list of Organizations (table) in my database and I need to be able to run queries based on their hierarchy. For example, if you query the highest Organization, I would want to return the Id's of all the Organizations listed under that Organization. Further, if I query an organization sort of mid-range, I want only the Organization Id's listed under that Organization.

What is the best way to a) set up the database schema and b) query? I want to only have to send the topmost Organization Id and then get the Id's under that Organization.

I think that makes sense, but I can clarify if necessary.

Much appreciated,

-Matt

A: 

You could have an Organization have an id PK and a parent FK reference to the id. Then for the query, use (if your database backend supports them) recursive queries, aka Common Table Expressions.

Kev
+1  A: 

The easy way is to have a ParentID column, which is a foreign key to the ID column in the same table, NULL for root nodes. But this method has some drawbacks.

Nested sets are an efficient way to store trees in an relational database.

Tmdean
I just noticed when I read your comment on my answer that you already had a link in your answer to something nearly identical to my answer... if I'd realized that before I posted, I would have been pushing your answer as the correct one.
rmeador
+2  A: 

One simple way is to store the organization's parentage in a text field, like:

SALES-EUROPE-NORTH

To search for every sales organization, you can query on SALES-%. For each European sales org, query on SALES-EUROPE-%.

If you rename an organization, take care to update its child organizations as well.

This keeps it simple, without recursion, at the cost of some flexibility.

Andomar
As far as I know this is the only solution where you can get results in a fixed number of queries (hopefully - 1). All the other methods of storing a parent_id field will cause many queries and a potential performance issue. Notice that by using this as the sole id, you get log ids which are not optimal for databases, so you may want to have such keys as a secondary key
David Rabinowitz
Perfect! Thanks!
mattdell
this is a TERRIBLE solution and it will come back to bite you really badly when the hierarchy changes. It's something my company has been slowly refactoring out of our system for years... so painful. I'll try to dig up a better approach that I saw once.
rmeador
Even with a trigger that updates the heirarchy on insert or update?
mattdell
+3  A: 

As promised in my comment, I dug up an article on how to store hierarchies in a database that allows constant-time retrieval of arbitrary subtrees. I think it will suit your needs much better than the answer currently marked as accepted, both in ease of use and speed of access. I could swear I saw this same concept on wikipedia originally, but I can't find it now. It's apparently called a "modified preorder tree traversal". The gist of it is you number each node in the tree twice, while doing a depth-first traversal, once on the way down, and once on the way back up (i.e. when you're unrolling the stack, in a recursive implementation). This means that the children of a given node have all their numbers in between the two numbers of that node. Throw an index on those columns and you've got really fast lookups. I'm sure that's a terrible explanation, so read the article, which goes into more depth and includes pictures.

rmeador
very cool solution.
yetanotherdave
The article describes a custom balanced tree. This is a TERRIBLE solution and it will come back to bite you when the hierarchy changes. ;-)
Andomar
@Andromar: alright, I deserved that for the slight hyperbole in my comment on your answer... :) the main difference here is that refreshing the hierarchy is a very fast and straightforward operation when it changes.
rmeador
A balanced tree is pretty complex to change? You'd need to update the ID associated with every org. If the ID is the primary key, that's even harder.
Andomar
What? Why would you need to update the ID? You just need to recompute the left and right columns. You might have a universal Turing machine on your desk that can assist you with that task.
Tmdean