views:

85

answers:

3

Hi All.

I'm designing an application that will use Oracle, and we have this department hierarchy that we need to map in our database. Some thing that looks like this (I'm pretty sure you all know what I'm talking about but I'll include a piece of the ERD just in case):

alt text

So it will have data stored that looks like this:

[1 | 0]
[2 | 1]
[3 | 2]
[4 | 2]

In other words:

Department 1
     |__Department 2
             |___Department 3
             |___Department 4

And so on...

This will improve the number of records required on the table and the Data can be accessed using a CONNECT BY command, just having 1 recor per department. We usually go for this tree structure as solution, but in this new application performance is a critical, so I was wondering what if I have a flattened-out table that would look like this.

[1 | 0]
[2 | 1]
[3 | 1]
[3 | 2]
[4 | 1]
[4 | 2]

This allows you to have very obvious relationships without having to know the parent Department for a given child to know who their upper hierarchy Departments are. But this increases the amount of data required since you need a record for each level a Department is in, meaning that if a have a Department 15 levels below the top one we would require 15 records for it. The Department is pretty big so this may end up being a huge table (about 2 million records).

Ok, so after the brief introduction, this is the question; Has someone actually tried this that could tell me what is faster/less expensive for the DB between this two options, the huge flat table or the small tree one?

+6  A: 

I would definitely go for the first option (hierarchical approach). I think it's better to model the data correctly than to just use a bad data model to gain performance. Since you are modeling a hierarchy here, it makes sense to store it that way in the DB.

If you want the best of both worlds, my recommendation would be to look at using a materialized view to "flatten" the hierarchical data, then you are still storing the data properly, but you get the performance gains (if any) by using the materialized view.

There's almost always a way to follow a good data model and still find ways to get good performance. But a bad data model will cost you for years to come, and it takes great pain to correct it later.

However, even with the flattened approach, you have to consider that you are increasing the number of records dramatically, especially as you get to the leaf nodes in the tree, so I'd be surprised if having a flat hierarchy table (your second approach) would improve performance since there are many more records to process.

dcp
That's precisely the point of the question.., If you have to choose between doing simple-straightforward queries in a huge table, or doing connect by in a small table, and performance is your only concern... what would you choose and why? I mean, I totally agree with you regarding the good model vs performance.
Chepech
@Chepech - I'm not trying to dodge the question here, but performance would *never* be my only concern. I would strive to get the data model right first and foremost. I've worked with crappily designed DB's before (no referential integrity, data duplication everywhere, etc.) and it's a nightmare. With that said, I still think connect by would perform better in this case due to the fact that you get an exponential increase in records (at leaf level when storing flat hierarchy) compared to just storing the hierarchy properly.
dcp
@dcp - I wasn't implying that. I have had too the "opportunity" to work with DB models that look like if they came out of the rear end of some farm animal, so I know what you mean.
Chepech
A: 

If you need read performance, try path enumeration.

[1 | 0]
[2 | 1]
[3 | 2]
[4 | 2]

becomes

[1 | '0']
[2 | '0.1']
[3 | '0.1.2']
[4 | '0.1.2']

So you can select ALL children of 2 by doing

SELECT * FROM dept WHERE path LIKE '0.1.2%'

Of course this is a compromise between normalization and performance.

Novikov
That's a pretty good idea, but the thing is, what if I want to modify the department structure? ... all hell breaks loose. Also this would require String comparisons and some sort of parsing that would in the end it would probably render things slower.
Chepech
Yes, the read performance is great, the write performance is terrible, so this is a solution only for write once read many type tables. With hierarchical data in a relational database there's no standard optimal solution, everything is a compromise.
Novikov
A: 

With something like Departments, it is not possible to have enough records in the table to where performance would be an issue. Don't even worry about that.

Even with some other type of heirarchical data, where there might be so many records that performance could be affected, there are always other technologies/approaches to address those performance issues (when they pop up), and the cost of implementing these other solutions is almost always less than the increase in Development and maintenance effort that you would incur from attempting to code your system against a flat schema.

Charles Bretana