views:

586

answers:

4

What in everyone's opinion is the best representation for a time-bound hierarchy in SQL?

What I mean by this is:
- On any given date you have a normal tree hierarchy
- This hierarchy can change from day to date
- Each child still only has one parent on any given date

Day 1...

Business
 |
 |-Joe
 |  |-Happy
 |  |-Sneezy
 |  |-Doc(*)
 |
 |-Moe
    |-Bashfull
    |-Sleepy

Day 2...

Business
 |
 |-Joe
 |  |-Happy
 |  |-Sneezy
 |
 |-Moe
    |-Doc(*)
    |-Bashfull
    |-Sleepy

At any time, a child can join the hierarchy for the first time, or leave the hierarchy completely. (For example, new employees, and retired employees.)

The main considerations:

  • Updating the hierarchy
  • Viewing the whole hierarchy across a date range
  • Reporting on whole sub-trees within the hierarchy
  • Reporting on whole sub-trees across a date range

I know how I do it at present, but am intrigued as to how other people may do it :)

EDIT

I naively assumed a few considerations so will be more explicit...

  • Each 'team' or 'person' will have a unique ID in a dimension table elsewhere
  • Other fact tables will use those IDs (storing performance metrics, for example)
  • The structure needs to facilitate historical reporting across date ranges
  • Use of ETL or triggers to maintain alternative structures Is an option

The generic nature is most important (forming just one part of a generic relational mode), combined with ease of use for driving report (for any part of the tree across any range of dates) and the ability to be updated reliably.

A: 

A couple of flat tables can work here. For each row, we need columns ID, Name, ParentID, and InactivatedDatetime (which defaults to null). Set the datetime for the old Doc belonging to Joe indicating that that record is no longer valid and move it off to an archive table (for cleanliness), and then create a new row (a near copy of the original row) for a new Doc with Moe's ID as the ParentID. The drawback with this approach is that the person being moved must get a new ID, which may not be convenient.

Christopher Morley
The new IDs would indeed be an issue. These IDs would key to the dimension table describing the people/teams and also the fact tables with information such as performance metrics, etc. Equally, removing and archiving records would impact historical reporting. If I asked for a report on "last month" and some records had been archived, I would need to union them back in before running the query...
Dems
A: 

I can think of a couple of reasonable solutions, depending on how your data is being used and how it changes.

1) Assuming today's hierarchy is the most important. I'd store today's hierarchy with a conventional ParentId column in each record. For previous versions of the hierarchy I'd have a history table of

ItemId, ParentId, ValidFromDate, ValidToDate

Any time the hierarchy changes, you add a new row to the history table.

2) If any/all of the hierarchies are of equal importance, I'd store a base line hierarchy and then implement a hierarchy transaction table.

TransactionId, ItemId, Action (Move/Delete/Add), DateTime, OldParentId, NewParentId
MrTelly
One can not assume today's hierarchy is most important, doing so impacts on the ability to "Reporting on whole sub-trees across a date range"
Dems
Would it not be the case that the transaction table would make reporting across date ranges "difficult"? If I wanted a report for "last month", how would it be done efficiently?
Dems
+3  A: 

There are several different books of relevance here - one set is for 'temporal databases', and the other for 'hierarchical structures in RDBMS'.

The tricky parts of your question, it seems to me, are:

  • Viewing the whole hierarchy across a date range

  • Reporting on whole sub-trees across a date range

The other items are, if not straight-forward, then manageable using the techniques outlined in the books, and along the lines suggested in other answers. Part of the problem is understanding what those two bullet points mean. In one sense, they are 'the same'; the 'whole hierarchy' is just a special case of 'whole sub-trees'. But the deeper question is 'how do you want to demonstrate - visualize, represent - the changes in the hierarchy over time?' Are you seeking to compare the states at the start and end times, or are you seeking to see the intermediate changes too? How do you want to represent the moves of an individual within a hierarchy?

More questions than answers - but I hope the pointers are some help.

Jonathan Leffler
A: 
table item(id, ...)

table item_link(parent_item, child_item, from_date, until_date)

The links will store the representation of the tree for a certain time

This structure represents a network instead of a plain hierarchy but it supports moving things in a hierarchy but also look back in time. Some things need to be checked in application logic is to disallow joe being linked at different places in the hierarchy at the sametime.

Reporting is relatively easy with connect by prior clause (in oracle)

Other details can be related to item or even item link if it is to specify additional data on the relation.

Janco