Is there a pattern for handling a recursive (parent-child) dimension in a data warehouse? I have a recursive structure of company entities, and sales facts can be assigned at every level. Example:

 Company A                  <- sales facts here
    Company A subcompany    <- sales facts here
         Department A1      <- sales facts here
         Department A2      <- sales facts here
 Company B                  <- sales facts here
 Company C                  <- sales facts here
    Company C department    <- sales facts here

When displaying the sales total for Company A, I want it to be the sum of sales for the whole tree.

In my relational database I have a recursive parent-child structure. I can't (or don't know how to) create this kind of structure in a data warehouse, as dimension levels must be defined.

I thought about a three-level hierarchy, but some companies don't have departments at all.

I'm using InfiniDB and trying to configure Mondrian and JPalo.

+1  A: 

Simply denormalize the hierarchy into the dimDepartment table:

dimDepartment          Example Data
----------------       -------------
DepartmentKey            1234
DepartmentBusinessKey    a_b_a1
Department               A1
SubCompany               B
Company                  A

So for whole company A:

select
    sum(Amount) as TotalSale
  , sum(Taxes)  as TotalTax
from factSale      as f
join dimDepartment as d on d.DepartmentKey = f.DepartmentKey
where Company = 'A' 

For sub-company B of company A:

where Company    = 'A'
  and SubCompany = 'B'

For department A1, sub-company B, company A:

where Company    = 'A'
  and SubCompany = 'B'
  and Department = 'A1'

If a company does not have sub-companies, simply use 'none' or 'main' as a default sub-company name.
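
To make the idea concrete, here is a minimal sketch of the denormalized dimension using Python's built-in `sqlite3` module, with an in-memory database standing in for the warehouse. The table and column names follow the answer (the business key column is left out for brevity); the sample rows, the amounts, the `'none'` placeholders, and the `total_sale` helper are assumptions made up for illustration. Because sales can be recorded at every level, each node of the tree gets its own dimension row, with the unused lower levels filled by the placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dimDepartment (
    DepartmentKey INTEGER PRIMARY KEY,
    Company       TEXT NOT NULL,
    SubCompany    TEXT NOT NULL,
    Department    TEXT NOT NULL
);
CREATE TABLE factSale (
    DepartmentKey INTEGER NOT NULL REFERENCES dimDepartment(DepartmentKey),
    Amount        INTEGER NOT NULL
);
""")

# One row per node of the tree; 'none' marks a level that does not apply.
dim_rows = [
    (1, "A", "none", "none"),    # Company A itself
    (2, "A", "Sub",  "none"),    # Company A subcompany
    (3, "A", "Sub",  "A1"),      # Department A1
    (4, "A", "Sub",  "A2"),      # Department A2
    (5, "B", "none", "none"),    # Company B
    (6, "C", "none", "none"),    # Company C
    (7, "C", "none", "C-dept"),  # department directly under Company C
]
conn.executemany("INSERT INTO dimDepartment VALUES (?, ?, ?, ?)", dim_rows)

# Illustrative sales amounts, one fact per node.
fact_rows = [(1, 100), (2, 50), (3, 20), (4, 30), (5, 70), (6, 10), (7, 5)]
conn.executemany("INSERT INTO factSale VALUES (?, ?)", fact_rows)


def total_sale(company, sub_company=None, department=None):
    """Sum sales for a company, optionally narrowed to a sub-company/department."""
    sql = (
        "SELECT COALESCE(SUM(f.Amount), 0) "
        "FROM factSale AS f "
        "JOIN dimDepartment AS d ON d.DepartmentKey = f.DepartmentKey "
        "WHERE d.Company = ?"
    )
    params = [company]
    if sub_company is not None:
        sql += " AND d.SubCompany = ?"
        params.append(sub_company)
    if department is not None:
        sql += " AND d.Department = ?"
        params.append(department)
    return conn.execute(sql, params).fetchone()[0]


print(total_sale("A"))               # whole tree under company A: 200
print(total_sale("A", "Sub"))        # sub-company and its departments: 100
print(total_sale("A", "Sub", "A1"))  # a single department: 20
```

Filtering on `Company` alone picks up every row of that company's tree, which is what gives the roll-up without any recursion at query time.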

Damir Sudarevic
This is the solution I arrived at too. It looks like the only one that makes sense.
peperg
A: 

Your question is really about modelling ragged hierarchies versus fixed hierarchies. It's a big subject, and while there are methods for storing and querying ragged hierarchies, in many cases you will find that some aspect of your architecture or business model forces you back to fixed, named-level hierarchies. So unless the depth is truly arbitrary (it rarely is), you are better off picking a sensible depth and implementing against it.

Your data, for example, suggests that the levels themselves are known and defined but may be optional: Company, Sub-Company, Department, Sub-Department and so on. If you ever wanted to sum up the costs of the HR departments of all companies, you would find it much easier if you always knew that the data existed at a specific level (e.g. 3) of your tree.
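
As a rough illustration of the named-level approach, the sketch below flattens a parent-child table into fixed levels in Python. It assumes each source row carries a level name (a hypothetical column, not something the question states), and the node data, amounts, `flatten`, and `total_for_department` are all invented for the example. Each node's ancestors are written into the slot for their level, and any level that is missing stays as a placeholder.

```python
# Hypothetical parent-child source rows: (id, parent_id, level_name, name).
NODES = [
    (1, None, "Company",    "A"),
    (2, 1,    "SubCompany", "A-Sub"),
    (3, 2,    "Department", "HR"),
    (4, 2,    "Department", "Sales"),
    (5, None, "Company",    "B"),
    (6, None, "Company",    "C"),
    (7, 6,    "Department", "HR"),  # department directly under a company
]

# The fixed, named levels chosen for the dimension.
LEVELS = ("Company", "SubCompany", "Department")


def flatten(nodes, placeholder="none"):
    """Map each node id to a dict with one entry per named level."""
    by_id = {nid: (parent, level, name) for nid, parent, level, name in nodes}
    flat = {}
    for nid in by_id:
        row = dict.fromkeys(LEVELS, placeholder)
        current = nid
        # Walk up to the root, dropping each ancestor into its level's slot.
        while current is not None:
            parent, level, name = by_id[current]
            row[level] = name
            current = parent
        flat[nid] = row
    return flat


FLAT = flatten(NODES)

# Illustrative facts keyed by node id: (node_id, amount).
FACTS = [(1, 100), (3, 20), (4, 30), (7, 5)]


def total_for_department(department):
    """Sum amounts across all companies for one department name."""
    return sum(
        amount for nid, amount in FACTS
        if FLAT[nid]["Department"] == department
    )


print(FLAT[7])                     # {'Company': 'C', 'SubCompany': 'none', 'Department': 'HR'}
print(total_for_department("HR"))  # 25
```

Because HR always sits in the `Department` slot, the cross-company total is a single filter on one level, regardless of whether a sub-company exists in between.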

MarkH