So I have two dimensions in my data warehouse:

dim_machine
-------------
machine_key
machine_name
machine_type


dim_tool
------------
tool_key
tool_name
machine_type

What I want to ensure is that the machine_type field holds the same set of values in both dimensions. Should I create a third dimension and snowflake it between the two, or is there another alternative?

+2  A: 

I'm not sure exactly what problem you're trying to solve. This sounds like something that you would simply build into the ETL process: for both dimensions, map your source data to the same target list of machine types. If a new value appears that has no mapping, raise an error (or set a default placeholder value and review the data later).
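As a rough sketch of that check, assuming staging tables and a mapping table that are not part of the question (stg_machine, stg_tool and map_machine_type are hypothetical names, as are their columns), the ETL could look for source values with no mapping before loading either dimension:

```sql
-- Hypothetical objects, for illustration only:
--   stg_machine / stg_tool : staging tables with a source_machine_type column
--   map_machine_type       : source_value -> conformed machine_type
-- Returns every source value that has no mapping yet. Rows with a NULL
-- source_machine_type are returned too, since NULL never matches the join.
select
      s.source_name
    , s.source_machine_type
from (
    select 'stg_machine' as source_name, source_machine_type from stg_machine
    union
    select 'stg_tool'    as source_name, source_machine_type from stg_tool
) as s
left join map_machine_type as mt
       on mt.source_value = s.source_machine_type
where mt.source_value is null;
```

If this returns any rows, the job can either stop with an error or load the placeholder value and log the rows for review.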

A completely different option would be a "mini-dimension" (Kimball's term) that holds all possible machine/tool combinations. If two dimensions are closely related and often used together in searches, then it can be a useful way to consolidate and simplify them. But even then, I assume you will be checking and cleaning the source data to conform the machine types.
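A minimal sketch of such a combined dimension might look like the following. The table name dim_machine_tool, its key and the column sizes are made up for illustration, and it assumes that a machine/tool pair is valid whenever the two machine_type values match, which may not be the real business rule:

```sql
-- Hypothetical combined dimension: one row per machine/tool combination.
create table dim_machine_tool (
      machine_tool_key integer      not null primary key
    , machine_name     varchar(100) not null
    , tool_name        varchar(100) not null
    , machine_type     varchar(50)  not null
);

-- Assumption: a combination is valid when the types match.
-- row_number() generates a simple surrogate key for the initial load.
insert into dim_machine_tool
      (machine_tool_key, machine_name, tool_name, machine_type)
select
      row_number() over (order by m.machine_key, t.tool_key)
    , m.machine_name
    , t.tool_name
    , m.machine_type
from dim_machine as m
join dim_tool    as t on t.machine_type = m.machine_type;
```

With one shared machine_type column per row, the two copies of the value can no longer drift apart, at the cost of a larger dimension.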

Pondlife
I'm trying to solve the issue that I have a relationship between machine and tool. I know I could leave the dimensions alone and everything would be okay. Just didn't know how far to de-normalize the data (i.e. build a machine_type dimension and link the three dimensions together). While the mini-dimension could be done, we're looking at hundreds of thousands of combinations.
DBA_Alex
If there's a relationship between machines and tools it seems like combining them is worth considering. I wouldn't say that many rows is a problem: we have 350K rows in one dimension and no issues at all.
Pondlife
Thanks for that - certainly something to consider. The reason I haven't considered it before is because the list of valid combinations is not fully created yet.
DBA_Alex
+2  A: 

Keep in mind that a data warehouse is a de-normalized structure, so it is normal for data to repeat in dimensions. The integrity should be provided in the operational system and the ETL process. Suppose we have something like the model below.

[Image: diagram of the model, with a fact_installed_tools fact table joined to dim_machine, dim_tool, dim_date and dim_employee]

The business process that dispenses tools has to know which tool can be installed on which machine. Suppose a wrong tool is somehow installed on a machine. It is better to import data to match that fact and run a report that will discover a bug in the business process, than to break the ETL process because the tool and machine types do not match.

For example, a query (report) like this would discover a mismatch and would prove quite useful.

-- Flags rows where the installed tool's matching_machine_type differs
-- from the type of the machine it was installed on.
-- (Table qualifiers are inferred from the column names.)
select
      'tool-machine mismatch' as alarm
    , d.full_date
    , m.machine_name
    , m.machine_type
    , t.tool_name
    , t.matching_machine_type
    , e.employee_full_name
from fact_installed_tools as f
join dim_machine          as m on m.machine_key  = f.machine_key
join dim_tool             as t on t.tool_key     = f.installed_tool_key
join dim_date             as d on d.date_key     = f.date_key
join dim_employee         as e on e.employee_key = f.employee_key
where m.machine_type <> t.matching_machine_type ;
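
In the same spirit, a check can be run directly against the two dimensions from the question, using only the machine_type columns listed there. This is a sketch rather than a tested report (found_in is just an illustrative column alias); it lists machine types that appear in one dimension but not the other:

```sql
-- Machine types present in dim_tool but missing from dim_machine ...
select 'dim_tool only' as found_in, t.machine_type
from dim_tool as t
where not exists (
    select 1 from dim_machine as m
    where m.machine_type = t.machine_type
)
union
-- ... and machine types present in dim_machine but missing from dim_tool.
select 'dim_machine only' as found_in, m.machine_type
from dim_machine as m
where not exists (
    select 1 from dim_tool as t
    where t.machine_type = m.machine_type
);
```

An empty result means both dimensions currently use the same set of machine types. Any rows returned point at values to fix in the source system or in the ETL mapping.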
Damir Sudarevic
Thanks for that Damir, helped quite a bit ^^b
DBA_Alex