I'm developing an ETL process and need a bridge table for a one-to-many relationship between a fact table and a dimension table (MySQL database). There is a limited number of combinations (a few thousand), so I want to reuse group keys from the bridge table to limit its size.

Any group of dimensions belonging to a fact row will consist of a number of dimension keys (1 to around 15), assigned to a unique group key, as below:

group_key | dimension_key
-----------------------
1         | 1
1         | 3
1         | 4
2         | 1
2         | 2
2         | 3
3         | 1
3         | 4

How do I go about retrieving the unique group key for the dimensions 1, 3, 4 (i.e. group key 1)?

A: 

If I understand you correctly, what you want is a bridge table that looks like this:

group_key | dimension_set
-----------------------
1         | (1, 3, 4)
2         | (1, 2, 3)
3         | (1, 4)

You have 2 options that I can see.

You can either pull the entire bridge table into a program and programmatically determine the group key from the dimension set.
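A minimal sketch of this in-program lookup, assuming the bridge rows have already been fetched as (group_key, dimension_key) pairs. The function name `build_group_lookup` and the sample rows are illustrative only:

```python
from collections import defaultdict

def build_group_lookup(bridge_rows):
    """Map each complete dimension set to its group key."""
    members = defaultdict(set)
    for group_key, dimension_key in bridge_rows:
        members[group_key].add(dimension_key)
    # frozenset is hashable, so the whole set can serve as a dict key.
    return {frozenset(dims): group_key for group_key, dims in members.items()}

# Sample rows copied from the bridge table in the question.
bridge_rows = [(1, 1), (1, 3), (1, 4), (2, 1), (2, 2), (2, 3), (3, 1), (3, 4)]
lookup = build_group_lookup(bridge_rows)

# Exact-set lookup; .get returns None when no group has exactly these keys.
group_key = lookup.get(frozenset({1, 3, 4}))
```

With only a few thousand combinations, the whole dictionary should fit comfortably in memory for the duration of an ETL run.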

Or you can encode the dimension keys using a mathematical formula to come up with an integer that you can index.

Something like a + (b * 32) + (c * 32 * 32) + ... Use the lowest power of 2 that encompasses the number of unique dimensions.
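As a rough sketch of that formula, assuming the keys are sorted first (so the same set always yields the same code) and that every key lies between 1 and base - 1. The function name `encode_dimension_set` is hypothetical:

```python
def encode_dimension_set(keys, base=32):
    """Pack a set of dimension keys into one integer: a + b*base + c*base**2 + ..."""
    code = 0
    # Sorting makes the result independent of the order the keys arrive in.
    for position, key in enumerate(sorted(set(keys))):
        if not 0 < key < base:
            raise ValueError(f"dimension key {key} does not fit in base {base}")
        code += key * base ** position
    return code

code_a = encode_dimension_set([1, 3, 4])   # 1 + 3*32 + 4*32*32
code_b = encode_dimension_set([4, 1, 3])   # same set, different order
code_c = encode_dimension_set([1, 4])      # 1 + 4*32
```

One thing to watch: with base 32 each key takes 5 bits, so a group of 15 keys needs 75 bits, which is more than a 64-bit BIGINT can hold. Python integers do not overflow, but the column storing the code would need a wider type.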

Gilbert Le Blanc
A: 

I think you're asking for a query that returns the groups such that all dimensions in a specific list are associated with the group. That is, rows must exist mapping that group to each of the dimensions, and you want to know which groups satisfy this.

SELECT f1.group_key
FROM facts f1
JOIN facts f2 ON (f1.group_key = f2.group_key)
JOIN facts f3 ON (f1.group_key = f3.group_key)
WHERE f1.dimension_key = 1
  AND f2.dimension_key = 3
  AND f3.dimension_key = 4;
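Since a group can hold anywhere from 1 to around 15 dimension keys, the self-join needs one alias per key. Here is a sketch of generating it for any number of keys. It runs against an in-memory SQLite table as a stand-in for MySQL, the helper name `build_self_join_query` is made up for illustration, and most MySQL drivers for Python expect `%s` placeholders rather than `?`:

```python
import sqlite3

def build_self_join_query(key_count):
    """Return SQL that joins `facts` to itself once per required dimension key."""
    if key_count < 1:
        raise ValueError("need at least one dimension key")
    joins = [
        f"JOIN facts f{i} ON f{i}.group_key = f1.group_key"
        for i in range(2, key_count + 1)
    ]
    conditions = [f"f{i}.dimension_key = ?" for i in range(1, key_count + 1)]
    return (
        "SELECT f1.group_key FROM facts f1 "
        + " ".join(joins)
        + " WHERE "
        + " AND ".join(conditions)
    )

# Stand-in for the MySQL bridge table, loaded with the rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (group_key INTEGER, dimension_key INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [(1, 1), (1, 3), (1, 4), (2, 1), (2, 2), (2, 3), (3, 1), (3, 4)],
)

keys = [1, 3, 4]
matches = [row[0] for row in conn.execute(build_self_join_query(len(keys)), keys)]
```

The key values are passed as bound parameters. Only the number of joins is formatted into the SQL text.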

The other solution is to count the matching rows in the group:

SELECT f.group_key
FROM facts f
WHERE f.dimension_key IN (1,3,4)
GROUP BY f.group_key
HAVING COUNT(*) = 3;

But I find that GROUP BY is usually a performance killer, particularly in MySQL.
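Note that both queries match any group that contains the listed dimensions, including a group that also holds extra ones. If the group key must correspond to exactly the given set, one possible variant counts all rows in the group as well as the matching ones. The sketch below assumes each (group_key, dimension_key) pair appears only once, uses SQLite in place of MySQL, and the function name `find_exact_group_key` is hypothetical:

```python
import sqlite3

def find_exact_group_key(conn, dimension_keys):
    """Return the group whose dimension set equals dimension_keys, or None."""
    keys = sorted(set(dimension_keys))
    placeholders = ",".join("?" for _ in keys)
    sql = f"""
        SELECT group_key
        FROM facts
        GROUP BY group_key
        HAVING COUNT(*) = ?
           AND SUM(dimension_key IN ({placeholders})) = ?
    """
    # Parameters follow the order of the placeholders in the SQL text.
    row = conn.execute(sql, [len(keys), *keys, len(keys)]).fetchone()
    return row[0] if row else None

# Stand-in for the MySQL bridge table, loaded with the rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (group_key INTEGER, dimension_key INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [(1, 1), (1, 3), (1, 4), (2, 1), (2, 2), (2, 3), (3, 1), (3, 4)],
)

exact = find_exact_group_key(conn, [1, 3, 4])
subset_only = find_exact_group_key(conn, [1, 3])
```

Here `[1, 3]` finds nothing, because groups 1 and 2 contain those keys plus a third one. When the lookup returns None, the ETL step would allocate a new group key.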

Bill Karwin
Your first proposed solution is exactly what I'm looking for! Thank you :)
Mads Jensen