views:

39

answers:

1

I've tried to simplify for the purposes of asking this question. Hopefully, this will be comprehensible.

Basically, I have a fact table with a time dimension, another dimension, and a hierarchical dimension. For the purposes of the question, let's assume the hierarchical dimension is zip code and state. The other dimension is just descriptive. Let's call it 'customer' Let's assume there are 50 customers.

I need to find the set of states for which there is at least one zip code in which EVERY customer has at least one fact row for each day in the time dimension. If a zip code has only 49 customers, I don't care about it. If even one of the 50 customers doesn't have a value for even 1 day in a zip code, I don't care about it. Finally, I also need to know which zip codes qualified the state for selection. Note, there is no requirement that every zip code have a full data set - only that at least one zip code does.

I don't mind making multiple queries and doing some processing on the client side. This is a dataset that only needs to be generated once per day and can be cached. I don't even see a particularly clean way to do it with multiple queries short of simply brute-force iteration, and there are a heck of a lot of 'zip codes' in the data set (not actually zip codes, but the there are approximately 100,000 entries in the lower level of the hierarchy and several hundred in the top level, so zipcode->state is a reasonable analogy)

+1  A: 

As a first iteration you could try the following:

Assuming

  • customers (list of all customers)
  • zip_codes (list of all zip codes)
  • days (list of all days)
  • facts (fact table)

I believe you have to break it down

1) get customers and zip codes that have facts for every day

SELECT zip_id, customer_id, COUNT(DISTINCT days)
FROM facts
GROUP BY zip_id, customer_id
HAVING COUNT(DISTINCT days) = (SELECT COUNT(*) FROM days) 

2) from this result set check for zips that have all customers

SELECT zip_id, COUNT(DISTINCT customer_id)
FROM ( 
    SELECT zip_id, customer_id, COUNT(DISTINCT days)
    FROM facts
    GROUP BY zip_id, customer_id
    HAVING COUNT(DISTINCT days) = (SELECT COUNT(*) FROM days) 
    ) S
GROUP BY zip_id
HAVING COUNT(DISTINCT customer_id) = (SELECT COUNT(*) FROM customers)

after you get zip codes it should be easy to join it back to states.

Unreason
thanks. I have no idea why I was having so much difficulty wrapping my head around it.