We've got data with two different origins: some comes from a customer, some comes from different vendors. Currently, we physically "merge" this data into a massive table with almost a hundred columns, tens of thousands of rows and no formal separation of the two dimensions. Consequently, we can't actually use this table for much.
I'm going to redesign this mess into a proper, but small, star schema.
The two dimensions are obvious. One of them, for example, is time.
The customer-supplied data provides a number of fact values. Each vendor may (or may not) provide additional fact values that fit the same dimensions.
This fact data all has the same granularity. It can be called "sparse" because we don't often get information from all vendors.
Here's my dilemma.
Is this one fact table -- with some nulls -- populated from different sources?
Or is this n+1 fact tables -- one populated from the customer, the others populated from each vendor?
There are pros and cons to each design. I need some second opinions on the choice between "merge" and "load separately".
Customer supplies revenue, cost, counts, weights, and other things they know about their end of a transaction.
Vendor one supplies some additional details about some of the transactions -- weights, costs, durations. The other transactions will have no value from vendor one.
Vendor two supplies some additional details about some of the transactions -- volumes, durations, lengths, foreign currency rates. The other transactions will have no value from vendor two.
Some transactions will have both vendors. A few transactions will have neither vendor.
One table with nulls? Three tables?
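To make the two options concrete, here is a minimal sketch using Python's built-in sqlite3 module. All table and column names are hypothetical, `other_key` stands in for the second dimension (which isn't named above), and only a couple of measures per source are shown. The three tables are the "load separately" design; the view over them has the shape of the "one table with nulls" design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- "Load separately": n+1 fact tables, all at the same grain.
-- other_key is a placeholder for the second (unnamed) dimension.
CREATE TABLE fact_customer (
    time_key  INTEGER NOT NULL,
    other_key INTEGER NOT NULL,
    revenue   REAL NOT NULL,
    cost      REAL NOT NULL,
    PRIMARY KEY (time_key, other_key)
);
CREATE TABLE fact_vendor1 (
    time_key  INTEGER NOT NULL,
    other_key INTEGER NOT NULL,
    weight    REAL,
    duration  REAL,
    PRIMARY KEY (time_key, other_key)
);
CREATE TABLE fact_vendor2 (
    time_key  INTEGER NOT NULL,
    other_key INTEGER NOT NULL,
    volume    REAL,
    fx_rate   REAL,
    PRIMARY KEY (time_key, other_key)
);

-- "Merge": one wide row per transaction, vendor columns NULL when absent.
CREATE VIEW fact_merged AS
SELECT c.time_key, c.other_key, c.revenue, c.cost,
       v1.weight AS v1_weight, v1.duration AS v1_duration,
       v2.volume AS v2_volume, v2.fx_rate  AS v2_fx_rate
FROM fact_customer AS c
LEFT JOIN fact_vendor1 AS v1
       ON v1.time_key = c.time_key AND v1.other_key = c.other_key
LEFT JOIN fact_vendor2 AS v2
       ON v2.time_key = c.time_key AND v2.other_key = c.other_key;
""")

# Made-up sample rows: four transactions from the customer ...
conn.executemany(
    "INSERT INTO fact_customer VALUES (?, ?, ?, ?)",
    [(1, 1, 100.0, 60.0), (1, 2, 80.0, 50.0),
     (2, 1, 120.0, 70.0), (2, 2, 90.0, 55.0)],
)
# ... vendor one covers two of them, vendor two covers two of them.
conn.executemany(
    "INSERT INTO fact_vendor1 VALUES (?, ?, ?, ?)",
    [(1, 1, 12.5, 3.0), (1, 2, 9.0, 2.0)],
)
conn.executemany(
    "INSERT INTO fact_vendor2 VALUES (?, ?, ?, ?)",
    [(1, 1, 4.2, 1.1), (2, 1, 3.8, 1.3)],
)

merged_rows = conn.execute(
    "SELECT * FROM fact_merged ORDER BY time_key, other_key"
).fetchall()
for row in merged_rows:
    print(row)
```

In this sample, transaction (1, 1) has both vendors, (1, 2) has only vendor one, (2, 1) has only vendor two, and (2, 2) has neither, so the view returns four rows with NULLs wherever a vendor was silent. The sketch only shows that the merged shape can be derived from the separate tables; it doesn't settle which one should be the physical design.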