I am trying to design a data warehouse for a state government where data about individuals may arrive from multiple agencies. These individuals may or may not have SSNs or Tax IDs: think of homeless people who are entitled to Medicaid, or data about incarcerated people from the Department of Corrections. As the data arrives, the system would create a surrogate key for each individual and associate it with whatever identifying information (DOB, gender, name) is available. Over time, however, it may turn out that two surrogate keys created from the data of two separate departments actually represent the same person, or that a single surrogate key actually represents twins. In such cases the keys may have to be merged or divided, possibly retroactively.
There are obviously (bi)temporal aspects to consider here, since information recorded at transaction time t1 for valid time v1 may differ from information recorded at transaction time t2 for the same valid time v1.
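To make the t1/t2 vs. v1 distinction concrete, here is a minimal sketch of an append-only bitemporal fact store with an "as of" lookup. It assumes a simplified model in which facts are never updated in place, corrections are inserted as new rows with a later transaction time, and the latest recording whose valid interval covers the requested valid time wins. The `Fact` fields, the `as_of` helper, and the `date(9999, 12, 31)` "open-ended" sentinel are all hypothetical illustrations, not part of any existing design. A production bitemporal table (e.g. SQL:2011 system-versioned tables) would instead close the transaction-time interval of superseded rows.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    """One recorded belief about a person (hypothetical schema)."""
    person_key: str
    attribute: str
    value: str
    valid_from: date   # valid time, inclusive
    valid_to: date     # valid time, exclusive; date(9999, 12, 31) = open-ended
    recorded_at: date  # transaction time: when the warehouse learned this


def as_of(facts, person_key, attribute, valid_time, tx_time):
    """Return the value that, as known at tx_time, held at valid_time.

    Simplification: among rows recorded by tx_time whose valid interval
    covers valid_time, the most recently recorded one wins.
    Returns None if nothing was known yet.
    """
    candidates = [
        f for f in facts
        if f.person_key == person_key
        and f.attribute == attribute
        and f.recorded_at <= tx_time
        and f.valid_from <= valid_time < f.valid_to
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda f: f.recorded_at).value


if __name__ == "__main__":
    # Hypothetical: a surname recorded at t1 and corrected at t2
    # for the same valid period.
    facts = [
        Fact("K1", "surname", "Smith", date(2010, 1, 1), date(9999, 12, 31), date(2015, 3, 1)),
        Fact("K1", "surname", "Smyth", date(2010, 1, 1), date(9999, 12, 31), date(2016, 6, 1)),
    ]
    v1 = date(2014, 1, 1)
    print(as_of(facts, "K1", "surname", v1, date(2015, 12, 31)))  # belief at t1
    print(as_of(facts, "K1", "surname", v1, date(2017, 1, 1)))    # belief at t2
```

The same valid time v1 yields different answers depending on which transaction time you query from, which is exactly the property the surrogate-key history would also need.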
My current design loads the identifying data into a graph with the individuals as nodes. Edges are created when reliable data links two nodes. The graph is built from the data available at transaction time t1. Connected components are then extracted and assigned surrogate keys. The idea is to build a similar graph at transaction time t2 and then map the evolution of the surrogate keys from t1 to t2. This is where the merge/divide issues come in.
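For reference, the t1 → t2 mapping step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: union-find is used as the connected-components algorithm (a graph library would do equally well), source records are identified by hypothetical IDs like `"r1"`, records are never deleted (only edges are added or retracted), and the event labels `same`/`merge`/`split`/`new` plus the helpers `components` and `key_lineage` are my own naming, not an established scheme. The idea is to compute the overlap between each t1 key's record set and each t2 component.

```python
from collections import defaultdict


def components(nodes, edges):
    """Group record IDs into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return [frozenset(g) for g in groups.values()]


def key_lineage(old_keys, new_components):
    """Classify how t1 surrogate keys map onto t2 components.

    old_keys: dict surrogate_key -> frozenset of record IDs at t1.
    new_components: list of frozensets of record IDs at t2.
    Assumes no record disappears between t1 and t2.
    """
    record_to_key = {r: k for k, recs in old_keys.items() for r in recs}
    comp_to_olds = {}
    key_to_comps = defaultdict(list)
    for comp in new_components:
        olds = frozenset(record_to_key[r] for r in comp if r in record_to_key)
        comp_to_olds[comp] = olds
        for k in olds:
            key_to_comps[k].append(comp)

    events = []
    # An old key whose records now fall into several components was split.
    for k, comps in key_to_comps.items():
        if len(comps) > 1:
            events.append(("split", k, comps))
    for comp, olds in comp_to_olds.items():
        if not olds:
            events.append(("new", None, comp))          # only new records
        elif len(olds) > 1:
            events.append(("merge", sorted(olds), comp))  # several old keys joined
        else:
            (k,) = olds
            if len(key_to_comps[k]) == 1:
                events.append(("same", k, comp))        # key survives unchanged
    return events


if __name__ == "__main__":
    # Hypothetical t1 state: r3 and r4 were wrongly linked (twins).
    old = {"K1": frozenset({"r1"}), "K2": frozenset({"r2"}),
           "K3": frozenset({"r3", "r4"})}
    # Hypothetical t2: r1-r2 now linked, r3-r4 link retracted, new record r5.
    new = components(["r1", "r2", "r3", "r4", "r5"], [("r1", "r2")])
    for event in key_lineage(old, new):
        print(event)
```

Note that one t2 component can take part in both kinds of event (e.g. half of a split key merging into another key), so the result is better stored as a many-to-many lineage table between t1 and t2 keys than as a single status per key. That lineage table could then itself be versioned bitemporally.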
I am not looking for an exact solution from the SO community, but for pointers to literature or published designs for such a scenario. BTW, the same scenario recurs with financial data: companies merge and spin off over time, and different agencies record the transactions differently at different times.
Thanks.