views:

60

answers:

3

I need a way to represent existential relations in a database. For instance I have a bio-historical table (i.e. a family tree) that stores a parent id and a child id which are foreign keys to a people table. This table is used to describe arbitrary family relationships. Thus I’d like to be able to say that X and Y are siblings without having to know exactly who the parents of X and Y are. I just want to be able to say that there exists two different people A and B such that A and B are each parents of X and Y. Once I do know who A and/or B are I’d need to be able to reconcile them.

The simplest solution I can think of is to store existential people with negative integer user ids. Once I know who the people are, I’d need to cascade update all of the IDs. Are there any well-known techniques for this?

A: 

Hmmm, come to think of it, I guess I need a general way to reconcile duplicate people anyway and I can use it for this purpose. Thoughts?

Rodrick Chapman
You should add this as an edit to your main question, this section is just for answers.
notJim
+1  A: 

Does existential mean "non existant"?

They don't have to be negative. You could just add a record to People table with no last/first name and perhaps a flag "unknown person". Or existential if you like.

Then when you know something (e.g. like last name but not first) you update this record.

Reconciling duplicate people could be more difficult. I guess you could just update FamilyTree set parent_id=new_id where parent_id=old_id, etc. But this means for instance that the same person could end up with too many parents, so you'll need to perform a number of complex checks before doing that.

yu_sha
That does give me an idea. Maybe have an "existential person" table which consists only of ids. Then another person table that contains the data once we know it. This is a personal project so I don't mind being too academic.
Rodrick Chapman
Another table is a bad idea. You need a foreign key from your FamilyTree table to Person table. You can't have a foreign key to two tables at the same time.I suggest having a special flag in Person table. Field 'Unknown' of type Boolean. Then you can easily switch this flag off when the person becomes known.
yu_sha
+1  A: 

I would document only the known relationships in a link table which links your Person table to itself with:

FK Person1ID
FK Person2ID
   RelationshipTypeID (Sibling, Father, Mother, Step-Father, Step-Mother, etc.)

With some appropriate constraints on that table (or multiple tables, one for each relationship type if that makes the constraints more logical)

Then when other relationships can possibly (a half-sibling will only share one parent) be inferred (by running an exception query) but are missing, create them.

For instance, people who are siblings who don't have all their parents identified:

SELECT *
FROM People p1
INNER JOIN Relationship r_sibling
    ON r_sibling.Person1ID = p1.PersonID
    AND r_sibling.RelationshipType = SIBLING_TYPE_CONSTANT
INNER JOIN People p2
    ON r_sibling.Person2ID = p2.PersonID
WHERE EXISTS (
    -- p1 has a father
    SELECT *
    FROM Relationship r_father
        ON r_father.RelationshipType = FATHER_TYPE_CONSTANT
        AND r_father.Person2ID = p1.PersonID
)
AND NOT EXISTS (
    -- p2 (p1's sibling) doesn't have a father yet
    SELECT *
    FROM Relationship r_father
        ON r_father.RelationshipType = FATHER_TYPE_CONSTANT
        AND r_father.Person2ID = p2.PersonID
)

You might need to UNION the reverse of this query depending on how you want your relationships constrained (siblings are always commutative, unlike other relationships) and then handle mothers similarly.

Cade Roux