ansaurus

Question

SQL Architecture: Is this a justified case to have only one table storing multiple entity types? (using a self JOIN)

Answer 1

+1 A:

Keeping in mind that corporations have the rights of people, and that both imply a parental relationship of some form... If the business rule is that there can be only one parent, you could use a self-referential foreign key to indicate the parent. The following is easier to explain if a corporation and a person are considered the same entity:

ENTITY table

entity_id (primary key)
entity_type_code (foreign key to ENTITY_TYPE_CODE.entity_type_code, containing "person" and "corporation", etc)
parent_entity_id (foreign key relationship with ENTITY.entity_id, nullable)

The parent_entity_id also has to be nullable, because NULL indicates the root entity of a hierarchy. But this also means using a database that has hierarchical query support (i.e., not MySQL at the time of writing).
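
To make the column list above concrete, here is a minimal sketch of the two tables plus a hierarchical query. The column types, the layout of ENTITY_TYPE_CODE, and the entity_tree name are my assumptions, not part of the original design; the query uses the standard recursive common table expression.

```sql
-- Sketch only: column types and the ENTITY_TYPE_CODE layout are assumed.
CREATE TABLE ENTITY_TYPE_CODE
(
 entity_type_code VARCHAR(20) NOT NULL PRIMARY KEY  -- e.g. 'person', 'corporation'
);

CREATE TABLE ENTITY
(
 entity_id INTEGER NOT NULL PRIMARY KEY,
 entity_type_code VARCHAR(20) NOT NULL
    REFERENCES ENTITY_TYPE_CODE (entity_type_code),
 parent_entity_id INTEGER NULL
    REFERENCES ENTITY (entity_id)  -- NULL marks the root of a hierarchy
);

-- Walk every hierarchy from its root down, tracking depth.
-- Standard SQL spelling; SQL Server uses plain WITH (no RECURSIVE keyword).
WITH RECURSIVE entity_tree (entity_id, entity_type_code, parent_entity_id, depth) AS
(
 SELECT entity_id, entity_type_code, parent_entity_id, 0
 FROM ENTITY
 WHERE parent_entity_id IS NULL
 UNION ALL
 SELECT e.entity_id, e.entity_type_code, e.parent_entity_id, t.depth + 1
 FROM ENTITY AS e
 JOIN entity_tree AS t ON e.parent_entity_id = t.entity_id
)
SELECT entity_id, entity_type_code, parent_entity_id, depth
FROM entity_tree
ORDER BY depth, entity_id;
```

The single parent_entity_id column is what limits each entity to one parent; nothing else in the sketch depends on that rule.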

OWNERSHIP table

The OWNERSHIP table would be the approach to take if the business rules need to support more than one parent per entity, or vice versa. Such a table goes by many names (junction table, link table, and so on), but it exists solely to support many-to-many relationships. You are correct, it is the ideal approach if the business rules require it.
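
A minimal sketch of such a table, assuming an ENTITY table with the columns listed earlier. The column names owner_entity_id and owned_entity_id, the self-ownership CHECK, and the id 42 are my own placeholders rather than anything from the question.

```sql
-- Sketch only: column names are assumed; ENTITY must already exist
-- with an entity_id primary key and an entity_type_code column.
CREATE TABLE OWNERSHIP
(
 owner_entity_id INTEGER NOT NULL
    REFERENCES ENTITY (entity_id),
 owned_entity_id INTEGER NOT NULL
    REFERENCES ENTITY (entity_id),
 PRIMARY KEY (owner_entity_id, owned_entity_id),  -- one row per owner/owned pair
 CHECK (owner_entity_id <> owned_entity_id)       -- assumed rule: no self-ownership
);

-- All direct owners of entity 42 (42 is a placeholder id).
SELECT o.owner_entity_id, e.entity_type_code
FROM OWNERSHIP AS o
JOIN ENTITY AS e ON e.entity_id = o.owner_entity_id
WHERE o.owned_entity_id = 42;
```

With this table in place, the parent_entity_id column on ENTITY would no longer be needed, since each ownership link lives in its own row.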

Conclusion

The key to data modeling is constructing a model based on the business rules. The model shouldn't change often, so try to futureproof what you can--business analysts are key to getting this done.

OMG Ponies 2010-09-30 04:15:10

+1, Good answer, appropriate to state that corporations have the rights of people (making them, in effect, of the same entity). Actually, the corp can have multiple parents (business partners). In many states' data, the parents of record are actually just "officers" and not necessarily owners -- not important to the concept being discussed, but given that we're elaborating, I see the need to more accurately describe the business rules. Confession: I designed it once already with multiple tables and now feel some pain... "Futureproof" in this case is to make it right before we go too far. :)

Chris Adragna 2010-09-30 04:35:31

Answer 2

+1 A:

Look up "generalization specialization relational modeling".

I think that there is a type of entity that I'll call "legal person". What you have called "person", and some might call "natural person", is a specialized kind of legal person. What you have called "corporation", and some might call "incorporated person", is a different kind of specialized legal person.

Seen this way, the relationship between "legal persons", "persons", and "corporations" can be seen as a gen-spec (generalization-specialization) pattern. Gen-spec gets a lot of treatment in the tutorials on object modeling, and fits pretty naturally with the concept of inheritance. It is often glossed over in tutorials on relational modeling, but the concept is well understood.

A legal person can own a corporation, regardless of which specialized type of legal person.

Your ENTITYALL table conforms to some of the features of a gen-spec relational design, but you could develop the model further. In particular, if we have an entity type "automobile", there's no particular reason why this wouldn't get an entry in an ENTITYALL table. But the fact that an automobile cannot own a corporation has now been obscured. I would want some kind of table that generalizes "person" and "corporation" into "legal person", but isn't so general that "automobile" would be classified as a "legal person". ENTITYALL is too generic for my preferences.

Looking at the best examples of gen-spec out there, we see that the primary key for the gen table and the primary key for each spec table are all drawn from the same domain. Further, the primary key of each spec table operates as a foreign key reference to the gen table, in addition to maintaining entity integrity for the specialized entity. The joins turn out to be very nice. Your schema could profit from this design tidbit.
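
The shared-key idea described above might look roughly like this. All table and column names here (LegalPerson, NaturalPerson, Corporation, display_name, and so on) are hypothetical, chosen only to illustrate the pattern.

```sql
-- Sketch only: table and column names are hypothetical.
CREATE TABLE LegalPerson
(
 legal_person_id INTEGER NOT NULL PRIMARY KEY,
 display_name VARCHAR(100) NOT NULL
);

CREATE TABLE NaturalPerson
(
 legal_person_id INTEGER NOT NULL PRIMARY KEY   -- same domain as the gen key
    REFERENCES LegalPerson (legal_person_id),   -- and a foreign key to it
 date_of_birth DATE
);

CREATE TABLE Corporation
(
 legal_person_id INTEGER NOT NULL PRIMARY KEY
    REFERENCES LegalPerson (legal_person_id),
 incorporation_date DATE
);

-- Specialized rows join back to the generalized row on the shared key.
SELECT lp.legal_person_id, lp.display_name, c.incorporation_date
FROM LegalPerson AS lp
JOIN Corporation AS c ON c.legal_person_id = lp.legal_person_id;
```

An ownership table could then reference LegalPerson for the owner and Corporation for the thing owned, which keeps an "automobile" from ever appearing on either side.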

Walter Mitty 2010-09-30 10:15:14

I appreciate your explanation about gen-spec and I had not previously recognized it gets more treatment in object modeling than in relational modeling; great assessment.

Chris Adragna 2010-10-01 13:39:42

I agree with your critique about "EntityAll" and I like, instead, "BusinessParty" in its place (see comment and attribution above in my comment on the question).

Chris Adragna 2010-10-01 14:59:44

Answer 3

+2 A:

I think it was Hugh Darwen who coined the terms 'distributed key' and 'distributed foreign keys', where a single referenced key value exists in exactly one of multiple referencing relvars (tables); this would require a related concept of 'multiple assignment' in order to atomically insert to both the referenced and referencing relvars.

While this could in theory be achieved in SQL-92 using deferrable schema-level ASSERTIONs (or perhaps CHECK constraints that support subqueries), it's rather a clunky process, is procedural (rather than set-based), and there isn't a SQL product that has ever supported this functionality (or ever will, I suspect).

The best we can do with available SQL products is to use a compound key (entity_ID, entity_type) with a CHECK constraint on the entity_type in referencing tables to ensure there is no more than one referencing key value (note this is not the same as 'exactly one referencing key value') e.g.

CREATE TABLE LegalPersons
(
 person_ID INTEGER IDENTITY NOT NULL UNIQUE, 
 person_type VARCHAR(14) NOT NULL
    CHECK (person_type IN ('Company', 'Natural Person')), 
 UNIQUE (person_type, person_ID)
);

CREATE TABLE Companies
(
 person_ID INTEGER NOT NULL UNIQUE, 
 person_type VARCHAR(14) NOT NULL
    CHECK (person_type = 'Company'), 
 FOREIGN KEY (person_type, person_ID)
    REFERENCES LegalPersons (person_type, person_ID), 
 companies_house_registered_number VARCHAR(8) NOT NULL UNIQUE
 -- other company columns and constraints here
);

CREATE TABLE NaturalPersons
(
 person_ID INTEGER NOT NULL UNIQUE, 
 person_type VARCHAR(14) NOT NULL
    CHECK (person_type = 'Natural Person'), 
 FOREIGN KEY (person_type, person_ID)
    REFERENCES LegalPersons (person_type, person_ID) 
 -- natural person columns and constraints here
);

This superclass-subclass pattern is very common in SQL.
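
As a rough illustration of what the constraints above do and do not enforce, here is a sketch that assumes SQL Server-style T-SQL (which the IDENTITY column suggests) and that the three tables already exist. The variable @id and the sample registration number are made up.

```sql
-- Assumes LegalPersons, Companies and NaturalPersons were created as above.
DECLARE @id INTEGER;

INSERT INTO LegalPersons (person_type) VALUES ('Natural Person');
SET @id = SCOPE_IDENTITY();  -- the person_ID just generated

-- Succeeds: (person_type, person_ID) matches the LegalPersons row.
INSERT INTO NaturalPersons (person_ID, person_type)
VALUES (@id, 'Natural Person');

-- Fails the CHECK on Companies.person_type.
INSERT INTO Companies (person_ID, person_type, companies_house_registered_number)
VALUES (@id, 'Natural Person', '01234567');

-- Passes the CHECK but fails the FOREIGN KEY:
-- LegalPersons has no ('Company', @id) row.
INSERT INTO Companies (person_ID, person_type, companies_house_registered_number)
VALUES (@id, 'Company', '01234567');

-- Not prevented: a LegalPersons row with no matching subtype row at all.
INSERT INTO LegalPersons (person_type) VALUES ('Company');
```

The last statement is the "no more than one" versus "exactly one" gap mentioned earlier: nothing forces the new 'Company' row to have a Companies row.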

Ideally, a table name should reflect the nature of the set as a whole. You may need to think beyond a compound of other sets' names; perhaps ask an expert in the particular field of business, e.g. an accountant may use the term 'payroll' rather than 'EmployeesSalaries'.

Another ideal is for a column's name to remain the same throughout the schema, but with a subclassing approach you often need to qualify the names (and this bothers me!) e.g.

CREATE TABLE CompanyAgents
(
 company_person_ID INTEGER NOT NULL UNIQUE, 
 company_person_type VARCHAR(14) NOT NULL
    CHECK (company_person_type = 'Company'), 
 FOREIGN KEY (company_person_type, company_person_ID)
    REFERENCES LegalPersons (person_type, person_ID), 
 agent_person_ID INTEGER NOT NULL, 
 agent_person_type VARCHAR(14) NOT NULL, 
 FOREIGN KEY (agent_person_type, agent_person_ID)
    REFERENCES LegalPersons (person_type, person_ID), 
 CHECK (company_person_ID <> agent_person_ID)
);

Note I could have used a single-column key for agent_person_ID e.g.

 agent_person_ID INTEGER NOT NULL
    REFERENCES LegalPersons (person_ID)

because there is no restriction on entity type. In principle I feel better about retaining the two-column compound key for all references throughout the schema, and in practice I find that as often as not I need to know the entity type anyhow, so this SQL DDL is saving a JOIN in SQL DML :)
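
To show the saved JOIN, here are the two queries side by side, assuming the CompanyAgents and LegalPersons tables defined earlier. The aliases ca and lp are mine.

```sql
-- With the compound key, the agent's type is already in CompanyAgents.
SELECT company_person_ID, agent_person_ID, agent_person_type
FROM CompanyAgents;

-- With a single-column key, the same result needs a JOIN to LegalPersons.
SELECT ca.company_person_ID, ca.agent_person_ID, lp.person_type AS agent_person_type
FROM CompanyAgents AS ca
JOIN LegalPersons AS lp ON lp.person_ID = ca.agent_person_ID;
```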

onedaywhen 2010-09-30 13:21:59

+1, "Distributed Foreign Keys" is key (pun intended). Additionally, note that I now believe the base entity should be called "BusinessParty" and they can have a type of either Person or Organization (avoiding "Corporation" which is more a legal specification than a logical one).

Chris Adragna 2010-10-01 15:18:00
