views:

75

answers:

3

I rarely come across a situation where a single table for multiple entity types seems better than one table per entity type. Here's an example that makes sense to me, but academically it seems wrong.

QUESTION: Can I do this and still have a "sound" architecture?

Example Follows

Suppose two entity types, a corporation and a person. A corporation is typically owned by a person, but sometimes another corporation owns a corporation.

Holding onto that thought, and adding to it, let's say that every corporation has a registered agent attached to it who is responsible for the legal creation of the corporation. Furthering my illustration, the registered agent can be either a person or a corporation.

If you consider that the owner [parent] to the corporation [child] can be either a person or a corporation, you may begin to see the challenge in keeping the third normal form and avoiding redundancy.

In contrast to my example, if only people could own Corporations, the Ownership link table is very conventional, having columns: OwnershipID (sort of unecessary), CorporationID, PersonID.

Instead, you need something like: OwnershipID, CorporationID, OwnerID, OwnerEntityType (corp or person). Don’t get me wrong, you can make this work, but it won't be fun, to say the least.

Continuing on with the example I gave, you need to assign an agent to every Corporation. Usually, the agent is one of the owners (a person). In which case, you really do want to link back to the one record of that person. You don't want to have record of the person as an owner and then again as an agent (in an Agent table). That would be redundant.

Similarly to that "problem" a registered agent can also be a corporation, such as a law firm, a CPA, or a biz filings company, to name some typical examples. Just like the agent-person, an agent-corporation really should not get its own record as an agent-entity. Instead it needs to link back to the of its corporate existence in the Corporation table. [except that I'm ultimately saying to not have a CorporationEntity table]

Just like the link table that matched each corporation to its owner(s) of any type, person or corporation, you could have an agent link table of: AgentRepresentationID, CorporationID, AgentID, AgentType... but again, it would be ugly (IMO) when you have to pull together the related agents -- some from the Person table, some from the Corporation table.

This is why I say, in cases like this, you can see how a neutral entity type can be advantageous. It would be something like this:

Table: EntityAll
Key Columns: EntityID, EntityType (or EntityTypeID if you insist, link out to get the description), EntityName (there are concerns with names and different types... off topic to this post)

Link Table: CorporationOwnership
Key Columns: OwnershipID (again, my comment that this is kind of unecessary), ChildEntityID (the entity being owned; named "Child" for clarity, I wouldn't name it that) ParentEntityID (the parent entity)

Link Table: AgentRepresentation
Key Columns: AgentRepresentationID (...I won't say it), CorporationEntityID (the corp entity being represented), AgentEntityID (from the Entity table, equating to the record that's the agent here)

While you might be OK with my architecture, you should be a little bothered by the column naming in the link tables. It bothers me. Typically the second and third column names in those tables match exactly the name of the columns you JOIN in each entity's respective table (haha, but each entity doesn't have a respective table so you can't have the link table column names match the source column names because they are THE same column). Technically this does not matter, but it will break your naming conventions which does matter, but not enough to not do it.

In case I haven't driven it home well enough yet, here is how you'll pull it together. You JOIN the EntityAll table on its own self to get what you need.

List all Corps and their owners (in T-SQL):

SELECT Corp.EntityName as CorpName, Owner.EntityName as OwnerName
FROM EntityAll as Corp
JOIN CorporationOwnership as Link on (Corp.EntityID = Link.ChildEntityID)
JOIN EntityAll as Owner on (Link.ParentEntityID = Owner.EntityID)

Consequently, you'd do the same thing to return the agent, instead of the owner(s).

I realize this is not how we're trained to architect tables, but I feel pretty strongly that my solution eliminates redundant data AND makes it easier to code, manage and read.

P.S. I recently provided this example as an answer to an old question on SO. Being an old question, there was no dialogue. I need to implement this very example, and I’m curious about the ramifications of this architecture.

Here is the previous question/answer: http://stackoverflow.com/questions/2356934/

+1  A: 

Keeping in mind that Corporations have the rights of people, and that both infer a parental relationship of some format... If the business rule is that there can be only be one parent, you could use a self-referential foreign key to indicate the parent. The following is easier to explain if a Corporation and a person are considered the same entity:

ENTITY table

  • entity_id (primary key)
  • entity_type_code (foreign key to ENTITY_TYPE_CODE.entity_type_code, containing "person" and "corporation", etc)
  • parent_entity_id (foreign key relationship with ENTITY.entity_id, nullable)

The parent_entity_id also has to be nullable, because NULL indicates the root entity for a hierarchy. But this also means using a database that has hierarchical query support (IE: not MySQL).

OWNERSHIP table

The OWNERSHIP table would be the approach to take if the business rules need to support more than one parent to an entity, or vice versa. There are numerous names, but it's a table solely for supporting many-to-many relationships. You are correct, it is the ideal approach if the business rules require it.

Conclusion

The key to data modeling is constructing a model based on the business rules. The model shouldn't change often, so try to futureproof what you can--business analysts are key to getting this done.

OMG Ponies
+1, Good answer, appropriate to state that corporations have the rights of people (making them, in effect, of the same entity). Actually, the corp can have multiple parents (business partners). In many states' data, the parents of record are actually just "officers" and not necessarily owners -- not important to the concept being discussed, but given that we're elaborating, I see the need to more accurately describe the business rules. Confession: I designed it once already with multiple tables and now feel some pain... "Futureproof" in this case is to make it right before we go too far. :)
Chris Adragna
+1  A: 

Look up "generalization specialization relational modeling".

I think that there is a type of entity that I'll call "legal person". What you have called "person", and some might call "natural person" is a specialized kind of legal person. What you have called "corporation", and some might called "incorporated person", is a different kind of specialized legal person.

Seen this way, the relationship between "legal persons", "persons", and "corporations" can be seen as a gen-spec (generalization-specialization) pattern. The gen-spec gets a lot of treatment in the tutorials on object modeling, and fits pretty naturally with the concept of inheritance. gen-spec is often glossed over in tutorials on relational modeling. But the concept is well understood.

A legal person can own a corporation, regardless of which specialized type of legal person.

Your ENTITYALL table conforms to some of the features of a gen-spec relational design, but you could develop the model further. In particular, if we have an entity type "automobile", there's no particular reason why this wouldn't get an entry in an ENTITYALL table. But the fact that an automobile cannot own a corporation has now been obscured. I would want some kind of table that generalizes "person" and "corporation" into "legal person", but isn't so general that "automobile" would be classified as a "legal person". ENTITYALL is too generic for my preferences.

Looking at the best examples of gen-spec out there, we see that the primary key for the gen table and the primary key for each spec table are all drawn from the same domain. Further, the primary key of the spec tables operates as a foreign key reference to the gen table, in addition to maintaining entity integrity for the specialized entity. The joins turn out to be very nice. Your schema could profit from this design tid bit.

Walter Mitty
I appreciate your explanation about gen-spec and I had not previously recognized it gets more treatment in object modeling than in relational modeling; great assessment.
Chris Adragna
I agree with your critique about "EntityAll" and I like, instead, "BusinessParty" in its place (see comment and attribution above in my comment on the question).
Chris Adragna
+2  A: 

I think it was Hugh Darwen who coined the terms 'distributed key' and 'distributed foreign keys', where a single referenced key value exists in exactly one of multiple referencing relvars (tables); this would require a related concept of 'multple assignment' in order to atomically insert to both the referenced and referencing relvars.

While this could in theory be achieved in SQL-92 using deferrable schema-level ASSERTIONs (or perhaps CHECK constraints that support subqueries), it's rather a clunky process, is procedural (rather than set-based) and there isn't a SQL product that has ever support this functionality (or ever will, I susepct).

The best we can do with available SQL products is to use a compound key (entity_ID, entity_type) with a CHECK constraint on the entity_type in referencing tables to ensure there is no more than one referencing key value (note this is not the same as 'exactly one referencing key value') e.g.

CREATE TABLE LegalPersons
(
 person_ID INTEGER IDENTITY NOT NULL UNIQUE, 
 person_type VARCHAR(14) NOT NULL
    CHECK (person_type IN ('Company', 'Natural Person')), 
 UNIQUE (person_type, person_ID)
);

CREATE TABLE Companies
(
 person_ID INTEGER NOT NULL UNIQUE, 
 person_type VARCHAR(14) NOT NULL
    CHECK (person_type = 'Company'), 
 FOREIGN KEY (person_type, person_ID)
    REFERENCES LegalPersons (person_type, person_ID), 
 companies_house_registered_number VARCHAR(8) NOT NULL UNIQUE
 -- other company columns and constraints here
);

CREATE TABLE NaturalPersons
(
 person_ID INTEGER NOT NULL UNIQUE, 
 person_type VARCHAR(14) NOT NULL
    CHECK (person_type = 'Natural Person'), 
 FOREIGN KEY (person_type, person_ID)
    REFERENCES LegalPersons (person_type, person_ID) 
 -- natural person columns and constraints here
);

This superclass-subclass pattern is very common in SQL.

Ideally, a table name should reflect the nature of the set as a whole. You many need to think beyond a compound of other sets' names; perhaps ask a expert in the particular field of business e.g. an accountant may use the term 'payroll' rather than 'EmployeesSalaries'.

Another ideal is for a column's name to remain the same throughout the schema but with a subclassing approach you often need to qualify them (and this bothers me!) e.g.

CREATE TABLE CompanyAgents
(
 company_person_ID INTEGER NOT NULL UNIQUE, 
 company_person_type VARCHAR(14) NOT NULL
    CHECK (company_person_type = 'Company'), 
 FOREIGN KEY (company_person_type, company_person_ID)
    REFERENCES LegalPersons (person_type, person_ID), 
 agent_person_ID INTEGER NOT NULL, 
 agent_person_type VARCHAR(14) NOT NULL, 
 FOREIGN KEY (agent_person_type, agent_person_ID)
    REFERENCES LegalPersons (person_type, person_ID), 
 CHECK (company_person_ID <> agent_person_ID)
);

Note I would have used a single column key for agent_person_ID e.g.

 agent_person_ID INTEGER NOT NULL
    REFERENCES LegalPersons (person_ID)

because there is no restriction on entity type. In principle I feel better about retaining the two-column compound key for all references throughout the schema and I find in practice as often as not I need to now the entity type anyhow so this SQL DDL is saving a JOIN in SQL DML :)

onedaywhen
+1, "Distributed Foreign Keys" is key (pun intended). Additionally, note that I now believe the base entity should be called "BusinessParty" and they can have a type of either Person or Organization (avoiding "Corporation" which is more a legal specification than a logical one).
Chris Adragna