I've done a bit of reading on data modeling recently and have a question about roles that an entity may play.

Consider a simple case where you've got a Company, and a Company can be a Supplier, Customer, Distributor, etc. or a combination of these roles. So company X might be both a Supplier and a Customer.

Down at the data level you might have a table for Companies and then tables for Suppliers, Customers, etc. that reference the Company table. At least I think this is how it might be represented.

Ok, so somewhere up in application-land you've got classes for Customers and Suppliers and so on. Each would be composed of a Company, and then whatever else is special about that particular class.

That's all ok and makes sense to me as long as we're only working with one entity class at a time. What if we want to start with a Company and see what roles it's playing? So in an application I might pull up a Company and see that it is a Supplier and a Distributor.

Now there are a few different ways I can think of to do this, but because this problem domain is so old, I feel there must be some tried and true patterns for modeling these concepts.

Thus what I am in search of here are common strategies or patterns for modeling entity roles up at the application level. Specific reference material about this particular subject would be greatly appreciated (be it blogs or books or whatever).

A: 

I fear I cannot give you the "common pattern" for dealing with this problem. But I also think that there is no "one and only" pattern at all.

The reason is that modelling is somewhat "fuzzy". I remember a rather similar modelling problem in a German computer magazine. It was a kind of contest, and they showed the different solutions that were sent in. The solutions were totally different, yet all of them were valid in their own way. I think it also depends on the details of the problem at hand. Sometimes a "lean" solution is beautiful ... in other cases, the "big, fat, grand" solution is needed to meet the project's needs ...

As such, modelling is still a creative task with many free parameters.

Of course there are some "meta-patterns" that are agreed upon, for example those in the book "Design Patterns" by the famous "Gang of Four", and many others are available. But there are still many problems for which no agreed "best" solution exists.

In your case, it would be possible to use sub-classing (this is equivalent to specialization). It would also be possible to make "Supplier" etc. just an interface that a company may or may not support (this could be seen as optional specialization from an abstract Entity). But it is also possible to use composition for the same problem: a Role could be an object (Entity) that is linked to the company (e.g. via a "has-role" relation).
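The composition variant might look roughly like the sketch below. All class and attribute names here (`Role`, `payment_terms_days`, `region`, `add_role`, and so on) are made up for illustration and are not taken from any particular library; it also assumes a company holds at most one role object per role type.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """Hypothetical base class for anything a company can act as."""


@dataclass
class Supplier(Role):
    payment_terms_days: int = 30  # illustrative role-specific attribute


@dataclass
class Distributor(Role):
    region: str = "unknown"  # illustrative role-specific attribute


@dataclass
class Company:
    name: str
    # The "has-role" relation: role type -> role object (assumed one per type).
    roles: dict = field(default_factory=dict)

    def add_role(self, role: Role) -> None:
        self.roles[type(role)] = role

    def role(self, role_type):
        """Return the role object of the given type, or None if not played."""
        return self.roles.get(role_type)

    def role_names(self):
        """Names of all roles this company currently plays, sorted."""
        return sorted(t.__name__ for t in self.roles)


company = Company("X")
company.add_role(Supplier(payment_terms_days=45))
company.add_role(Distributor(region="EMEA"))
```

Starting from a `Company`, `company.role_names()` lists the roles it plays, and `company.role(Supplier)` gives access to whatever is special about that role. Roles can be added or dropped at runtime without touching the `Company` class itself.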

Juergen
Of course, I am talking here more about "object-oriented modelling" than about "ER modelling". I hope it still gives some insight, since for me the two are not that far apart.
Juergen
+1  A: 

I would recommend using inheritance only as a last resort. Relationships like this one are not straightforward, and it is easy to foul up a design through a form of premature optimization. When a Company can be a Supplier, a Distributor, or both, you don't want to create a Company with attributes of a supplier or a distributor. Instead, think of it as you would when normalizing a database. You have a set of concepts as follows:

  • Companies(CompanyID, name, attrib1, attrib2)
  • Suppliers(SupplierID, CompanyID [foreign key], attrib1, attrib2), which are Companies
  • Distributors(DistributorID, CompanyID [foreign key], attrib1, attrib2), which are also Companies
  • VendorRelationship(RelationshipID, SupplierID, DistributorID, attrib1, attrib2), if you need to track details about the connection between a supplier and a distributor

This keeps the coupling between Company, Supplier, and Distributor low.
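As a rough illustration of that layout, here is a sketch using Python's built-in `sqlite3` module with an in-memory database. The lowercase table and column names, the sample rows, the `UNIQUE` constraint (at most one supplier or distributor row per company), and the `roles_of` helper are all assumptions made for the example; the placeholder attrib columns and the VendorRelationship table are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (
    company_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
-- Each role table points back at companies; UNIQUE is an assumption.
CREATE TABLE suppliers (
    supplier_id INTEGER PRIMARY KEY,
    company_id  INTEGER NOT NULL UNIQUE REFERENCES companies(company_id)
);
CREATE TABLE distributors (
    distributor_id INTEGER PRIMARY KEY,
    company_id     INTEGER NOT NULL UNIQUE REFERENCES companies(company_id)
);
INSERT INTO companies VALUES (1, 'X'), (2, 'Y');
INSERT INTO suppliers VALUES (10, 1);
INSERT INTO distributors VALUES (20, 1), (21, 2);
""")


def roles_of(conn, company_id):
    """Start from a company and list the roles it plays (hypothetical helper)."""
    query = """
        SELECT 'Supplier' FROM suppliers WHERE company_id = ?
        UNION ALL
        SELECT 'Distributor' FROM distributors WHERE company_id = ?
    """
    rows = conn.execute(query, (company_id, company_id))
    return sorted(row[0] for row in rows)
```

With this shape, answering "what roles is company 1 playing?" is one query over the role tables, and adding a new role means adding a new table and one more `SELECT` branch rather than changing `companies`.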

Another example of this is when a class has a state. Conceptual models often use inheritance here: the object is an instance of a class with polymorphic children, one for each possible state. This causes problems when the state of an instance has to change, because you have to create a new instance of another class and then replace every pointer to the target Company. Existing pointers are invalidated, and the replacement is hard to keep up to date if there are many copies or if instances are held in containers or lists. The simpler and cleaner solution is for the class to contain an element of type BaseClass, which has the possible states as child classes. This way, when you want to change the state of an object, you simply replace the status attribute with an instance of the updated concrete type.
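A minimal sketch of that "state as a contained attribute" idea follows. The names `Status`, `Active`, `Suspended`, `can_order`, and `suspend` are invented for illustration; the point is only that the `Company` object keeps its identity while its status object is swapped.

```python
class Status:
    """Hypothetical base class; each possible state is a subclass."""
    name = "unknown"

    def can_order(self) -> bool:
        return False


class Active(Status):
    name = "active"

    def can_order(self) -> bool:
        return True


class Suspended(Status):
    name = "suspended"  # inherits can_order() -> False


class Company:
    def __init__(self, name: str):
        self.name = name
        self.status: Status = Active()  # state is held, not inherited

    def suspend(self) -> None:
        # Changing state replaces one attribute; the Company object
        # (and every reference to it) stays the same.
        self.status = Suspended()

    def can_order(self) -> bool:
        return self.status.can_order()


registry = [Company("X")]   # some container holding the instance
alias = registry[0]         # another reference to the same object
alias.suspend()
```

After `alias.suspend()`, the entry in `registry` is still the same object and already reflects the new state, so no container has to be searched and patched.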

Kelly French
+1  A: 

You might want to check out database design using Object Role Modeling. It fundamentally uses expressions of the type you use in your question statement, asserting the roles that objects (entities) play in relation to each other. Among other capabilities, it can generate a complete relational database design.

Here's another reference.

le dorfier
+1  A: 

Most DBMSs are not a good fit for this problem as they lack the flexibility that is needed. I guess that's why Charles Bachman came up with an extension of the CODASYL network data model back in 1977 by adding the role concept (see also The role data model revisited (PDF)). However, IMHO Bachman was still too much under the influence of the hierarchical data model, thinking in terms of owner/member relationship sets.

Conceptually the problem at hand corresponds to a graph/network. If you model entities as nodes, the edges (relationships) would carry labels to indicate the roles. For example, an Order entity would have an "ordered by" relationship connected to some other entity, which could be a Person, a Company or something else. When you follow an "ordered by" relationship you know that the target node represents an entity that implements an Orderer interface.

In math lingo, what's needed here is a labeled, directed multigraph. You'll find that both in native graph databases like Neo4j (an open source project I'm involved in) and in RDF. There are also RDF implementations on top of RDBMSs. Maybe the graph concept can also give you some hints about how to implement this from scratch. I also discuss the role concept briefly in my blog post Flexibility in data modeling.
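For a from-scratch version, a labeled, directed multigraph can be sketched in a few lines of plain Python. This is not how Neo4j or any RDF store does it internally; `RoleGraph`, its methods, and the node and label strings are hypothetical names chosen for the example.

```python
from collections import defaultdict


class RoleGraph:
    """Tiny labeled, directed multigraph (illustrative only)."""

    def __init__(self):
        # source node -> list of (label, target); a list permits
        # several parallel edges between the same pair of nodes.
        self._out = defaultdict(list)

    def add_edge(self, source, label, target):
        self._out[source].append((label, target))

    def targets(self, source, label):
        """Follow all edges with the given label out of `source`."""
        return [t for (l, t) in self._out.get(source, []) if l == label]

    def roles_played_by(self, node):
        """Labels of all incoming edges, i.e. the roles `node` plays."""
        return sorted(
            {l for edges in self._out.values() for (l, t) in edges if t == node}
        )


graph = RoleGraph()
graph.add_edge("order-1", "ordered by", "Company X")
graph.add_edge("order-2", "ordered by", "Person P")
graph.add_edge("order-2", "supplied by", "Company X")
```

Following `"ordered by"` from an order yields whoever plays the orderer role, whether that is a person or a company, and `graph.roles_played_by("Company X")` answers the original question of which roles a given company is playing.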

nawroth