views:

70

answers:

3

Hi,

Our database is designed based on EAV (Entity-Attribute-Value) model. Those who have worked with EAV models know all the crap that comes with for the purpose of flexibility.

I asked my client about the reasons why using EAV model (flexibility), and their response was: Their entities change over time. So, today they may have a table with a few attributes, but in a month time, a few new attributes may be added, or an existing attribute may be renamed. They need to produce reports to get back to any stage in time and query the data based on the shape of entities at that stage.

I understand this is not feasible with a conventional relational model, but I personally see EAV as anti-pattern. Are there any other alternative models that enables us to capture the time dimension in changes to the entities and instances?

Cheers, Mosh

A: 

Create a new table description for each Entity description Version and one additional table that tells you which table is which version. The query system should be updated as well.

I think creating a script that generates, tables and queries is your best shot.

fabrizioM
+2  A: 

Regardless of the kind of relational model you use, tracking field name changes requires a lot of meta data which you must keep track of in either transaction logs or audit tables. Unfortunately, querying either of those for state at a particular date is very complicated. If your client only requires state at a particular time date however, meaning the entire state, not just with respect to name changes, you can duplicate the database and roll back the transaction log to the particular time required and run your queries on the new instance. If entities added after the specified date need to show up in the query with the old field names however, you have a very large engineering problem ahead of you. In that case, with the information you provided in your question, I would suggest either negotiating alternatives with the client or getting more information about the use of the reports to find alternative solutions.

You could move to a document based datastore, but that still wouldn't solve the problem in the second case. Sorry this isn't really an answer, but having worked through similar situations, the client likely needs a more realistic reporting solution or a number of other investors willing to front the capital for the engineering.

When this problem came up for us, we kept the db schema constant and implemented an entity mapping factory based on a timestamp. In the end, the client continually changed requirements (on a weekly to monthly basis) as to how aggregate fields were calculated and were never fully satisfied.

NickLarsen
@Nick. Excellent response. I would add that some clients will be completely unsatisfied, because they don't accept the trade off between ultimate flexibility and long term consistency when it comes to the data model. You just have to learn how to manage such clients and prevent them from ruining your life or your reputation.
Walter Mitty
+1  A: 

There is a difference between EAV done faithfully or badly; 5NF done by skilled people or by those who are clueless.

Sixth Normal Form is the Irreducible Normal Form (no further Normalisation is possible). It eliminates many of the problems that are common, such as The Null Problem, and provides the ultimate method identifying missing values. It is the academically and technically robust NF. There are no products to support it, and it is not commonly used. To be implemented properly and consistently, it requires a catalogue for metadat to be implemeneted. of course, the SQL required to navigate it becomes even more cumbersome (SQL already being cumbersome re joins), but this is easily overcome by automating the production of SQL from the metadata.

EAV is a partial set or a subset of 6NF. The problem is, usually it is done for a purpose (to allow columns to be added without having to make DDL changes), and by people who are not aware of the 6NF, and who do not implement metadata. The point is, 6NF and EAV as principles and concepts offer substantial benefits, and performance increases; but commonly it is not implemented properly, and the benefits are not realised. Quite a few EAV implementations are disasters, not because EAV is bad, but because the implementation is poor.

Eg. Some people think that the SQL required to construct the 3NF rows from the 6NF/EAV database is complex: no, it is cumbersome but not complex. More important, an ordinary SQL VIEW can be provided, so that all users and report tools see only the straight 3NF VIEW, and the 6NF/EAV issues are transparent to them. Last, the SQL required can be automated, so the labour cost that many people endure is quite unnecessary.

So the answer really is, Sixth Normal Form, being the father of EAV, and a purer form, is the replacement for it. The Caveat is, ensure it is done properly. I have one large 6NF db, and it suffers none of the problems people post about, it performs beautifully, the cust is very happy (no further work is a sign of complete functional satisfaction).

I have already posted a very detailed answer to another question which applies to your question as well, whhich you may be interested in.

Other EAV Question

PerformanceDBA