Has anyone implemented a very large EAV (entity-attribute-value) or open-schema style database in SQL Server? I'm wondering whether there are performance issues with this approach and how you were able to overcome them.

+1  A: 

I'm not an expert on EAV, but several developers more experienced than I am have commented that Magento, the open-source e-commerce framework, is slow primarily because of its EAV architecture on MySQL. The most obvious disadvantage can't easily be overcome: as the application grows, it becomes increasingly difficult to troubleshoot where and how information is represented for entities and attribute values. The second argument against EAV I have heard is that it requires queries whose table joins reach the low double digits, although it was commented that using InnoDB over MyISAM improved performance somewhat (or it could be vice versa; I can't remember exactly).

hal10001
+4  A: 

Regardless of MS SQL Server versus any other brand of database, the worst performance issue with EAV is that people try to do monster queries to reconstruct an entity on a single row. This requires a separate join per attribute.

SELECT e.entity_id, a1.attr_value AS "cost", a2.attr_value AS "color",
  a3.attr_value AS "size", . . .
FROM entity e
  LEFT OUTER JOIN attrib a1 ON (e.entity_id = a1.entity_id AND a1.attr_name = 'cost')
  LEFT OUTER JOIN attrib a2 ON (e.entity_id = a2.entity_id AND a2.attr_name = 'color')
  LEFT OUTER JOIN attrib a3 ON (e.entity_id = a3.entity_id AND a3.attr_name = 'size')
  . . . additional joins for each attribute . . .

No matter what database brand you use, more joins in a query mean geometrically increasing performance cost. Inevitably, you will need enough attributes that the query exceeds the architectural capacity of any SQL engine.
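
To make the join-per-attribute growth concrete, here is a rough sketch in Python. It uses the sqlite3 module only because it ships with Python; it is not SQL Server, and the `entity`/`attrib` schema and the helper name `build_pivot_query` are assumptions for illustration. SQLite documents a hard limit of 64 tables in a single join, so a pivot over enough attributes simply fails to prepare. Other engines have their own limits.

```python
import sqlite3


def build_pivot_query(attr_names):
    """Build the one-join-per-attribute pivot query (hypothetical helper)."""
    select_cols = ["e.entity_id"]
    joins = []
    params = []
    for i, name in enumerate(attr_names, start=1):
        alias = f"a{i}"
        select_cols.append(f"{alias}.attr_value AS col{i}")
        joins.append(
            f"LEFT OUTER JOIN attrib {alias} "
            f"ON (e.entity_id = {alias}.entity_id AND {alias}.attr_name = ?)"
        )
        params.append(name)
    sql = "SELECT " + ", ".join(select_cols) + " FROM entity e " + " ".join(joins)
    return sql, params


# Assumed demo schema: one entity with three attributes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity (entity_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE attrib (entity_id INTEGER, attr_name TEXT, attr_value TEXT)"
)
conn.execute("INSERT INTO entity VALUES (1)")
conn.executemany(
    "INSERT INTO attrib VALUES (?, ?, ?)",
    [(1, "cost", "9.99"), (1, "color", "red"), (1, "size", "M")],
)

# Three attributes means three joins; this still works.
sql, params = build_pivot_query(["cost", "color", "size"])
row = conn.execute(sql, params).fetchone()

# One hundred attributes means one hundred joins; SQLite refuses to prepare it.
big_sql, big_params = build_pivot_query([f"attr{i}" for i in range(100)])
try:
    conn.execute(big_sql, big_params)
    too_many_joins = False
except sqlite3.OperationalError:
    too_many_joins = True
```

The generated SQL has exactly one `LEFT OUTER JOIN` per attribute name, so the query grows with the attribute count rather than with the number of entities.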

The solution is to fetch the attributes in rows instead of columns, and write a class in application code to loop over these rows, assigning the values into object properties one by one.

SELECT e.entity_id, a.attr_name, a.attr_value
FROM entity e JOIN attrib a ON e.entity_id = a.entity_id
ORDER BY e.entity_id;

This SQL query is so much simpler and more efficient that it makes up for the extra application code.

What I would look for in an EAV framework is some boilerplate code that retrieves a multi-row result set like this, maps the attributes into object properties, and then returns the collection of populated objects.
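
As a minimal sketch of that kind of boilerplate, assuming the same `entity`/`attrib` tables and using Python with sqlite3 purely for illustration (the function name `fetch_entities` is hypothetical, not part of any framework), the row-per-attribute result can be grouped by entity and copied onto plain objects:

```python
import sqlite3
from itertools import groupby
from types import SimpleNamespace


def fetch_entities(conn):
    """Fetch attributes as rows and fold them into one object per entity."""
    cur = conn.execute(
        "SELECT e.entity_id, a.attr_name, a.attr_value "
        "FROM entity e JOIN attrib a ON e.entity_id = a.entity_id "
        "ORDER BY e.entity_id"
    )
    entities = []
    # Rows arrive sorted by entity_id, so groupby yields one group per entity.
    for entity_id, rows in groupby(cur, key=lambda r: r[0]):
        obj = SimpleNamespace(entity_id=entity_id)
        for _, attr_name, attr_value in rows:
            setattr(obj, attr_name, attr_value)  # one property per attribute row
        entities.append(obj)
    return entities


# Assumed demo data: entity 1 has three attributes, entity 2 has only one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity (entity_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE attrib (entity_id INTEGER, attr_name TEXT, attr_value TEXT)"
)
conn.executemany("INSERT INTO entity VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO attrib VALUES (?, ?, ?)",
    [
        (1, "cost", "9.99"),
        (1, "color", "red"),
        (1, "size", "M"),
        (2, "color", "blue"),
    ],
)

entities = fetch_entities(conn)
```

The query stays a single join no matter how many attributes exist; a missing attribute simply never gets set on the object. Note that the inner join skips entities with no attribute rows at all, so a LEFT OUTER JOIN plus a null check would be needed if those must appear.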

Bill Karwin