views:

238

answers:

1

I need to store a large table (several millions or rows) that contains a large number of user-defined fields (not known at compile time, but probably around 20 to 40 custom fields). It is very important (performance-wise) for me to be able to query the data based on those custom fields: i.e. "Select the rows where this attribute has that value, that attribute is that value, etc.". Each query has some 20 to 30 WHERE clauses.

My ideas so far:

  1. Change the database schema everytime a new user field is implemented. Keep each user defined field as a column in the table. Add and maintain indexes on each custom-created column. How to properly build those indexes is a big problem, as I don't know what attributes(columns) will be used in the WHERE queries.

  2. Store the custom fields as an XML type column. As I understand from SQL2005 I can query inside the XML in the XML type columns. Not so sure about performance though.

  3. Entity Attribute Value . This is what I am using now, but it's a pain.

Any suggestions?

Edit: Some clarifications on my requirements. I have a table, 40 -50 million rows of (say) ID numbers and various attributes associated with those IDs.

Let's say 20 million of them have "CustomAttribute1" equal to 2, then 5 million have "CustomAttribute2" equal to "Yes" and 3 million have "CustomAttribute20" equal to 'No'

I need a FAST method of returning all IDs where:
     1. CustomAttribute1  = 2
     2. CustomAttribute2  = 'Yes'
     3. CustomAttribute4  = null
     4. CustomAttribute20  != 'No'  
  etc...

We have this implemented as EAV: the select query is a nightmare to implement and maintain, it takes a long time to return result, and most anoyingly the DB scales to huge sizes even for small ammounts of data, which is weird since the EAV is essentially normalizing the data but I assume all the indexes take up a bunch of space.

+3  A: 

It seems like you've listed your available options. EAV can be a pain for querying (and slow, depending on how many criteria you want to search on simultaneously), but it tends to be the most "sane" and RDBMS-agnostic implementation.

Modifying the schema is a no-no...obviously it can be done, but such a practice is abhorrent. I do not approve.

The XML option is a solution, and SQL Server can query inside the structure. I'm not certain about other RDBMS's, and you don't list which one you're using in the post or the tags.

If you're going to be querying on many attributes (say, 20+) simultaneously, then I would probably recommend the XML solution just to limit the number of joins you'll have to make. Aside from that, I would stick with EAV.

Adam Robinson
How is the speed to querying XML compared to querying normal columns. Do they even index the XML ?
Radu094
You can create indexes on the XML. I'll leave specific benchmarking to those with more experience with database XML than I, but there is a good amount of information available on the internet about XML querying and indexing in SQL Server.
Adam Robinson