views:

114

answers:

8

Let's say you're modeling an entity that has many attributes (2400+), far greater than the physical limit on a given database engine (e.g. ~1000 SQL Server). Knowing nothing about the relative importance of these data points (which ones are hot/used most often) besides the domain/candidate keys, how would you implement it?

A) EAV. (boo... Native relational tools thrown out the window.)

B) Go straight across. The first table has a primary key and 1000 columns, right up to the limit. The next table is 1000, foreign keyed to the first. The last table is the remaining 400, also foreign keyed.

C) Stripe evenly across ceil( n / limit ) tables. Each table has an even number of columns, foreign keying to the first table. 800, 800, 800.

D) Something else...

And why?

Edit: This is more of a philosophical/generic question, not tied to any specific limits or engines.

+3  A: 

Without having much knowlegde in this area, i think an entity with so many attributes really really needs a re-design. With that I mean splitting the big thing into smaller parts that are logically connected.

InsertNickHere
A: 

I would rotate the columns and make them rows. Rather than having a column containing the name of the attribute as a string (nvarchar) you could have it as a fkey back to a lookup table which contains a list of all the possible attributes.

Rotating it in this way means you:

  • don't have masses of tables to record the details of just one item
  • don't have massively wide tables
  • you can store only the info you need due to the rotation (if you don't want to store a particular attribute, then just don't have that row)
slugster
This is still an EAV variant, though
Joel Coehoorn
A: 

I'd use a one to many attribute table with a foreign key to the entity.

Eg

entities: id,

attrs: id, entity_id, attr_name, value

ADDED

Or as Butler Lampson would say, "all problems in Computer Science can be solved by another level of indirection"

Larry K
This is also EAV.
Mark Canlas
A: 
  1. I'd look at the data model a lot more carefully. Is it 3rd normal form? Are there groups of attributes that should be logically grouped together into their own tables?

  2. Assuming it is normalized and the entity truly has 2400+ attributes, I wouldn't be so quick to boo an EAV model. IMHO, it's the best, most flexible solution for the situation you've described. It gives you built in support for sparse data and gives you good searching speed as the values for any given attribute can be found in a single index.

Joe Stefanelli
+2  A: 

The key item to me is this piece:

Knowing nothing about the relative importance of these data points (which ones are hot/used most often)

If you have an idea of which fields are more important, I would put those more important fields in the "native" table and let an EAV structure handle the rest.

The thing is, without this information you're really shooting blind anyway. Whether you have 2400 fields or just 24, you ought to have some kind of idea about the meaning (and therefore relative importance, or at least logical groupings) your data points.

Joel Coehoorn
+5  A: 

Use Sparse Columns for up to 30000 columns. The great advantage over EAV or XML is that you can use Filtered Indexes in conjunction with sparse columns, for very efficient searches over common attributes.

Remus Rusanu
A: 

I would like to use vertical ( increase number of rows ) approach instead of horizontal ( increase number of columns).

You can try this approach like

Table -- id , property_name -- property_value.

The advantage with approach is, no need to alter / create a table when you introduce the new property / column.

Naveen
This would also be EAV.
Mark Bannister
This is also the exact same answer i proposed.
slugster
+2  A: 

My solution: investigate further. Specifically, establish whether the table is truly normalised (at 2400 columns this seems highly unlikely).

If not, restructure until it is fully normalised (at which point there are likely to be fewer than 1000 columns per table).

If it is already fully normalised, establish (as far as possible) approximate frequencies of population for each attribute. Place the most commonly occurring attributes on the "home" table for the entity, use 2 or 3 additional tables for the less frequently populated attributes. (Try to make frequency of occurrence the criteria for determining which fields should go on which tables.)

Only consider EAV for extremely sparsely populated attributes (preferably, not at all).

Mark Bannister