views: 485
answers: 2
An application I'm dealing with at a customer site looks like this:

  • It allows end users to enter "materials".
  • To those materials, they can append any number of "properties".
  • Properties can hold a value of any of these types: decimal, int, dateTime, or varchar (lengths varying from 5 characters to large chunks of text).

Essentially, the schema looks like this:

Materials
MaterialID int not null PK
MaterialName varchar(100) not null

Properties
PropertyID
PropertyName varchar(100)

MaterialsProperties
MaterialID
PropertyID
PropertyValue varchar(3000)

An essential feature of the application is the search functionality: end users can search materials by entering queries like:

  • [property] inspectionDate > [DateTimeValue]
  • [property] serialNr = 35465488

Guess how this performs over the MaterialsProperties table with nearly 2 million records in it.
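To make the problem concrete: a search like the ones above compiles to a join against MaterialsProperties plus a comparison on the varchar PropertyValue column, which forces a cast (or a string comparison) on every matching row. A minimal sketch, using SQLite as a stand-in for SQL Server (table and value names are taken from the question; the sample rows are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Materials (MaterialID INTEGER PRIMARY KEY, MaterialName TEXT NOT NULL);
CREATE TABLE Properties (PropertyID INTEGER PRIMARY KEY, PropertyName TEXT NOT NULL);
CREATE TABLE MaterialsProperties (
    MaterialID INTEGER, PropertyID INTEGER, PropertyValue TEXT
);
""")
cur.execute("INSERT INTO Materials VALUES (1, 'Widget'), (2, 'Gadget')")
cur.execute("INSERT INTO Properties VALUES (10, 'serialNr')")
cur.executemany("INSERT INTO MaterialsProperties VALUES (?, ?, ?)",
                [(1, 10, '35465488'), (2, 10, '99999999')])

# The '=' here compares varchar values; a numeric or date '>' search
# would need a CAST over every row, which defeats any index on PropertyValue.
rows = cur.execute("""
    SELECT m.MaterialName
    FROM Materials m
    JOIN MaterialsProperties mp ON mp.MaterialID = m.MaterialID
    WHERE mp.PropertyID = 10 AND mp.PropertyValue = '35465488'
""").fetchall()
print(rows)  # [('Widget',)]
```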

The database was initially created under SQL Server 2000 and later migrated to SQL Server 2005.

How can this be done better?

+1  A: 

You could consider separating your MaterialsProperties table by type, e.g. into IntMaterialProperties, CharMaterialProperties, etc. This would:

  • Partition your data.
  • Allow for potentially faster look-ups for integer (or other numeric) type look-ups.
  • Potentially reduce storage costs.

You could also add a Type column to Properties, which you could use to determine which MaterialProperties table to query. The same column could be used to validate that the user's input is of the correct type, eliminating the need to run a query at all for "bad" input.
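A sketch of this split-table idea, again using SQLite as a stand-in (the table names IntMaterialProperties/CharMaterialProperties come from the answer; the dispatch dictionary and the sample rows are illustrative assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Properties (
    PropertyID INTEGER PRIMARY KEY,
    PropertyName TEXT NOT NULL,
    Type TEXT NOT NULL               -- discriminator: 'int', 'char', ...
);
CREATE TABLE IntMaterialProperties (
    MaterialID INTEGER, PropertyID INTEGER, PropertyValue INTEGER
);
CREATE TABLE CharMaterialProperties (
    MaterialID INTEGER, PropertyID INTEGER, PropertyValue TEXT
);
""")
cur.execute("INSERT INTO Properties VALUES (10, 'serialNr', 'int')")
cur.execute("INSERT INTO IntMaterialProperties VALUES (1, 10, 35465488)")

# Use the Type column to pick the right (typed, indexable) table --
# and to reject input of the wrong type before touching the big table.
TABLE_FOR_TYPE = {"int": "IntMaterialProperties",
                  "char": "CharMaterialProperties"}

(ptype,) = cur.execute(
    "SELECT Type FROM Properties WHERE PropertyID = 10").fetchone()
rows = cur.execute(
    f"SELECT MaterialID FROM {TABLE_FOR_TYPE[ptype]} "
    "WHERE PropertyID = ? AND PropertyValue = ?", (10, 35465488)
).fetchall()
print(rows)  # [(1,)]
```

Because each split table stores its values in a native type, comparisons like `PropertyValue > 35465488` can use an ordinary index instead of casting a varchar per row.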

Adamski
Initially, I was thinking of putting additional value columns, one per data type, in the MaterialsProperties table. But your solution will do better both performance-wise and in terms of storage efficiency. Thanks a lot!
No problem. You can upvote or accept my answer if you like (blatant scrounge attempt for rep points).
Adamski
I wasn't aware this kind of db design had a name ("Entity Attribute Value Model techniques"). I have to give bkm credit for that. I'd mark your answer as "helpful", but apparently I'm short of reputation points. Thanks again to you both for your quick and very helpful answers.
A: 
  1. Since users can enter their own property names, I guess every query is going to involve a scan of the Properties table (in your example, I need to find the PropertyID of [inspectionDate]). If the Properties table is large, your join will also take a long time. You could try to optimize by denormalizing: store the property name alongside the PropertyID as a denormalized column in the MaterialsProperties table.
  2. You could try adding a property type (int, char, etc.) to the MaterialsProperties table and partitioning the table on that type.
  3. Look at Object Relational Mapping/Entity Attribute Value Model techniques for query optimization.
  4. Since you already have a lot of data (2 million records), do some data mining and see if there are groups of properties that recur across many materials. You can then put those in a conventional schema and keep the rest in the EAV table. Look here for details: http://portal.acm.org/citation.cfm?id=509015&dl=GUIDE&coll=GUIDE&CFID=49465839&CFTOKEN=33971901
bkm
Actually, end users select properties from a dropdown list, so I have the PropertyID from the start. The data mining option seems very interesting; thanks for pointing me to that!
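The mining step from point 4 above can be sketched very simply: group materials by their full set of property names and count repeats. Sets shared by many materials are candidates for promotion to real columns; the rest stays EAV. The data below is invented for illustration:

```python
from collections import Counter

# Hypothetical mined data: MaterialID -> the set of property names it uses.
material_props = {
    1: frozenset({"serialNr", "inspectionDate"}),
    2: frozenset({"serialNr", "inspectionDate"}),
    3: frozenset({"serialNr", "inspectionDate", "color"}),
    4: frozenset({"color"}),
}

# Count how often each exact property set recurs; frequently shared sets
# are candidates for a conventional (column-per-property) table.
counts = Counter(material_props.values())
common = [props for props, n in counts.most_common() if n > 1]
print(common)  # [frozenset({'serialNr', 'inspectionDate'})]
```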