192 views · 4 answers

We have a number of items coming in from a web service; each item contains an unknown number of properties. We are storing them in a database with the following schema.

Items
- ItemID
- ItemName

Properties
- PropertyID
- PropertyName
- PropertyValue
- PropertyValueType
- TransmitTime
- ItemID [fk]
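For concreteness, the schema above might look like this in T-SQL (the data types and constraints here are assumptions, not part of the question):

```sql
CREATE TABLE Items (
    ItemID   int IDENTITY PRIMARY KEY,
    ItemName nvarchar(100) NOT NULL
);

CREATE TABLE Properties (
    PropertyID        int IDENTITY PRIMARY KEY,
    PropertyName      nvarchar(100) NOT NULL,
    PropertyValue     nvarchar(max) NOT NULL,
    PropertyValueType nvarchar(50)  NOT NULL,
    TransmitTime      datetime      NOT NULL,
    ItemID            int NOT NULL REFERENCES Items (ItemID)
);

-- Most queries will filter by item and time, so an index like this helps:
CREATE INDEX IX_Properties_Item_Time ON Properties (ItemID, TransmitTime);
```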

The Properties table is growing quite large, since it stores the properties for each item every time the web service is called. My question is this: at what point should we stop adding new records to the Properties table and instead archive older Property records based on their TransmitTime? When does the Properties table become too large and too slow to query? Is there a rule of thumb?

Thanks.

+1  A: 

I don't think there's a golden rule for this. Your schema is fairly normalized, though normalization can significantly degrade performance.

Several factors to consider:
- Usage scenario
- Server hardware specs
- Nature of DB operations (e.g. more reads than writes? inserts but no updates?)

For your case, if the number of properties does not exceed a certain number, a single jagged table might be better, or maybe not. (I might get flamed for this statement :P)

The archiving strategy also depends on your business needs/requirements. You might need to pump up your hardware just to meet them.

o.k.w
A: 

I'm not sure about MS SQL Server, but most databases have a way to partition tables: you make a virtual table from many smaller tables and divide the data between them based on some simple rules.

This is very good for time-based data like this. Partition the table on a time period such as a day or an hour. Then, once per period, add a new partition and drop the oldest one. That is much more efficient than doing a DELETE WHERE time < now - '1 hour', or whatever.

Or instead of dropping the oldest partition, archive it, or just let it stick around taking up space. As long as your queries always specify the date range, they will touch only the relevant sub-tables.
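As a sketch of this sliding-window approach in T-SQL (the object names, boundary dates, and archive table are all made up for illustration):

```sql
-- Partition Properties by month on TransmitTime (boundary dates are examples).
CREATE PARTITION FUNCTION pfTransmitTime (datetime)
AS RANGE RIGHT FOR VALUES ('2009-11-01', '2009-12-01', '2010-01-01');

CREATE PARTITION SCHEME psTransmitTime
AS PARTITION pfTransmitTime ALL TO ([PRIMARY]);

-- The Properties table is then created ON psTransmitTime (TransmitTime).

-- Rolling the window forward each month:
-- 1. Switch the oldest partition out to an identically-structured archive
--    table on the same filegroup (a metadata-only operation, near-instant).
ALTER TABLE Properties SWITCH PARTITION 1 TO PropertiesArchive;
-- 2. Remove the now-empty oldest range and open a new one at the front.
ALTER PARTITION FUNCTION pfTransmitTime() MERGE RANGE ('2009-11-01');
ALTER PARTITION SCHEME psTransmitTime NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pfTransmitTime() SPLIT RANGE ('2010-02-01');
```

Note that table partitioning in SQL Server requires Enterprise Edition (as of 2005/2008).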

Zan Lynx
+1  A: 

There is no rule of thumb

Some thoughts:

  • define "large" (we have tables with 160 million rows)
  • do you have a problem now? If not, don't fix it
  • have you run Profiler or some of the whizzy DMVs to find the bottlenecks (missing indexes etc)?
  • if you need the data to be in hand, then you can't archive it
  • you could partition the table though
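On the DMV point, a query along these lines against the SQL Server 2005+ missing-index DMVs will surface candidate indexes (ordering heuristic is my own choice):

```sql
-- Candidate missing indexes, roughly ordered by estimated impact.
SELECT d.statement AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_user_impact
FROM sys.dm_db_missing_index_details d
JOIN sys.dm_db_missing_index_groups g
  ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats s
  ON s.group_handle = g.index_group_handle
ORDER BY s.user_seeks * s.avg_user_impact DESC;
```

Treat the output as suggestions, not gospel; the DMVs reset on server restart.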
gbn
+1 don't fix what's not broken - good advice! :-)
marc_s
A: 

Depending on how many distinct "property types" you have, the observation pattern may be able to help.

In your example:
Item = Subject,
Property = Observation,
PropertyName = ObservationType.Name,
PropertyValueType = ObservationType.IsTrait

This way you do not repeat PropertyName and PropertyValueType in each record. Depending on your application, if you can cache ObservationType and Subject in the app layer, then inserts will improve too.

- Measurement and trait are types of observations. Measurement is a numeric observation, like height. Trait is a descriptive observation, like color.
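A sketch of the pattern as T-SQL tables (column names and types are assumptions based on the mapping above):

```sql
CREATE TABLE ObservationType (
    ObservationTypeID int IDENTITY PRIMARY KEY,
    Name    nvarchar(100) NOT NULL,  -- was PropertyName
    IsTrait bit NOT NULL             -- was PropertyValueType: trait vs measurement
);

CREATE TABLE Subject (
    SubjectID int IDENTITY PRIMARY KEY,  -- was Item
    Name nvarchar(100) NOT NULL
);

CREATE TABLE Observation (
    ObservationID     int IDENTITY PRIMARY KEY,  -- was Property
    SubjectID         int NOT NULL REFERENCES Subject (SubjectID),
    ObservationTypeID int NOT NULL REFERENCES ObservationType (ObservationTypeID),
    TransmitTime      datetime NOT NULL,
    Amount            decimal(18, 4) NULL,  -- filled for measurements (numeric)
    Description       nvarchar(400)  NULL   -- filled for traits (descriptive)
);
```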

(image: diagram of the observation pattern model)

Damir Sudarevic