views:

33

answers:

1

Have a win 2003 box with MSSQL 2005 running on it. There is a database which is populated every morning with new/modified SalesOrder made the previous day. The database has several tables: SalesOrder, SalesOrderItem, SalesOrderItemBom. Each with a corresponding Version table (i.e. SalesOrderVersion, SalesOrderItemVersion, SalesOrderItemBomVersion) which has the exact same fields but with 2 addition columns VersionStartDate, VersionEndDate. The non-Versioned tables have the latest data.

Also the VersionStartDate is part of the PK for the version tables, so for example:- SalesOrder has OrderID as PK and SalesOrderItem has VersionStartDate, OrderID as the PK.

An simplified example of how the version table works:

SalesOrder

OrderID, Amount 1, 100 2, 200

SalesOrderVersion

VersionStartDate, OrderID, VersionEndDate, Amount 20090101 13:00:00, 1, 20090103 08:00:00, 50 20090103 08:00:00, 1, 99991231 00:00:00, 100 20090101 09:00:00, 2, 20090105 15:00:00, 300 20090105 15:00:00, 2, 99991231 00:00:00, 200

whenever the a row in SalesOrder is changed, the current row VersionEndDate in SalesOrderVersion is updated and a new row is inserted into SalesOrderVerion with VersionEndDate 99991231

Notes: If a record in SalesOrderItem was changed, it does not necessarily result in a change in "parent" record in SalesOrder

Have been requested to do a report showing trend and daily increments in sales. Off the top of my head, I was thinking of creating 3 snapshot tables for SalesOrder, SalesOrderItem, SalesOrderItemBom, which capture the "latest data" as of the current day and thus build incremental snapshots to show trends. Besides requiring more disk space, is there any draw back to this method compared to doing a stored procedure which joins the Version tables, cos it seems to be a long and expensive query.

Any thoughts or recommendations?

+1  A: 

There's a whole lot of "it depends" in here. Here are some discussion ideas.

Is data only ever loaded in the morning? That would mean the data doesn't change during the course of the day, so long expensive queries would not cause locking or blocking issues. Is this database only used for reporting? If not--if the data gets updated throughout the day--additional redundant copies of the data may be necessary to support your requirements.

How frequently will the new reports be run? Will they be run many, many times for a given day, or only once or twice? If the "time-aggregated" data's going to be accessed over and over again, making a redundant copy (the snapshot) might be worthwhile, but if the reports are going to be run once or twice and then dumped, I don't know that I'd bother.

How important is performance? Must the reports be generated and complete within two-three seconds after a phb hits a button? (pause) No, really, particularly once you explain to them the cost in dollars (extra hard drive space, extra backup space, extra time to prepare and backup and restore, whatever other stealth costs arise from data bloat). If they can wait a few minutes for once-a-day reports, then make it cheaper. (You still have the up-front cost of writing more complex code, but once it's done, it's done.)

On the flip side, adding a routine to generate the day's report after the day's data is loaded, and only keeping that one set of data (or maybe a set of data for the past week, four weeks, whatever) has a strong appeal. If you know they're going to run that 5 minute report, launch it in the AM so it's ready when they come in.

These are all based on guesses at your operational requirements. Like I said, much depends on what you have and what you need.

Philip Kelley