I have a normalized database and need to frequently produce web-based reports that involve joins across multiple tables. These queries are taking too long, so I'd like to keep the results precomputed so that I can load pages quickly. There are frequent updates to the tables I am summarising, and I need the summary to reflect all updates so far.

All tables have autoincrement integer primary keys; I almost always add new rows, and I can arrange to clear the computed results if existing rows change.

I approached a similar problem where I needed a summary of a single table by arranging to iterate over each row in the table, keeping track of the iterator state and the highest primary key (i.e. "highwater") seen. That's fine for a single table, but for multiple tables I'd end up keeping one highwater value per table, and that feels complicated. Alternatively I could denormalise down to one table (with fairly extensive application changes), which feels like a step backwards and would probably grow my database from about 5GB to about 20GB.

(I'm using sqlite3 at the moment, but MySQL is also an option).
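For concreteness, here is a minimal sketch of the single-table highwater approach described above. It assumes Python's sqlite3 module (the question mentions sqlite3 but not an application language) and an illustrative orders table with an order_value column; all names are made up.

import sqlite3

# Sketch only: table/column names are illustrative.
def refresh_summary(conn, state):
    # Fold any rows added since the last run into the running total,
    # remembering the highest primary key ("highwater") seen so far.
    rows = conn.execute(
        "SELECT id, order_value FROM orders WHERE id > ? ORDER BY id",
        (state["highwater"],),
    ).fetchall()
    for row_id, value in rows:
        state["total"] += value
        state["highwater"] = row_id
    return state

conn = sqlite3.connect("app.db")
state = {"highwater": 0, "total": 0}
state = refresh_summary(conn, state)

The complication with multi-table joins is exactly the one described: each joined table needs its own highwater value and its own merge step.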

+2  A: 

I see two approaches:

  1. You move the data to a separate, denormalized database with some precalculation applied, optimized for quick access and reporting (in effect a small data warehouse). This implies jobs (scripts, a separate application, etc.) that copy and transform the data from the source to the destination. Depending on whether the copying is full or incremental, how often it runs, and the complexity of the data model (both source and destination), it might take a while to implement and then to optimize the process. It has the advantage that it leaves your source database untouched. A minimal sketch of such an incremental copy job follows this list.

  2. You keep the current database, but you denormalize it. As you said, this might imply changes to the logic of the application (but you might find a way to minimize the impact on the logic using the database; you know the situation better than me :) ).
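A minimal sketch of the incremental copy job mentioned in option 1, assuming Python's sqlite3 module; the database filenames, table names (orders, report_orders, etl_state), and columns are made up for illustration.

import sqlite3

def copy_new_rows():
    # Sketch only: copies rows added since the last run from the source
    # database into a flattened reporting table, tracking a per-table
    # highwater mark in the destination.
    src = sqlite3.connect("source.db")
    dst = sqlite3.connect("reports.db")
    dst.execute("CREATE TABLE IF NOT EXISTS etl_state "
                "(table_name TEXT PRIMARY KEY, highwater INTEGER)")
    row = dst.execute(
        "SELECT highwater FROM etl_state WHERE table_name = 'orders'"
    ).fetchone()
    highwater = row[0] if row else 0

    new_rows = src.execute(
        "SELECT id, customer_id, order_value FROM orders WHERE id > ?",
        (highwater,),
    ).fetchall()
    for order_id, customer_id, value in new_rows:
        dst.execute(
            "INSERT INTO report_orders (order_id, customer_id, order_value) "
            "VALUES (?, ?, ?)",
            (order_id, customer_id, value),
        )
        highwater = max(highwater, order_id)

    dst.execute(
        "INSERT OR REPLACE INTO etl_state (table_name, highwater) "
        "VALUES ('orders', ?)",
        (highwater,),
    )
    dst.commit()

Run on a schedule (cron, a timer in the web app, etc.), this keeps the reporting copy close to current without touching the source schema.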

Cătălin Pitiș
#1 sounds standard, and appropriate. staging area --> normalized area --> reporting area
John Pirie
Yes, that's why I put it first :)
Cătălin Pitiș
+1  A: 

You can create triggers.

As soon as one of the calculated values changes, you can do one of the following:

  • Update the calculated field (Preferred)
  • Recalculate your summary table
  • Store a flag that a recalculation is necessary. The next time you need the calculated values check this flag first and do the recalculation if necessary

Example:

CREATE TRIGGER update_summary_table AFTER UPDATE OF order_value ON orders
BEGIN
  UPDATE summary
    SET total_order_value = total_order_value
                          - old.order_value
                          + new.order_value;
  -- OR: do a complete recalculation here instead
  -- OR: store a flag that marks the summary as stale
END;

More Information on SQLite triggers: http://www.sqlite.org/lang_createtrigger.html
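The third option (store a flag and recalculate lazily) can be sketched like this, again assuming Python's sqlite3 module and illustrative table names (orders, summary, summary_dirty), none of which come from the question:

import sqlite3

conn = sqlite3.connect("reports.db")

# Sketch only: assumes the orders and summary tables already exist.
conn.executescript("""
CREATE TABLE IF NOT EXISTS summary_dirty (flag INTEGER NOT NULL);
INSERT INTO summary_dirty (flag)
  SELECT 1 WHERE NOT EXISTS (SELECT 1 FROM summary_dirty);

-- Any change to order_value just marks the summary as stale.
CREATE TRIGGER IF NOT EXISTS mark_summary_stale
AFTER UPDATE OF order_value ON orders
BEGIN
  UPDATE summary_dirty SET flag = 1;
END;
""")

def get_total_order_value(conn):
    # Recalculate only when the flag says the cached value is stale.
    (dirty,) = conn.execute("SELECT flag FROM summary_dirty").fetchone()
    if dirty:
        conn.execute(
            "UPDATE summary SET total_order_value = "
            "(SELECT COALESCE(SUM(order_value), 0) FROM orders)"
        )
        conn.execute("UPDATE summary_dirty SET flag = 0")
        conn.commit()
    (total,) = conn.execute(
        "SELECT total_order_value FROM summary").fetchone()
    return total

This keeps the write path cheap (the trigger only flips a flag) and pays the recalculation cost at most once per read of stale data.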

DR
The trouble is that performing the query again from scratch on each update uses too much CPU time (and might cause too much contention on the database locks).
Dickon Reed
That is the reason why I added the first and third options: either update only the fields that need it, or write a marker that an update is needed and postpone the recalculation to a later time.
DR
+1  A: 

Can the reports be refreshed incrementally, or does each refresh require a full recalculation? If it has to be a full recalculation then you basically just want to cache the result set until the next refresh is required. You can create tables to contain the report output (and a metadata table to define which report output versions are available), but most of the time this is overkill and you are better off just saving the query results to a file or other cache store.

If it is an incremental refresh then you need the PK ranges to work with anyhow, so you would want something like your high water mark data (except you may want to store min/max pairs).
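For the full-recalculation case above, a minimal sketch of caching the query results off to a file, assuming Python's sqlite3 module; the report query, cache path, and refresh interval are all illustrative:

import json
import os
import sqlite3
import time

CACHE_PATH = "report_cache.json"
REFRESH_SECONDS = 300

def get_report(conn):
    # Serve a recent cached result set if one exists...
    if os.path.exists(CACHE_PATH):
        if time.time() - os.path.getmtime(CACHE_PATH) < REFRESH_SECONDS:
            with open(CACHE_PATH) as f:
                return json.load(f)
    # ...otherwise run the expensive report query and refresh the cache.
    rows = conn.execute(
        "SELECT customer_id, SUM(order_value) AS total "
        "FROM orders GROUP BY customer_id"
    ).fetchall()
    with open(CACHE_PATH, "w") as f:
        json.dump(rows, f)
    return rows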

A: 

In the end I arranged for a single program instance to make all database updates, and maintain the summaries in its heap, i.e. not in the database at all. This works very nicely in this case but would be inappropriate if I had multiple programs doing database updates.
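For reference, a minimal sketch of that arrangement, assuming Python's sqlite3 module; the class, table, and column names are made up for illustration and the real summaries are whatever the reports need:

import sqlite3
from collections import defaultdict

class SingleWriter:
    # Sketch only: one process applies every update and keeps the running
    # summaries on its own heap rather than in the database.
    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.totals = defaultdict(float)

    def add_order(self, customer_id, order_value):
        self.conn.execute(
            "INSERT INTO orders (customer_id, order_value) VALUES (?, ?)",
            (customer_id, order_value),
        )
        self.conn.commit()
        self.totals[customer_id] += order_value  # keep the summary in step

    def report(self):
        return dict(self.totals)  # served straight from memory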

Dickon Reed
A: 

You haven't said anything about your indexing strategy. I would look at that first - making sure that your indexes are covering.
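A quick way to check whether an index is covering in SQLite is EXPLAIN QUERY PLAN; a minimal sketch, again assuming Python's sqlite3 module and illustrative table, column, and index names:

import sqlite3

conn = sqlite3.connect("app.db")
# An index that includes every column the report query touches.
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_orders_customer_value "
    "ON orders (customer_id, order_value)"
)
# If the index covers the query, the plan reports 'USING COVERING INDEX'.
for row in conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT customer_id, SUM(order_value) FROM orders GROUP BY customer_id"
):
    print(row)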

Then I think the trigger option discussed is also a very good strategy.

Another possibility is the regular population of a data warehouse with a model suitable for high performance reporting (for instance, the Kimball model).

Cade Roux
Oh, I checked my indexing first :) There's only any point in looking at this kind of thing once you've already made the queries run as fast as you think they can.
Dickon Reed