I have two database tables, subscription and transaction, where one subscription can have many transactions. The status of a subscription depends mainly on the transactions that belong to it. So if I want to calculate the next process date, I look at the period field of the subscription object and then analyze the subscription's transactions to determine its status. This all works fine.

The problem I am facing is that the tables contain over 400,000 subscription records and millions of transaction records, so it's getting tricky to build a report summary of the subscriptions (for example, how many fall into each of about ten possible statuses, all of which are calculated dynamically).

Since all of the logic to calculate the status of each subscription is in the C# code, I have to load an entire graph of subscription objects with all of their child transaction objects using LINQ to SQL. This takes quite a long time, maybe two minutes or so. I'm looking at caching, but that won't give real-time results. I'm wondering whether there is a strategy that could solve this, or maybe an index on my database that would speed the LINQ to SQL query up. Or whether I just designed it horribly from the beginning.

Thanks.

A: 

Create the appropriate indexes (not enough info in your question to know what these are). A million rows is not that large a set to run a joined query on, provided you have good indexes.

Can you create a view that contains the logic to calculate the status you require? This would presumably reduce I/O by not having to return as much data to the client.
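For example (a rough sketch only, with made-up names): if such a view is mapped as a LINQ to SQL entity, say SubscriptionStatusView, the report can group over it on the server, and only about ten rows of counts ever come back to the client:

    // Sketch: SubscriptionStatusView is assumed to be a LINQ to SQL class mapped
    // to a database view that computes one Status value per subscription.
    using (var db = new BillingDataContext())          // hypothetical DataContext
    {
        var summary = db.SubscriptionStatusViews       // the mapped view, not the raw tables
                        .GroupBy(v => v.Status)        // translated to GROUP BY on the server
                        .Select(g => new { Status = g.Key, Count = g.Count() })
                        .ToList();                     // ~10 summary rows, not 400,000 object graphs
    }

The status logic itself would then live in the view's SQL rather than being re-run row by row in C#.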

Mitch Wheat
A million rows is a lot to load into the front end for processing.
ck
@ck: you misunderstand me. I'm not suggesting a million rows be brought down to the client. I'm referring to the join part of the question. I will update my answer to make it clearer.
Mitch Wheat
+4  A: 

Since all of the logic to calculate the status of each subscription is in the C# code, I have to load an entire graph of subscription objects with all of their child transaction objects

Perhaps you shouldn't load all this data into the client and do all the calculations row by row. This is what databases are actually good at. Do the calculation on the server side; better still, have the calculation stored in the tables and just look it up in your reports. If you have 400k subscriptions and millions of transactions, then the cornerstone of your design is the database, not the client. You need to invest your time and design in the data model, and the client comes after that.
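A minimal sketch of the "store it in the tables" idea, assuming you add a persisted Status column to the subscription table and refresh it whenever a transaction is written (the context, entity, and method names are hypothetical, and CalculateStatus stands for your existing C# status logic):

    // Sketch: the status is computed once per write instead of once per report row.
    void RecordTransaction(BillingDataContext db, int subscriptionId, Transaction tx)
    {
        var sub = db.Subscriptions.Single(s => s.Id == subscriptionId);
        sub.Transactions.Add(tx);
        sub.Status = CalculateStatus(sub);             // existing logic, run at write time
        db.SubmitChanges();                            // status is now stored with the row
    }

    // The report then never has to touch the transaction table at all:
    var report = db.Subscriptions
                   .GroupBy(s => s.Status)
                   .Select(g => new { Status = g.Key, Count = g.Count() })
                   .ToList();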

Remus Rusanu
Absolutely! Running this on the client will probably never be fast, and is guaranteed to be inefficient. If the computations are complex, there is always the option of T-SQL stored procedures or managed-code stored procedures. It might even be possible to run the C# code for the calculations directly in the database.
Rune Grimstad
A: 

The standard answer for database problems is: it depends. You need to analyse your problem to determine where the bottleneck lies. Is the main cost reading millions of rows from disk? Is it sending millions of rows across the network? Or is it stashing millions of rows in memory?

There are utterly different solutions to each of those problems. For instance, if it transpires that your problem is due to swapping data between real memory and virtual memory, then building additional indexes is not going to help (unless there is an index which can be used to pre-filter the results and reduce the number of rows returned).

APC
+1  A: 

Do you have to redo all the calculations each time you access the subscription object? In some cases it's possible to store the last calculated result in the object (or in a new table) and start the calculations from there. You would probably need to save the id of the last transaction included in the calculation along with the result. If this is feasible in your situation, you would be able to load only unprocessed transactions into memory.
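Roughly, and only as an illustration (the column and method names here are invented), that incremental approach could look like this with LINQ to SQL:

    // Sketch: the subscription row carries LastProcessedTransactionId and the
    // last computed Status; only transactions newer than that are loaded.
    var sub = db.Subscriptions.Single(s => s.Id == subscriptionId);

    var newTransactions = db.Transactions
                            .Where(t => t.SubscriptionId == sub.Id
                                     && t.Id > sub.LastProcessedTransactionId)
                            .OrderBy(t => t.Id)
                            .ToList();                 // unprocessed rows only

    if (newTransactions.Any())
    {
        sub.Status = UpdateStatus(sub.Status, sub.Period, newTransactions); // your logic
        sub.LastProcessedTransactionId = newTransactions.Last().Id;
        db.SubmitChanges();
    }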

RA