I want to maintain the last ten years of stock market data in a single table. Certain analyses need only the last month of data, and when I run these short-term analyses they take a long time to complete.

To overcome this I created another table that holds the current year's data alone. When I run the analysis against this table, it is about 20 times faster than before.

Now my question is:

  1. Is a separate table the right way to handle this kind of problem? (Or should I use a separate database instead of a table?)
  2. If I keep a separate table, is there any way to update the secondary table automatically?
  3. Or can I use something like a materialized view to gain performance?

Note: I'm using a PostgreSQL database.

A: 

I'm not sure about PostgreSQL specifically, but I can confirm that you are on the right track. When dealing with large data volumes, partitioning the data into multiple tables and then using some kind of query generator to build your queries is absolutely the right way to go. This approach is well established in data warehousing, and it applies directly to stock market data like yours.

However, I'm curious why you need to update your historical data. If you're dealing with stock splits, it's common to handle them with a separate multiplier table used in conjunction with the raw historical data to give an accurate price per share.
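
A minimal sketch of that multiplier-table idea; the table and column names (split_factors, stock_data, and so on) are hypothetical, not from the answer:

    -- One row per split, kept apart from the raw price history.
    CREATE TABLE split_factors (
        symbol         text    NOT NULL,
        effective_date date    NOT NULL,
        factor         numeric NOT NULL  -- e.g. 2.0 for a 2-for-1 split
    );

    -- Divide each raw price by the product of all splits that took effect
    -- after the trade date (the product is built as exp(sum(ln(...))),
    -- since SQL has no product aggregate).
    SELECT d.symbol,
           d.trade_date,
           d.price / COALESCE(
               (SELECT exp(sum(ln(s.factor)))
                  FROM split_factors s
                 WHERE s.symbol = d.symbol
                   AND s.effective_date > d.trade_date),
               1) AS adjusted_price
      FROM stock_data d;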

MrTelly
Thanks for your guidance.
Prabu
+3  A: 

You want table partitioning. It splits the data across multiple tables automatically, and it will in general work much better than a hand-rolled second table.
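
In recent PostgreSQL releases (10 and later) this is built in as declarative partitioning; a brief sketch with hypothetical names (older releases use the inheritance approach shown in a later answer):

    -- The parent is declared as partitioned; rows land in the matching child.
    CREATE TABLE stock_data (
        symbol     text    NOT NULL,
        trade_date date    NOT NULL,
        price      numeric NOT NULL
    ) PARTITION BY RANGE (trade_date);

    CREATE TABLE stock_data_2009 PARTITION OF stock_data
        FOR VALUES FROM ('2009-01-01') TO ('2010-01-01');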

kquinn
A: 
  1. It is perfectly sensible to use a separate table for historical records. A separate database is much more problematic, since it's not simple to write cross-database queries.
  2. Automatic updates of the secondary table are a job for cron.
  3. You can also use partial indexes for this kind of thing; they do a wonderful job. Sketches of both ideas follow below.
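
For point 2, a plain SQL script run from cron can rebuild the secondary table; for point 3, a partial index avoids the second table entirely. Both sketches use hypothetical names:

    -- Nightly refresh a cron job could execute (hypothetical tables):
    BEGIN;
    TRUNCATE current_year_data;
    INSERT INTO current_year_data
        SELECT * FROM stock_data
         WHERE trade_date >= DATE '2009-01-01';
    COMMIT;

    -- A partial index covering only recent rows. The predicate must be a
    -- constant (functions like now() are not allowed there), so the index
    -- has to be recreated periodically to move the cutoff forward.
    CREATE INDEX stock_data_recent_idx
        ON stock_data (trade_date)
        WHERE trade_date >= DATE '2009-01-01';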
depesz
+2  A: 

I'm working on nearly the exact same issue.

Table partitioning is definitely the way to go here. I would segment by more than just year, though; it gives you a greater degree of control. Set up your partitions and then constrain them by month (or some other date range). In your postgresql.conf you'll need to set constraint_exclusion = on to really get the benefit. An additional benefit is that you can index only the exact tables you really want to pull information from. If you're batch-importing large amounts of data into this table, you may get slightly better results with a rule than with a trigger, and for partitioning I find rules easier to maintain; for smaller transactions, though, triggers are much faster. The PostgreSQL manual has a great section on partitioning via inheritance, which the sketch below follows.
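
A minimal sketch of that inheritance-based setup, with hypothetical table and column names:

    -- Parent table; the children below carry the actual rows.
    CREATE TABLE stock_data (
        symbol     text    NOT NULL,
        trade_date date    NOT NULL,
        price      numeric NOT NULL
    );

    -- One child per month, CHECK-constrained so the planner can skip it
    -- entirely when constraint_exclusion = on.
    CREATE TABLE stock_data_2009_05 (
        CHECK (trade_date >= DATE '2009-05-01'
           AND trade_date <  DATE '2009-06-01')
    ) INHERITS (stock_data);

    -- Index only the partitions you actually query.
    CREATE INDEX stock_data_2009_05_date_idx
        ON stock_data_2009_05 (trade_date);

    -- Route inserts on the parent to the matching child with a rule;
    -- a trigger is the faster alternative for small transactions.
    CREATE RULE stock_data_insert_2009_05 AS
        ON INSERT TO stock_data
        WHERE (NEW.trade_date >= DATE '2009-05-01'
           AND NEW.trade_date <  DATE '2009-06-01')
        DO INSTEAD
            INSERT INTO stock_data_2009_05 VALUES (NEW.*);

With constraint_exclusion = on, a query filtered to May 2009 then scans only stock_data_2009_05.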

rfusca
A: 

Frankly, you should check your execution plans and try fixing your queries or indexing before taking more radical steps.

Indexing comes at very little cost (unless you do a lot of insertions), and with proper indexes your existing code gets faster without any modification.
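
A quick way to check, assuming a hypothetical stock_data table: run the slow query under EXPLAIN ANALYZE and see whether it falls back to a sequential scan; if so, a plain index on the date column is often enough.

    -- Show the actual plan and timing for the short-term query:
    EXPLAIN ANALYZE
    SELECT *
      FROM stock_data
     WHERE trade_date >= DATE '2009-04-01';

    -- If the plan shows 'Seq Scan on stock_data', try a b-tree index:
    CREATE INDEX stock_data_trade_date_idx
        ON stock_data (trade_date);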

Other measures, such as partitioning, come after that...

alex