views:

91

answers:

4

I have a scheduled application that pulls data from another database table and dumps it into my main application's database table. The number of records in this table increases daily, and I expect it to keep growing, since it is transactional event data. This data is used for processing by the main application, which takes each record, does the needed analysis, and marks it as processed.

What kind of solution can I provide so that I can keep the size of the database down in the future?

How would you go about it in this situation?

From my observation of a few enterprise applications, one provides an option for the user to archive records 'older than 60 days', etc., to a text file. I could provide a similar option to archive the processed records to a text file and delete them from the database; the text file could be imported later if necessary. Is this a reasonable solution?
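A minimal sketch of what I mean (assuming a transaction_events table with id, event_time, payload and processed columns, which are just placeholder names, and using SQLite only for illustration):

```python
import csv
import sqlite3
from datetime import datetime, timedelta

ARCHIVE_CUTOFF_DAYS = 60  # would be user-configurable in the real application

def archive_processed_events(db_path: str, archive_path: str) -> int:
    """Append processed events older than the cutoff to a CSV file, then delete them."""
    cutoff = (datetime.now() - timedelta(days=ARCHIVE_CUTOFF_DAYS)).isoformat()
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT id, event_time, payload FROM transaction_events "
            "WHERE processed = 1 AND event_time < ?",
            (cutoff,),
        ).fetchall()
        if rows:
            with open(archive_path, "a", newline="") as f:
                csv.writer(f).writerows(rows)
            # Delete exactly the rows that were exported, by id.
            conn.executemany(
                "DELETE FROM transaction_events WHERE id = ?",
                [(r[0],) for r in rows],
            )
            conn.commit()
        return len(rows)
    finally:
        conn.close()
```

Importing the archive back would just be the reverse: read the CSV rows and INSERT them into the same table (or into a separate history table).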

A: 

IMHO, it depends on how likely it is that the user will need to analyse past data. If it's likely, just create good indexes and keep all the data in your main database.

If it's not, then dump it to a text file. The cutoff age should be configurable, of course.
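For example (a minimal sketch, reusing the hypothetical transaction_events table and column names from the question, with SQLite just for illustration), an index on the columns your queries filter on keeps lookups fast even as the table grows:

```python
import sqlite3

conn = sqlite3.connect("main_app.db")  # hypothetical database file
# Index the columns the processing and reporting queries filter on,
# so scans stay fast even as the table keeps growing.
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_events_processed_time "
    "ON transaction_events (processed, event_time)"
)
conn.commit()
conn.close()
```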

tekBlues
A: 

What kind of past-data reporting needs does your company have? Dropping archived data into a text file is all well and good, assuming you don't need to report off of that data in the future. However, having it in a text file means you need a manual process to import it back into a database whenever it is needed.

A better option would be to move archival data off into a data warehouse database, which is not used for transaction processing (OLTP) but instead serves as the foundation of an analytical processing (OLAP) database. When the time comes to report off of this archived data, it's ready to go. If you are careful about how you structure the data in this archival database, it should be very easy to aggregate it all into an OLAP cube, which makes reporting off of that data much faster and more flexible.
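A rough sketch of the "move it to a warehouse" step (again assuming the hypothetical transaction_events table from the question, with SQLite standing in for both databases just to keep the example self-contained):

```python
import sqlite3

def move_to_warehouse(oltp_path: str, warehouse_path: str, cutoff: str) -> None:
    """Copy archival rows into a separate warehouse database, then remove them from OLTP."""
    oltp = sqlite3.connect(oltp_path)
    warehouse = sqlite3.connect(warehouse_path)
    try:
        warehouse.execute(
            "CREATE TABLE IF NOT EXISTS archived_events "
            "(id INTEGER, event_time TEXT, payload TEXT)"
        )
        rows = oltp.execute(
            "SELECT id, event_time, payload FROM transaction_events "
            "WHERE processed = 1 AND event_time < ?",
            (cutoff,),
        ).fetchall()
        warehouse.executemany("INSERT INTO archived_events VALUES (?, ?, ?)", rows)
        warehouse.commit()
        # Remove exactly the copied rows from the transactional database.
        oltp.executemany(
            "DELETE FROM transaction_events WHERE id = ?",
            [(r[0],) for r in rows],
        )
        oltp.commit()
    finally:
        oltp.close()
        warehouse.close()
```

From the archived_events table you can then build whatever aggregated structures (summary tables, cubes) your reporting needs.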

But again...depends on whether you report off the data or not, and how far back in time that reporting might go.

jrista
+1  A: 

If you only need to access that older data occasionally, then building a process to archive it to text and load it back from text is probably not a great solution. Hard drives are cheap.

You could aggregate the older data. For example, if the transaction data is at the millisecond grain now, but older data is only ever reported at the daily level, then consider aggregating the data to "daily" as your archiving process. You may be able to collapse hundreds of thousands of rows into just a few for each day.
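As a rough sketch (assuming the hypothetical transaction_events table from the question plus a numeric amount column added purely for illustration, and SQLite syntax), the daily roll-up could look like this:

```python
import sqlite3

conn = sqlite3.connect("main_app.db")  # hypothetical database file
# Roll old millisecond-grain rows up to one summary row per day,
# then delete the detail rows they replace.
conn.execute(
    "CREATE TABLE IF NOT EXISTS daily_event_summary "
    "(event_date TEXT PRIMARY KEY, event_count INTEGER, total_amount REAL)"
)
conn.execute(
    "INSERT OR REPLACE INTO daily_event_summary "
    "SELECT date(event_time), COUNT(*), SUM(amount) "
    "FROM transaction_events "
    "WHERE processed = 1 AND event_time < date('now', '-60 days') "
    "GROUP BY date(event_time)"
)
conn.execute(
    "DELETE FROM transaction_events "
    "WHERE processed = 1 AND event_time < date('now', '-60 days')"
)
conn.commit()
conn.close()
```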

Also consider a good partitioning scheme where you can keep the most recent transactions on one set of disks and the archived data on other disks, ideally in a setup where you can easily add new disks and create tables on them.

esabine
A: 

It does depend on how much analysis will be done on past data, but there is a way to keep it all in the database without performance becoming a problem.

The solution that comes to mind is to partition the tables in question. My company has a database table whose data is partitioned by month, with each partition containing about 20 million rows. The partitioning makes using this data far more practical than if it were stored in a single table. Now the only real constraint is disk space, which is a non-issue given how cheap it is these days.
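A minimal sketch of month-based range partitioning (assuming a PostgreSQL backend, since the question doesn't name the database, and the hypothetical transaction_events table from it; the exact syntax varies by engine):

```python
import psycopg2  # assumes a PostgreSQL backend; the question doesn't say which database is used

# Declarative range partitioning: a parent table plus one child table per month.
DDL = [
    """
    CREATE TABLE IF NOT EXISTS transaction_events (
        id          BIGSERIAL,
        event_time  TIMESTAMP NOT NULL,
        payload     TEXT,
        processed   BOOLEAN DEFAULT FALSE
    ) PARTITION BY RANGE (event_time)
    """,
    """
    CREATE TABLE IF NOT EXISTS transaction_events_2009_06
        PARTITION OF transaction_events
        FOR VALUES FROM ('2009-06-01') TO ('2009-07-01')
    """,
]

conn = psycopg2.connect("dbname=main_app")  # hypothetical connection string
with conn, conn.cursor() as cur:
    for statement in DDL:
        cur.execute(statement)
conn.close()
```

Queries that filter on event_time only touch the relevant monthly partitions, and old months can be dropped or moved to cheaper storage without disturbing the current data.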

I know, however, that some databases do not support partitioning. If this is the case, I suppose storing the data in a delimited file would be an appropriate solution.

Rhinosaurus