views:

152

answers:

3

I have a project involving a web voting system. The current values and related data is stored in several tables. Historical data will be an important aspect of this project so I've also created Audit Tables to which current data will be moved to on a regular basis.

I find this strategy highly inefficient. Even if I only archive data on a daily basis, the number of rows will become huge even if only 1 or 2 users make updates on a given day.

The next alternative I can think of is only storing entries that have changed. This will mean having to build logic to automatically create a view of a given day. This means less stored rows, but considerable complexity.

My final idea is a bit less conventional. Since the historical data will be for reporting purposes, there's no need for web users to have quick access. I'm thinking that my db could have no historical data in it. DB only represents current state. Then, daily, the entire db could be loaded into objects (number of users/data is relatively low) and then serialized to something like XML or JSON. These files could be diffed with the previous day and stored. In fact, SVN could do this for me. When I want the data for a given past day, the system has to retrieve the version for that day and deserialize into objects. This is obviously a costly operation but performance is not so much a concern here. I'm considering using LINQ for this which I think would simplify things. The serialization procedure would have to be pretty organized for the diff to work well.

Which approach would you take?

Thanks

A: 

Have you considered using a real version control system rather than trying to shoehorn a database in its place? I myself am quite partial to git, but there are many options. They all have good support for differences between versions, and they tend to be well optimised for this kind of workload.

1800 INFORMATION
So you're suggesting I serialize/deserialize and commit data to git on the fly?
Mr Grieves
I don't really understand your problem statement, but git is designed with performance as one of the primary design goals - it is likely to be faster than anything else you might try, if it is a good match for your application
1800 INFORMATION
A: 
carrier
Voting is an ongoing thing. Votes can be removed by users or returned to them by administrators. Other details about the items users vote on will also change with time. There's 4 tables and a total of about 20 rows of data to archive.
Mr Grieves
My question was formulated in a somewhat general fashion because I find this to be a pretty general problem: What's the most elegant/intelligent way to mix version control and row operations.
Mr Grieves
+1  A: 

If you're basically wondering how revisions of data are stored in relational databases, then I would look into how wikis do it.

Wikis are all about keeping detailed revision history. They use simple relational databases for storage.

Consider Wikipedia's database schema.

carrier