views: 75

answers: 4
I have a constant flux of data. All of it must be stored in the database with a timestamp. The data arrives at 5-minute intervals, and a select of the latest data is made within the same interval, in pseudo SQL:

SELECT * FROM TB_TABLE WHERE [TIMESTAMP] = (SELECT MAX([TIMESTAMP]) FROM TB_TABLE)

As this table grows really big (gigabytes), I did a premature optimization, splitting it into two tables: one for all data (inserts only), and another for just the latest data (inserts, deletes, and selects).

I wonder if this duplication is a good thing to do, since I have no metrics to prove that it improved my application's performance. As a general guideline, would you recommend what I did?

Update: BTW, I use MS SQL Server 2005 and .NET C# LINQ to SQL.
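
To illustrate, a simplified sketch of the split (the table and column names here are illustrative, not the real schema):

-- History table: insert-only, grows without bound.
CREATE TABLE TB_TABLE_ALL (
    SensorId    INT      NOT NULL,
    Value       FLOAT    NOT NULL,
    [TIMESTAMP] DATETIME NOT NULL
);

-- Latest table: holds only the most recent batch.
CREATE TABLE TB_TABLE_LATEST (
    SensorId    INT      NOT NULL,
    Value       FLOAT    NOT NULL,
    [TIMESTAMP] DATETIME NOT NULL
);

-- Every 5 minutes: append the batch to the history table...
INSERT INTO TB_TABLE_ALL (SensorId, Value, [TIMESTAMP])
VALUES (1, 42.0, GETDATE());

-- ...and replace the contents of the small latest table.
DELETE FROM TB_TABLE_LATEST;
INSERT INTO TB_TABLE_LATEST (SensorId, Value, [TIMESTAMP])
VALUES (1, 42.0, GETDATE());

-- The recurring select reads only the small table.
SELECT SensorId, Value, [TIMESTAMP] FROM TB_TABLE_LATEST;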

+2  A: 

Splitting a table with a high input volume into a write-optimized "recent" table and a read-optimized "archive" table is generally a pretty good optimization. It does increase complexity, so you don't want to do it where it's not needed, but it's reasonable if you're sure the table in question is going to get tons of data.

Ryan Brunner
+1  A: 

I wouldn't recommend the approach you've taken. If the intent was to improve application performance, it would have been more appropriate to collect performance metrics first. If a trend indicated decreasing performance as the amount of data grew, then it would be clear that some database change was appropriate.

Assuming your primary concern is the performance of selects against a large table, steps like applying good indexes and replacing "select *" with just the columns you want might be a better place to start than duplicating data across multiple tables. If your queries had a substantial number of joins, I could see that having a negative impact on your performance. In that case, creating an additional table that eliminated the need for joins in your queries would be a good optimization.
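As a rough sketch of that starting point, assuming a DATETIME column named TIMESTAMP and illustrative value columns SensorId and Value:

-- Index the timestamp so finding MAX([TIMESTAMP]) and filtering on it
-- no longer scans the whole table; INCLUDE covers the selected columns.
CREATE INDEX IX_TB_TABLE_TIMESTAMP
    ON TB_TABLE ([TIMESTAMP]) INCLUDE (SensorId, Value);

-- Select only the columns the application actually needs.
SELECT SensorId, Value, [TIMESTAMP]
FROM TB_TABLE
WHERE [TIMESTAMP] = (SELECT MAX([TIMESTAMP]) FROM TB_TABLE);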

Scott A. Lawrence
+1  A: 

You didn't mention what database you are using, but I can think of a couple of possible quick optimizations. How many gigabytes are we talking about?

1) Calculating MAX(TIMESTAMP) can be expensive given a large number of rows. You probably already know what this value is; store it in a separate table, a config file, or something similar (see the sketch after this list). This will probably be your biggest optimization.

2) Add another column to flag recent rows. When the next batch starts, run an UPDATE with SET recent = false WHERE recent = true, then write all the new records with recent = true. You may also be able to limit the size of the index by adding a WHERE condition to it (a partial index; syntax varies by database): CREATE INDEX foo_index ON TB_TABLE (recent) WHERE recent = true;

3) Make sure your database server itself is properly tuned: key and sort buffers should be sized appropriately for your dataset. Most open-source databases ship pre-tuned for a developer's workstation, not a production workload.

4) Reconsider your schema. Are you sure you need all those records? Are you recording all the data, and not just the data that changed? I've made good use of two timestamps in this situation: one for the last load and one for the last change.
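
A minimal sketch of suggestion 1, assuming an illustrative one-row helper table named TB_LATEST_TS (not part of the original schema):

-- Keep the most recent batch timestamp in a tiny one-row table so the
-- 5-minute read never has to compute MAX() over the big table.
CREATE TABLE TB_LATEST_TS (
    LatestTimestamp DATETIME NOT NULL
);

-- When a batch is written, record its timestamp (seed the row once, then UPDATE).
DECLARE @BatchTimestamp DATETIME;
SET @BatchTimestamp = GETDATE();
UPDATE TB_LATEST_TS SET LatestTimestamp = @BatchTimestamp;

-- The recurring read becomes an equality lookup instead of a MAX() aggregate.
SELECT t.*
FROM TB_TABLE t
JOIN TB_LATEST_TS l ON t.[TIMESTAMP] = l.LatestTimestamp;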

Trey
5 GB/month. SQL Server 2005
Jader Dias
+1  A: 

I wonder if table partitioning would be helpful. I haven't personally used it, so I can't speak from experience, but this sounds like an appropriate situation in which to use it.
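
For reference, a rough sketch of what range partitioning by month could look like in SQL Server 2005 (partitioning requires Enterprise Edition; the partition, table, and column names below are illustrative):

-- Partition function: maps a DATETIME to a monthly range.
CREATE PARTITION FUNCTION pf_ByMonth (DATETIME)
AS RANGE RIGHT FOR VALUES ('20090101', '20090201', '20090301');

-- Partition scheme: here every partition goes to the PRIMARY filegroup,
-- but older months could be placed on cheaper storage.
CREATE PARTITION SCHEME ps_ByMonth
AS PARTITION pf_ByMonth ALL TO ([PRIMARY]);

-- The history table is created on the scheme, partitioned by its timestamp.
CREATE TABLE TB_TABLE_PARTITIONED (
    SensorId    INT      NOT NULL,
    Value       FLOAT    NOT NULL,
    [TIMESTAMP] DATETIME NOT NULL
) ON ps_ByMonth ([TIMESTAMP]);

Queries that filter on the timestamp then touch only the relevant partitions, and old months can be switched out or archived without scanning the whole table.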

Nathan Koop
Never heard of it. I'll google it, thanks.
Jader Dias