I am attempting to store over 200 million key-value pairs. The value for over 50% of the keys will change over the course of a week, and about 5% of the rows will be permanently removed. Using traditional SQL solutions, this has proven to cause a large amount of fragmentation, causing table bloat (4x the original table size) and some performance issues. It takes considerable downtime to resolve this fragmentation in SQL. We have used both reindexing and reorganizing techniques, but both have failed to keep up with the fragmentation. In addition, I need to replicate this data to 2 other systems, which has also proven to be quite problematic.

The table design is simple:

key NVARCHAR(50)

value VARCHAR(MAX)
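For reference, a minimal T-SQL sketch of that schema, assuming the key is unique and serves as the clustered primary key (table and constraint names are placeholders):

CREATE TABLE dbo.KeyValueStore
(
    [key]   NVARCHAR(50) NOT NULL,   -- lookup key
    [value] VARCHAR(MAX) NULL,       -- payload; updated frequently
    CONSTRAINT PK_KeyValueStore PRIMARY KEY CLUSTERED ([key])
);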

We are considering using other technologies like MongoDB, but fear we will run into similar fragmentation issues.

Does anyone have any suggestions on how we can come at this problem in a different way that might limit the fragmentation?

A: 

This is a perfect fit for MongoDB.

MongoDB also supports capped collections (which you may be able to use). You can have objects in your DB which, in a sense, scroll out of view when you no longer need them, which could lessen your administration of the database if things are changing weekly.

A: 

Perhaps there's a way to create a versioning and partitioning mechanism even in MSSQL:

Versioning: If a value is changed, the old value is marked as is_active=False and the new value is inserted. Then you bulk-delete the inactive values every week, which will cause less fragmentation overall. You can use a view to filter only the is_active=True values.
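A rough T-SQL sketch of that versioning idea, assuming a hypothetical kv table that carries an is_active flag (all names and the @key/@value parameters are placeholders):

-- On change: deactivate the old row and insert the replacement
UPDATE dbo.kv SET is_active = 0 WHERE [key] = @key AND is_active = 1;
INSERT INTO dbo.kv ([key], [value], is_active) VALUES (@key, @value, 1);

-- Weekly cleanup: bulk-delete everything that is no longer active
DELETE FROM dbo.kv WHERE is_active = 0;

-- Readers only ever see the current values through a view
CREATE VIEW dbo.kv_current AS
    SELECT [key], [value] FROM dbo.kv WHERE is_active = 1;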

Partitioning: I'm not sure what the best partitioning scheme would be here. Since some values have a long lifespan, I think partitioning by time will not do the trick. Perhaps partitioning by key is better. This way, at least, you can try to defragment each partition separately, and the degradation is better contained.
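A minimal sketch of key-based partitioning in MSSQL, assuming the keys can be split on alphabetical boundaries (the boundary values and all names below are made up for illustration):

-- Split the key space into ranges (boundaries are illustrative only)
CREATE PARTITION FUNCTION pf_kv_key (NVARCHAR(50))
    AS RANGE RIGHT FOR VALUES (N'G', N'N', N'T');

CREATE PARTITION SCHEME ps_kv_key
    AS PARTITION pf_kv_key ALL TO ([PRIMARY]);

-- The table lives on the partition scheme, keyed on [key]
CREATE TABLE dbo.kv_partitioned
(
    [key]   NVARCHAR(50) NOT NULL,
    [value] VARCHAR(MAX) NULL,
    CONSTRAINT PK_kv_partitioned PRIMARY KEY CLUSTERED ([key])
) ON ps_kv_key ([key]);

-- Defragment one partition at a time instead of the whole table
ALTER INDEX PK_kv_partitioned ON dbo.kv_partitioned REBUILD PARTITION = 2;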

OmerGertel