Hello,

So I'm currently working on a project that involves collecting and storing some huge datasets (huge compared to what I'm used to working with, at least). The data essentially consists of meta information plus the actual values, which are trended over time.

The meta information itself is relatively large, but nothing huge; I would say it's going to grow to the 10-50 million row range over the next couple of years. This seems manageable to me, and a single beefy SQL Server should be enough to provide quick access to this data if it is decently indexed (and the data is very easy to index, with very well-defined boundaries)...

However, the trending data is a completely different story. Within a year, we are VERY easily going to be pulling in 40-50 million rows every day, and that could realistically double yearly for the next 3 or 4 years.
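
To put rough numbers on that growth, here is a quick back-of-the-envelope sketch. It assumes an exact doubling each year from the 40-50 million rows/day starting point, which is only an approximation of "could realistically double"; the function name is just for illustration.

```python
def daily_rows_after(years, start_per_day):
    """Rows per day after `years` yearly doublings (assumes exact doubling)."""
    return start_per_day * 2 ** years

# Year 0 is the 40-50 million rows/day starting point from the question.
for years in range(5):
    low = daily_rows_after(years, 40_000_000)
    high = daily_rows_after(years, 50_000_000)
    print(f"year {years}: {low:,} to {high:,} rows/day, "
          f"roughly {low * 365:,} to {high * 365:,} rows/year")
```

Under that assumption, four doublings would put the daily intake at 640-800 million rows, which is why a single table on a single server looks unrealistic to me.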

This trending data also has very well-defined boundaries that would split it into MUCH more manageable chunks. I'm hoping I can set up some sort of partitioning mechanism that would spread this data across multiple physical database nodes. The data is essentially all contained in a single table. I looked into SQL Server table partitioning, but couldn't find a way to spread the data over multiple servers.

My question is whether there is some "relatively simple" way of implementing table partitioning over multiple physical nodes. I've also spent some time looking at SQL Server PDW, but it's difficult to find information online, and I don't want to pursue that until I've established that there is no simple way of implementing this sort of solution using features built into SQL Server.

Any advice would be greatly appreciated...

+1  A: 

I'm no expert on this but I believe what you may be looking for is database 'sharding'. There's an interesting analysis of the problems and benefits of sharding here.

Ultimately, implementing a 'sharded' design is likely to be very costly, but if your data is going to be unmanageable in a single database then this could be a good solution.

There is also a small amount of information on the Wikipedia page, which includes a list of software that supports shards (e.g. the Hibernate ORM).
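
To make the idea a bit more concrete, here is a minimal sketch of application-level shard routing. It is not a SQL Server feature, just one common way to do it: hash whatever key defines your "boundaries" and use the result to pick a node. The node names, the `series_id` field and both function names are made up for the example.

```python
import hashlib

# Hypothetical node names; in practice these would be connection strings.
SHARDS = ["sqlnode0", "sqlnode1", "sqlnode2", "sqlnode3"]

def shard_for(partition_key, shards=SHARDS):
    """Map a partition key to one shard using a stable hash.

    hashlib is used instead of the built-in hash() because hash() of a
    string changes between Python processes, which would break routing.
    """
    digest = hashlib.md5(str(partition_key).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]

def group_by_shard(rows, key_field="series_id"):
    """Split a batch of rows (dicts) into per-shard batches for bulk insert."""
    batches = {}
    for row in rows:
        batches.setdefault(shard_for(row[key_field]), []).append(row)
    return batches

# Example: route a small batch of trend rows.
rows = [{"series_id": i % 10, "value": i * 0.5} for i in range(100)]
for node, batch in sorted(group_by_shard(rows).items()):
    print(node, len(batch))
```

The catch with plain modulo hashing is that changing the number of nodes remaps most keys, so adding servers later means moving data around. A lookup table from key range to node (or consistent hashing) avoids that, at the cost of more moving parts.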

Dolbz
Thanks for the reply, not quite what I was hoping for, but I'll give you a +1 for the good reading... I'm thinking I may have to look into a distributed key-value store or something, just for the trending tables; that should be much easier to scale out than SQL Server.
LorenVS
