views: 112
answers: 2

I have an interesting challenge of building a database that imports data from about 500 different sources.

Each source has its own schema, and many are very different. However, they all contain data about a common entity.

My first thought is a typical Entity / Attribute / Value (EAV) schema, but after converting the denormalized import from one source (550k rows) into EAV, I end up with 36 million rows in the Attribute_Value table. With proper indexes this is still very fast, but it is just one of 500 import sources so far.
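
For reference, a minimal sketch of the kind of EAV layout I mean (table and column names are illustrative, not my actual schema):

    -- Illustrative EAV sketch: one row per entity, one row per attribute
    -- definition, and one row per (entity, attribute) value -- which is how
    -- 550k denormalized source rows turn into ~36 million Attribute_Value rows.
    CREATE TABLE Entity (
        EntityID    INT IDENTITY PRIMARY KEY,
        SourceID    INT NOT NULL                -- which of the ~500 import sources
    );

    CREATE TABLE Attribute (
        AttributeID INT IDENTITY PRIMARY KEY,
        SourceID    INT NOT NULL,
        Name        NVARCHAR(200) NOT NULL
    );

    CREATE TABLE Attribute_Value (
        EntityID    INT NOT NULL REFERENCES Entity (EntityID),
        AttributeID INT NOT NULL REFERENCES Attribute (AttributeID),
        Value       NVARCHAR(MAX) NULL,
        PRIMARY KEY (EntityID, AttributeID)
    );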

I don't think this will scale. However, it does make for very nice logical partitioning: we don't need to join across import sources, so we could (theoretically) build out 50 or so separate databases.

I'm looking for people who have worked with massive data sources and their experience with how to handle things when your row count is in the hundreds of millions.

+2  A: 

Have you considered OLAP solutions? They are designed for situations like yours: massive amounts of data to read and analyze.

Mehrdad Afshari
Yup, it's what they're for. And there are lots of tools and a large and growing body of wisdom.
le dorfier
+2  A: 

I have billion+ row tables; the number of rows is not as critical as the fragmentation level and the width of the table itself. The wider the table, the less data you can fit on a page.

Besides OLAP/SSAS, have you looked at using partition functions (new in SQL Server 2005)?
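
A rough sketch of what that looks like (the boundary values, object names, and partitioning column are made up for the example):

    -- Partition a large table by import source (SQL Server 2005+).
    -- Boundary values, object names and the SourceID column are examples only.
    CREATE PARTITION FUNCTION pfSourceID (INT)
        AS RANGE RIGHT FOR VALUES (100, 200, 300, 400);

    -- Map every partition to the PRIMARY filegroup for simplicity;
    -- in practice you would spread partitions across filegroups.
    CREATE PARTITION SCHEME psSourceID
        AS PARTITION pfSourceID ALL TO ([PRIMARY]);

    -- Create the big table on the partition scheme.
    CREATE TABLE dbo.Attribute_Value (
        EntityID    INT NOT NULL,
        AttributeID INT NOT NULL,
        SourceID    INT NOT NULL,
        Value       NVARCHAR(MAX) NULL
    ) ON psSourceID (SourceID);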

You could also take advantage of page- and row-level compression (new in SQL Server 2008); this will help you fit more data into RAM. I did my own testing with compression; check out this link to see how it compares to no compression: A Quick Look At Compression In SQL 2008
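
A quick sketch of how to enable it (Enterprise edition only; the table name is a placeholder):

    -- Estimate what page compression would save on the big table first.
    EXEC sp_estimate_data_compression_savings
         @schema_name      = 'dbo',
         @object_name      = 'Attribute_Value',
         @index_id         = NULL,
         @partition_number = NULL,
         @data_compression = 'PAGE';

    -- Rebuild the table and its indexes with page compression.
    ALTER TABLE dbo.Attribute_Value
        REBUILD WITH (DATA_COMPRESSION = PAGE);

    ALTER INDEX ALL ON dbo.Attribute_Value
        REBUILD WITH (DATA_COMPRESSION = PAGE);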

SQLMenace