I have an interesting challenge: building a database that imports data from about 500 different sources.
Each source has its own schema, and many of them differ significantly. However, they all contain data about a common entity.
My first thought was a typical Entity-Attribute-Value (EAV) schema. However, after converting the denormalized import from one source (550k rows) into EAV, I end up with 36 million rows in the Attribute_Value table. With proper indexes this is still very fast, but it's just one of the 500 import sources so far.
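For reference, here's roughly the shape of the EAV layout I'm describing, sketched with SQLite just for illustration; the table and column names are placeholders, not my actual schema:

```python
import sqlite3

# Illustrative EAV layout (names are hypothetical, SQLite used only as a sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (
    entity_id INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL           -- which of the ~500 import sources it came from
);

CREATE TABLE attribute (
    attribute_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE
);

CREATE TABLE attribute_value (
    entity_id    INTEGER NOT NULL REFERENCES entity(entity_id),
    attribute_id INTEGER NOT NULL REFERENCES attribute(attribute_id),
    value        TEXT,
    PRIMARY KEY (entity_id, attribute_id)
);
""")

def import_row(conn, source_id, row):
    """Explode one denormalized source row into one attribute_value row per column."""
    entity_id = conn.execute(
        "INSERT INTO entity (source_id) VALUES (?)", (source_id,)).lastrowid
    for name, value in row.items():
        if value is None:
            continue
        conn.execute("INSERT OR IGNORE INTO attribute (name) VALUES (?)", (name,))
        (attr_id,) = conn.execute(
            "SELECT attribute_id FROM attribute WHERE name = ?", (name,)).fetchone()
        conn.execute(
            "INSERT INTO attribute_value (entity_id, attribute_id, value) VALUES (?, ?, ?)",
            (entity_id, attr_id, str(value)))

# Each source row with N populated columns becomes N rows in attribute_value,
# which is how 550k imported rows turned into roughly 36 million EAV rows
# (about 65 populated attributes per row).
import_row(conn, source_id=1, row={"attr_a": "x", "attr_b": "y", "attr_c": 42})
print(conn.execute("SELECT COUNT(*) FROM attribute_value").fetchone()[0])  # 3
```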
I don't think this will scale. However, it does make for very nice logical partitioning: we never need to join across import sources, so we could (theoretically) build out 50 or so separate databases.
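Just to make that partitioning idea concrete, this is the sort of routing I have in mind; the shard count and file layout are purely hypothetical:

```python
import sqlite3
from pathlib import Path

NUM_SHARDS = 50  # hypothetical count matching the "50 or so separate databases" idea

def connection_for_source(source_id, data_dir="shards"):
    """Route an import source to one of NUM_SHARDS self-contained databases.
    Because we never join across import sources, each shard can live in its
    own file (or on its own server) without affecting the others."""
    shard = source_id % NUM_SHARDS
    Path(data_dir).mkdir(exist_ok=True)
    return sqlite3.connect(Path(data_dir) / f"shard_{shard:02d}.db")

# Example: source 137 always lands in shard 37.
conn = connection_for_source(137)
```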
I'm looking for people who have worked with massive data sources and can share their experience of handling row counts in the hundreds of millions.