I currently have a data solution in an RDBMS. The load on the server will grow by 10x, and I do not believe it will scale.

I believe what I need is a data store that is fault tolerant and scalable, and that can retrieve data extremely fast.

The Stats
    Records: 200 million
    Total Data Size (not including indexes):  381 GB
    New records per day: 200,000
    Queries per Sec:  5,000
    Query Result: 1 - 2000 records


Requirements
    Very fast reads
    Scalable 
    Fault tolerant
    Able to execute complex queries (conditions across many columns)
    Range Queries
    Distributed
    Partition – Is this required for 381 GB of data?
    Able to Reload from file
    In-Memory (not sure)

Not Required
    ACID - Transactions

The primary purpose of the data store is to retrieve data very fast. The queries that will access this data will have conditions across many different columns (30 columns, and probably many more). I hope this is enough info.
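To make that query shape concrete, here is a toy sketch (Python with sqlite3; the column names col0/col1/col2/ts and the data are made up for illustration) of a query combining equality conditions on several columns with a range condition, plus the composite index that serves it:

```python
# Toy illustration of the query shape: equality filters on several
# columns plus a range condition. Column names (col0, col1, col2, ts)
# are hypothetical; any multi-column schema would look similar.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, col0 INT, col1 INT, col2 TEXT, ts INT)"
)
conn.executemany(
    "INSERT INTO records (col0, col1, col2, ts) VALUES (?, ?, ?, ?)",
    [(i % 10, i % 7, "x" if i % 2 else "y", i) for i in range(1000)],
)

# A composite index over the filtered columns is what keeps this kind of
# query fast as the table grows toward hundreds of millions of rows.
conn.execute("CREATE INDEX idx_multi ON records (col0, col1, ts)")

rows = conn.execute(
    "SELECT ts FROM records"
    " WHERE col0 = ? AND col1 = ? AND col2 = ? AND ts BETWEEN ? AND ?",
    (3, 2, "x", 100, 500),
).fetchall()
```

The point of the sketch is that indexing, not the engine brand, decides whether a 30-column filter is fast; whatever store is chosen needs an equivalent of composite or secondary indexes on the filtered columns.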

I have read about many different types of data stores: NoSQL, In-Memory, Distributed Hash, Key-Value, Information Retrieval Library, Document Store, Structured Storage, Distributed Database, Tabular, and others. And then there are over two dozen products that implement these database types. This is a lot of material to digest and figure out which would provide the best solution.

It would be preferred that the solution run on Windows and is compatible with Microsoft .NET.

Based on the information above, does anyone have any suggestions, and why?

Thanks

A: 

So, what is your problem? I do not really see anything nontrivial here.

  • Fast and scaling: Grab a database (sorry: complex queries across columns = database) and get a NICE SAN - an HP EVA is great. I have seen one deliver 800 MB/s of random IO reads to a database... using 190 SAS discs. Fast enough for you? THIS is scalability.

  • A 400 GB database is not remarkable by any means.

    • Grab a decent server. Supermicro has one with space for 24 discs in 2 rack units of height.
    • Grab a higher-end SAS RAID controller - Adaptec, for example.
    • Plug in real SSD drives in a RAID 10 configuration. You will be surprised - you will saturate the IO bus faster than you can say "ouch". Scalability is there with 24 disc bays, and an IO bus that can handle 1.2 gigabytes per second.
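As a rough sanity check on the numbers above (the per-drive throughput here is an assumed figure, not a measurement from the answer), it is the IO bus rather than the drives that ends up as the bottleneck:

```python
# Back-of-envelope math for the suggested RAID 10 setup.
# per_drive_mb_s is an assumption for illustration, not a benchmark.
drives = 24
per_drive_mb_s = 250        # assumed sequential read throughput per SSD
bus_limit_mb_s = 1200       # the ~1.2 GB/s IO bus mentioned above

# RAID 10 mirrors pairs of drives, but reads can be served from both
# halves of each mirror, so read throughput scales with the full count.
raw_read_mb_s = drives * per_drive_mb_s
effective_mb_s = min(raw_read_mb_s, bus_limit_mb_s)
print(effective_mb_s)  # the bus, not the drives, is the limit here
```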

Finally, get a pro to tune your database server(s). It is that simple. SQL Server is a lot more complicated to use properly than "ok, I just know how a SELECT should look" (without really knowing).

TomTom
Thanks for your input, and you may be totally correct, but the problem doesn't seem to be the amount of data stored; it is the 5,000 queries per second against 200 million records. One possible problem with our current solution is that the data resides in a database that is over 2 TB in size, so there could be contention there. There is the possibility of moving the table to its own database and a separate LUN on the SAN. If nothing else, this is a good process for me to learn about other databases.
BarDev
What is the problem here? Get a proper computer. It is easy to get a machine with 24 cores these days from AMD; actually, 48 cores is easy too. Add replication or something like that and you are more than okay with that amount of queries. Do the queries have to be exact to that moment, or is some staleness okay (which allows replication to kick in)?
TomTom
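One way to picture the replication idea TomTom mentions: if slightly stale reads are acceptable, the 5,000 queries/sec can be spread across read replicas instead of hitting one server. A minimal round-robin sketch in Python (the replica names are made up):

```python
# Minimal sketch of spreading read queries across replicas round-robin,
# so no single node absorbs the full query load. Replica names are
# hypothetical placeholders.
from itertools import cycle

replicas = ["replica-1", "replica-2", "replica-3"]
next_replica = cycle(replicas)

def route_query(query: str) -> str:
    """Pick the next replica in rotation for a read-only query."""
    target = next(next_replica)
    # In a real system you would execute `query` against `target` here.
    return target

assigned = [route_query("SELECT ...") for _ in range(6)]
```

Real replication setups route through a load balancer or driver-level logic rather than an in-process counter, but the load-spreading principle is the same.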