views: 208
answers: 4

I am setting up a database that I anticipate will be quite large, used for calculations and data storage. It will be one table with maybe 10 fields, containing one primary key and two foreign keys to itself. I anticipate there will be about a billion records added daily.

Each record should be quite small, and I will primarily be doing inserts. With each insert I will need to do a simple update on one or two fields of a connected record. All queries should be relatively simple.
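
To make the structure concrete, roughly something like this (a placeholder sketch only; the column names and types here are illustrative, not the real schema):

    -- Placeholder sketch of the table described above; names and types are illustrative only.
    CREATE TABLE dbo.Calculation
    (
        CalcID    bigint IDENTITY(1,1) NOT NULL,
        ParentID  bigint NULL,           -- self-referencing foreign key #1
        RelatedID bigint NULL,           -- self-referencing foreign key #2
        Value1    decimal(18,4) NULL,    -- fields a later insert may update on a connected record
        Value2    decimal(18,4) NULL,
        CreatedAt datetime NOT NULL DEFAULT GETUTCDATE(),
        CONSTRAINT PK_Calculation PRIMARY KEY CLUSTERED (CalcID),
        CONSTRAINT FK_Calculation_Parent  FOREIGN KEY (ParentID)  REFERENCES dbo.Calculation (CalcID),
        CONSTRAINT FK_Calculation_Related FOREIGN KEY (RelatedID) REFERENCES dbo.Calculation (CalcID)
    );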

At what size will I start running into performance problems with SQL Server? I've seen mention of VLDB (very large database) systems, but have also heard they can be a real pain. Is there a threshold where I should start looking at that? Is there a better DB than SQL Server that is designed for this sort of thing?

+10  A: 

Unless you're talking "large" as in Google's-index type of large, enterprise databases like SQL Server or Oracle will do just fine.

James Devlin over at Coding the Wheel summed it up nicely (though this is more of a comparison between free DBs like MySQL and Oracle/SQL Server):

Nowadays I like to think of SQL Server and Oracle as the Death Stars of the relational database universe. Extremely powerful. Monolithic. Brilliant. Complex almost beyond the ability of a single human mind to understand. And a monumental waste of money except in those rare situations when you actually need to destroy a planet.

As far as performance goes, it really all depends on your indexing strategy. Inserts are the bottleneck here, since records need to be indexed as they come in: the more indexes you have, the longer inserts will take.
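
As a rough illustration of that trade-off (the table and index names are assumed, following the sketch in the question), every extra nonclustered index is maintained on every insert, so you only add the ones queries genuinely need:

    -- Each nonclustered index adds work to every insert, so create only the ones
    -- your queries actually need (names here are assumed for illustration).
    CREATE NONCLUSTERED INDEX IX_Calculation_ParentID
        ON dbo.Calculation (ParentID);

    -- If an index turns out not to be used by any query, dropping it removes
    -- that per-insert maintenance cost:
    -- DROP INDEX IX_Calculation_ParentID ON dbo.Calculation;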

In the case of something like Google's index, read up on "BigTable"; it's quite interesting how Google set it up to use clusters of servers to handle searches across enormous amounts of data in mere milliseconds.

Neil N
BigTable information: http://en.wikipedia.org/wiki/BigTable#External_links and http://labs.google.com/papers/bigtable.html
John K
Love that quote.
feihtthief
+13  A: 

When you're talking about transaction rates of over 10k/sec, you shouldn't be asking for advice on forums... This is close to TPC-C benchmark performance on 32- and 64-way systems, which cost millions to tune.

At what size will you be running into problems?

With a good data model and schema design, a properly tuned server with correct capacity planning will not run into problems at 1 bil. records per day. The latest published SQL Server benchmarks are at about 1.2 mil tran/min. That is roughly 16k transactions per second, on a system priced at ~USD 6 mil in 2005 (a 64-way Superdome). To achieve 10k tran/sec for your planned load you're not going to need a Superdome, but you are going to need quite a beefy system (probably at least 16-way) and especially a very, very good I/O subsystem. When doing back-of-the-envelope capacity planning, one usually assumes about 1K tran/sec per HBA and 4 CPU cores to feed each HBA. And you're going to need quite a few database clients (application mid-tiers) just to feed 1 bil. records per day into the database. I'm not claiming that I've done your capacity planning here; I just wanted to give you a ballpark of what we're talking about. This is a multi-million-dollar project, and something like this is not designed by asking for advice on forums.
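
A rough version of that back-of-the-envelope arithmetic, using the averages above (real traffic is bursty, so peak rates will be higher):

    -- Ballpark only: 1 billion inserts/day spread evenly over 86,400 seconds,
    -- then sized at ~1K tran/sec per HBA and 4 CPU cores per HBA as above.
    SELECT
        1000000000 / 86400.0                     AS avg_inserts_per_sec,   -- ~11,574
        CEILING(1000000000 / 86400.0 / 1000)     AS approx_hbas_needed,    -- 12
        CEILING(1000000000 / 86400.0 / 1000) * 4 AS approx_cpu_cores;      -- 48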

Remus Rusanu
+1 : Remus is definitely right; doing this level of inserts and storage on any DBMS requires specialist knowledge / consultancy and experience. Either engage with the SQL CAT team or check the MVP list / SQL Certified Masters list.
Andrew
Just to be clear: the TPC-C tran/sec figure measures the TPC-C workload 'tran', which is bigger than an insert and an update. Still, 10k simple trans/sec is a pretty big number.
Remus Rusanu
+3  A: 

The size of the database itself does not create performance problems. Practical problems at large database sizes come from operational/maintenance issues.

For example:

  1. De-fragmenting and re-building indexes take too long.
  2. Backups take too long or take up too much space.
  3. Database restores cannot be performed quickly enough in case of an outage.
  4. Future changes to the database tables take too long to apply.

I would recommend designing/building in some sort of partitioning from the start. It can be SQL Server partitioning, application-level partitioning (e.g. one table per month), or archiving (e.g. to a different database).
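
For example, here's a minimal sketch of monthly SQL Server partitioning on a date column (the function/scheme names, boundary dates, filegroup, and date column are all placeholders):

    -- Sketch only: monthly range partitioning; names, dates, and filegroup are placeholders.
    CREATE PARTITION FUNCTION pfMonthly (datetime)
        AS RANGE RIGHT FOR VALUES ('20090101', '20090201', '20090301');

    CREATE PARTITION SCHEME psMonthly
        AS PARTITION pfMonthly ALL TO ([PRIMARY]);

    -- The big table is then created ON psMonthly(CreatedAt); old months can be
    -- switched out with ALTER TABLE ... SWITCH PARTITION for archiving instead
    -- of being deleted row by row.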

I believe that these problems occur in any database product.

In addition, be sure to make allowances for transaction log file sizes.
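
For instance, pre-sizing the log and using a fixed growth increment avoids repeated small autogrow pauses during heavy insert activity (the database and logical file names below are placeholders):

    -- Placeholder names; pre-size the log and use a fixed growth increment.
    ALTER DATABASE MyBigDb
        MODIFY FILE (NAME = MyBigDb_log, SIZE = 100GB, FILEGROWTH = 8GB);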

Darryl Peterson
+3  A: 

It can be done, but given your hardware costs and plans, get MS in to spec things out for you. It will be a fraction of your HW costs.

That said, Paul Nielson blogged about 35k TPS (3 billion rows per day) two years ago. The comments are worth reading too, and they reflect some of what Remus said.

gbn