I have a buddy who runs a web app for people listing cars for sale. A few thousand clients use it, and each client has hundreds and sometimes thousands of rows in the database (some have been on it for 5 years, selling hundreds of cars each month, with tens of rows per sale for comments, messages, etc.). He has run this system the whole time on one SQL Server database on one physical server with about 20GB of RAM and a couple of processors, with no problems. Is this some sort of miracle?

Just like most programmers, I'm no DBA and just get by, thanks to ORMs and so on. Everywhere I look, people talk about needing to shard or to get a separate database server for big users of a web app. Why is this? Is it really that inefficient to have a large DB with lots of rows? Should I plan to use Cassandra or something, or can I rely on PostgreSQL scaling up well?

+2  A: 

You should have no problem in SQL Server, Oracle, or any modern relational or non-relational database. I have administered databases with hundreds of millions of records and terabytes of data.

Dustin Laine
+2  A: 

In my mind that's nothing. Having tens of millions of rows across multiple tables, with the database size exceeding 10 GB, has not caused problems for MS SQL Server. Of course it isn't terribly fast with that much data, but otherwise it works just fine.

And to answer the question: "too big" is whatever size starts causing problems, and when that happens depends on the table structure and your performance demands.

Carlos
+7  A: 

I personally don't think what you've described is that large a database. The server (20 gigs of RAM? ;)) sounds decent. It's more about usage and design: if the database is indexed and well designed, it can grow much, much larger on the current hardware.

Before doing any sort of switch, I'd simply look at archiving useless data and optimising queries if there's a fear of performance issues.

o6tech
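To make the indexing and archiving suggestion concrete, here is a minimal T-SQL sketch. The Listings table, its columns, the Listings_Archive table, and the two-year cutoff are all hypothetical; adapt them to the real schema.

    -- Index the columns queries actually filter on, so lookups seek instead of scanning.
    CREATE INDEX IX_Listings_ClientId_CreatedAt
        ON Listings (ClientId, CreatedAt);

    -- Move stale rows into an archive table, then delete them from the hot table.
    INSERT INTO Listings_Archive
    SELECT * FROM Listings
    WHERE CreatedAt < DATEADD(year, -2, GETDATE());

    DELETE FROM Listings
    WHERE CreatedAt < DATEADD(year, -2, GETDATE());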
I don't think it's anywhere near large. In terms of efficiency, decide on a measure (or measures) and do some sizing; it can be fun. The transaction log might need truncating if it's been running for 5 years!
MikeAinOz
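For the sizing and log-truncation points, a couple of SQL Server commands go a long way. This is only a sketch: the table name Listings, the database name CarSales, and the log file name CarSales_log are made up, and switching to SIMPLE recovery is only appropriate if you don't need point-in-time restores.

    -- How big is the table, and how many rows does it hold?
    EXEC sp_spaceused 'Listings';

    -- How big is the transaction log, and how much of it is actually in use?
    DBCC SQLPERF(LOGSPACE);

    -- If point-in-time recovery isn't required, stop the log growing and shrink it.
    ALTER DATABASE CarSales SET RECOVERY SIMPLE;
    DBCC SHRINKFILE (CarSales_log, 1024);   -- target size in MB (about 1 GB)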
+2  A: 

Databases are extremely efficient at storing and retrieving relational data (i.e. data that is structured and has references to other data) - that's what they're designed to do. Honestly, 99% of the people spewing about key-value stores and Cassandra and whatnot have no clue what they're doing. A database server is just fine for storing large volumes of data, particularly if you're willing to put a bit of work into tuning it properly.

That said, there are use cases for Cassandra et al.: if you have mostly unstructured key/value data, or don't need consistency, or want to shard for redundancy, it may be worth investigating.

Unless you're an extremely popular website, you can probably get by just fine with a decent database server. Don't switch until you've determined why you need to switch. Switching is fine, just make sure you're doing it because it serves your needs better, and not because it's the "cool web-scale thing to do".

Steven Schlansker
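As a rough illustration of the "bit of work into tuning" point, here's one way to measure a query on SQL Server before deciding anything needs to change (table and column names are hypothetical):

    -- Report logical reads and elapsed time for the query that follows.
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    SELECT CarId, Price, ListedAt
    FROM Listings
    WHERE ClientId = 42
    ORDER BY ListedAt DESC;

    -- A high logical-read count usually means a table scan; an index on
    -- (ClientId, ListedAt) would let the engine seek and sort cheaply.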
+2  A: 

The reason for sharding and separate DB servers is that at some point it becomes cheaper to use multiple cheaper machines than one expensive one. Hardware price doesn't scale linearly with performance, and once you reach a certain point it'll be much cheaper to get twice as many machines than to get one machine that's twice as fast.

Davy8
+3  A: 

Typically you split components up across different servers so you can manage uptime, resilience, and performance more easily.

It's certainly quite possible to have one monster machine which does it all, but then you may need another monster machine in case your motherboard dies or your datacenter is unavailable.

By splitting a web site or application up amongst different servers, it's easier to get cheaper machines, and more of them. Thus you can build in resilience, and avoid components with similar hardware demands clashing.

It's also important to think about restore times for servers, and recovery plans.
What happens when your machine dies? Can you replace it in the agreed-upon time? Can you restore from backups in that time?
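One hedged example of such a recovery drill on SQL Server: restore last night's full backup to a scratch database and time it. Every name and path below is hypothetical.

    RESTORE DATABASE CarSales_RestoreTest
    FROM DISK = N'D:\Backups\CarSales_full.bak'
    WITH MOVE 'CarSales'     TO N'D:\Data\CarSales_RestoreTest.mdf',
         MOVE 'CarSales_log' TO N'D:\Data\CarSales_RestoreTest_log.ldf',
         STATS = 10;   -- print progress every 10%, so you can see how long a restore really takes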

SQL Server and other enterprise-class databases shouldn't have any problems with databases in the tens or hundreds of gigabytes, as long as they're not designed too badly. (We have a few machines with that kind of capacity and usage which aren't struggling at all.)

Bravax