tags:
views: 161
answers: 5
A: 

"Handle up to ~100,000 insert commands a second" - is this peak, or normal operation? If normal operation, your 'millions of records stored' is likely to be billions...

With questions like this, I think it is useful to understand the business 'problem' further - these are non-trivial requirements! The question is whether the problem justifies this 'brute force' approach, or if there are alternative ways of looking at it that achieve the same goal.

If it is needed, then you can consider whether there are ways of aggregating or transforming the data (bulk loading, discarding multiple updates to the same record, or loading to multiple databases and then aggregating downstream as a combined set of ETLs, perhaps) to make this volume easier to manage.
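As a rough illustration of the "discarding multiple updates to the same record" idea, here is a minimal Python/SQLite sketch (the table, columns and in-memory update stream are made up for the example); the point is collapsing the incoming stream before doing one batched write:

    import sqlite3
    from collections import OrderedDict

    # Hypothetical stream of (record_id, payload) updates from the ingest path.
    incoming = [(1, "a"), (2, "b"), (1, "a2"), (3, "c"), (2, "b2")]

    # Discard superseded updates: keep only the latest payload per record id.
    latest = OrderedDict()
    for record_id, payload in incoming:
        latest[record_id] = payload

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)")

    # Load the collapsed batch in a single transaction instead of one
    # statement per incoming update (upsert syntax needs SQLite 3.24+).
    with conn:
        conn.executemany(
            "INSERT INTO records (id, payload) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET payload = excluded.payload",
            latest.items(),
        )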

Kris C
A: 

The first thing I would worry about is your disk layout. You have a mixed workload (OLTP and OLAP), so it is extremely important that your disks are sized and placed correctly in order to achieve this throughput; if your I/O subsystem can't handle the load, then it doesn't matter which DB you use.

In addition, perhaps those 100,000 inserts a second can be bulk loaded. By the way, 100,000 rows a second amounts to over 4 billion rows in just 12 hours, so perhaps you want to store billions of rows, not millions?
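To make the bulk-loading point concrete, here is a small Python/SQLite sketch (the batch size, table and fake stream are arbitrary assumptions) that commits one 10,000-row batch at a time instead of one row at a time:

    import sqlite3
    from itertools import islice

    def batches(iterable, size):
        # Yield lists of up to `size` items from `iterable`.
        it = iter(iterable)
        while True:
            chunk = list(islice(it, size))
            if not chunk:
                return
            yield chunk

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, payload TEXT)")

    # Fake event stream standing in for the 100,000 rows/second of input.
    stream = ((i, "payload-%d" % i) for i in range(1_000_000))

    # One transaction per 10,000-row batch instead of one per row: far fewer
    # commits and round trips, which is where most per-insert overhead goes.
    for batch in batches(stream, 10_000):
        with conn:
            conn.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", batch)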

SQLMenace
Doesn't really address the question.
Russ
A: 

You probably can't handle 100k individual insert operations per second; you will certainly need to batch them into a more manageable number.

A single thread wouldn't be able to do that many commands anyway, so I would expect there to be 100-1000 threads doing those inserts.
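As a sketch of what that might look like, here is a Python skeleton with a pool of insert workers draining a shared queue in fixed-size batches. The flush() function is a stand-in for a real batched write (executemany, COPY, bulk insert) over each worker's own connection, and the worker count and batch size are only illustrative:

    import queue
    import threading

    BATCH_SIZE = 1_000
    SENTINEL = None
    work = queue.Queue(maxsize=100_000)

    def flush(batch):
        # Stand-in for a real batched write over this worker's own DB connection.
        pass

    def insert_worker():
        batch = []
        while True:
            row = work.get()
            if row is SENTINEL:
                break
            batch.append(row)
            if len(batch) >= BATCH_SIZE:
                flush(batch)
                batch = []
        if batch:
            flush(batch)  # flush whatever is left on shutdown

    # A pool of insert workers, each batching rows pulled from the shared queue.
    workers = [threading.Thread(target=insert_worker) for _ in range(8)]
    for w in workers:
        w.start()

    # Producers (the ingest path) would call work.put(row); fake 100,000 rows here.
    for i in range(100_000):
        work.put((i, "payload-%d" % i))

    for _ in workers:
        work.put(SENTINEL)
    for w in workers:
        w.join()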

Depending on your app, you will probably need some kind of high availability as well, unless you're doing something like a scientific app.

My advice is to hire somebody who has a credible answer for you, ideally someone who's done it before. If you don't know how to do this, you're not going to be able to develop the app. Hire a senior developer who can answer this question; ask them in their interview if you like.

MarkR
+2  A: 

The answer depends on additional questions, such as how much you want to spend, what OS you are using, and what expertise you have in-house.

Databases that I know of that can handle such a massive scale include DB2, Oracle, Teradata, and SQL Server. MySQL may also be an option, though I'm not sure of its performance capabilities.

There are others, I'm sure, designed for handling data on the massive scale you are suggesting, and you may need to look into those, as well.

So, if your OS is not Windows, you can exclude SQL Server.

If you are going the cheap route, MySQL may be the best option.

DB2 and Oracle are both mature database systems. If your system is a mainframe (IBM System/370), I'd recommend DB2, but for Unix-based systems either may be an option.

I don't know much about Teradata, but I know it is specifically designed for massive amounts of data, so may be closer to what you are looking for.

A more complete list of choices can be found here: http://en.wikipedia.org/wiki/List_of_relational_database_management_systems

A decent comparison of databases is here: http://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems

100,000+ inserts a second is a huge number; no matter what you choose, you are looking at spending a fortune on hardware to handle this.

Russ
@Russ: Out of DB2 and Oracle, why do you suggest DB2 for an IBM mainframe?
Amoeba
A: 

This is not a question about which DB to choose; it is a question about your skills and experience.

If you think that it is possible with one physical machine, you are on the wrong track. If you know that several machines should be used, then why ask about the DB at all? The DB is not as important as the way you work with it.

Start with a write-only DB on one server and scale it vertically for now. Use several read-only servers and scale them horizontally (a document database can almost always be chosen safely here). The CQRS concept is something that will answer many of your forthcoming questions.
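A toy Python sketch of that command/query split (the in-memory dicts stand in for a vertically scaled write database and horizontally scaled read replicas, and replication is synchronous here only to keep the example short):

    import random

    class Cqrs:
        # Toy command/query separation: all writes go to one primary,
        # all reads fan out across read replicas.

        def __init__(self, replica_count=3):
            self.primary = {}                                    # write side
            self.replicas = [{} for _ in range(replica_count)]   # read side

        def handle_command(self, key, value):
            # Commands mutate only the write side, then get replicated out.
            self.primary[key] = value
            self._replicate(key, value)

        def _replicate(self, key, value):
            # In a real system this is asynchronous, so reads may be slightly stale.
            for replica in self.replicas:
                replica[key] = value

        def handle_query(self, key):
            # Queries never touch the primary; any read replica will do.
            return random.choice(self.replicas).get(key)

    app = Cqrs()
    app.handle_command("user:1", {"name": "Alice"})
    print(app.handle_query("user:1"))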

Rationalle