views:

473

answers:

14

Is there a general rule of thumb to follow when storing web application data to know what database backend should be used? Is the number of hits per day, number of rows of data, or other metrics that I should consider when choosing?

My initial idea is that the order for this would look something like the following (but not necessarily, which is why I'm asking the question).

  1. Flat Files
  2. BDB
  3. SQLite
  4. MySQL
  5. PostgreSQL
  6. SQL Server
  7. Oracle
A: 

Your application's utilization of the database is the most critical ones. Mainly what queries are used most often (SELECT, INSERT or UPDATE)?

Say if you use SQLite, it is gears for smaller application but for "web" application you might a bigger one like MySQL or SQL Server.

The way you write scripts and your web application platforms also matters. If you're developing on a Microsoft platform, then SQL Server is a better alternative.

chakrit
+8  A: 

It's not quite that easy. The only general rule of thumb is that you should look for another solution when the current one can't keep up anymore. That could include using different software (not necessarily in any globally fixed order), hardware or architecture.

You will probably get a lot more benefit out of caching data using something like memcached than switching to another random storage backend.

Jan Krüger
I agree with the first paragraph, but no the second - I would stay well clear of any memory caching set up around a DB. It will cause you more headaches than it will save, and at least for Oracle it's unneccessary, not to say harmful!
Galghamon
Memory caching can be very helpful in certain circumstances. Relatively static, but highly referenced data comes to mind. After all, how often do you really need to go to the database to get a list of state or country codes?
Chris Lively
A: 

I think your list is subjective but I will play your game.

Flat Files

BDB

SQLite

MySQL

PostgreSQL

SQL Server

Oracle

Teradata

Brian G
+5  A: 

If you think you are going to ever need one of the heavyweights (SqlServer, Oracle), you should start with one of those at the beginning. Data migrations are extremely difficult. In the long run it will cost you less to just start at the top and stay there.

Eric Z Beard
A: 

Typically, I go with what is commonly accepted by whichever framework I am using. So, if I'm doing .NET => SQL Server, Python (via Django or Pylons) => MySQL or SQLite.

I almost never use flat files though.

swilliams
A: 

There is more to choosing an RDBMS solution that just "back end horsepower". The ability to have commitment control, for example, so you can roll back a failed transaction is one. reason.

Unless you are in the megatransaction rate application, most database engines would be adequate - so it becomes a question of how much you want to pay for the software, whether it runs on the hardware and operating system environment you want, and what expertise you have in managing that software.

Ken Ray
+2  A: 

I think you're being overly specific in your rankings. You can pretty much start with flat files and the like for very small data sets, go up to something like DBM for slightly bigger ones that don't require SQL-like syntax, and go to some kind of SQL database after that.

But who wants to do all that rewriting? If the application will benefit from access to joins, stored procedures, triggers, foreign key validation, and the like--just use a SQL database regardless of the dataset size.

Which one should depend more on the client's existing installations and what DBA skills are available than on the amount of data you're holding.

In other words, the size of your database is far from the only consideration, and maybe not the most important one.

catfood
A: 

That progression sounds painful. If you're going to include MS products (especially the for-pay SQL Server) in there anywhere, you may as well use the whole stack, since you only have to pay for the last of these:

SQL Server Compact -> SQL Server Express -> SQL Server Enterprise (clustered).

If you target your app at SQL Server Compact initially, all your SQL code is guaranteed to scale up to the next version without modification. If you get bigger than SQL Server Enterprise, then congratulations. That's what they call a good problem to have.

Also: go back and check the SO podcasts. I believe they talked about this briefly.

Joel Coehoorn
A: 

This question depends on your situation really.

If you have control over the server you're deploying to and you can install whatever services you need, then the time to install a MySql or MSSQL Express server and code against an existing database framework VERSUS coding against flat file structure is not worth the effort of considering.

Sam
+1  A: 

There is no blanket answer to this, but ALMOST always, using flat files is not a good idea. You have to parse through them (i suppose) and they do not scale well. Starting with a proper database, like Oracle or SQL Server (or MySQL, Postgres if you are looking for free options) is a good idea. For very little overhead, you will save yourself a lot of effort and headache later on. They also allow you to structure your data in a non-stupid fashion, leaving you free to think of WHAT you will do with the data rather than HOW you will be getting it in/out.

Mostlyharmless
+1  A: 

It really depends on your data, and how you intend to use it. At one of my previous positions, we used Postgres due to the native geo-location and timezone extensions which existed because it allowed us to manage our data using polygonal datatypes. For us, we needed to do that, and we also wanted to use stored procedures, views and the like.

Now, another place I worked at used MySQL simply because the data was normalized, standard row by row data.

SQL Server, for a long time, had a 4gb database limit (see SQL Server 2000), but despite that limitation it remains a very stable platform for small to medium applications for which the old data is purged.

Now, from working with Oracle and SQL Server 05/08, all I can tell you is that if you want the creme of the crop for stability, scalability and flexibility, then these two are your best bet. For enterprise applications, I strongly recommend them (merely because that's what we use where I work now).

Other things to consider:

  • Language integration (ASP.NET session storage, role management, etc.)
  • Query types (Select, Update, Delete) [Although this is more of a schema design issue, not a DBMS issue)
  • Data storage requirements
Abyss Knight
A: 

What about FireBird? Where would that fit into that list?

A: 

And lets not forget the requirements that the "customer" of your solution must also have in place. If your writing a commercial application for a small companies, then Oracle might not be a good choice... but if your writing a customized solution for a large enterprise which must share data among multiple campuses, and has a good sized IT department then the decision of Oracle vs Sql Server would come down to what does the customer most likely already have deployed.

Data migration nowdays isn't that bad since we have those great tools from Embarcadero, so I would instead let the customer needs drive the decision.

skamradt
A: 

If you have the option SQL Server is a good choice from the word go, predominantly because you have access to solid procedures and functions and the database backup facilities are totally reliable. Wrapping up as much as your logic as you can inside the database itself (rather than in whatever language you are using) helps security and performance - indeed there's an good argument to be made for always using procedures for insert/update logic as these make you invulnerable to injection attacks.

If I have the choice the only time I'd consider MySQL in preference is with a large, fairly simple, database predominantly used for read access. This isn't to decry MySQL which has improved markedly of late and I happily use if I don't have the choice, but for more complex systems with update/insert activity MSSQL is generally the superior option.

Cruachan