Are there any guidelines/best practices for deciding what type of data should be stored in the database?

For example, is it OK to use a database to store:

  1. Application logs
  2. Configuration details (like server IP addresses etc.)
  3. System information (e.g., names of shell scripts, scheduling information for batch jobs, batch jobs status etc.)

I have seen applications that use a database to store these. Is this acceptable? What are the pros and cons of such a design?

+6  A: 

To answer this question we have to understand what database storage provides that isn't available in, say, flat-file storage.

  1. security - you can store data and be sure that updates, deletes and reads are access-controlled
  2. audit - you can keep track of who made changes and when
  3. distributed servers - if you have multiple application servers accessing a single database, you avoid storing the same data in multiple places

If these properties are desirable for your data, it's a good idea to store it in the database.

alok
One thing you really need to think about when dealing with distributed/clustered applications: how do you deal with logging when the server doesn't have access to the database? You should be able to fall back to another place that is local to the server, say the event log on Windows. This will help you capture errors when the application cannot connect to the database.
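A minimal sketch of that fallback idea, using Python's standard `logging` module. The `logs` table, the `connect` callable and both class names are assumptions made for illustration; on Windows the local handler could be an event log handler instead of a file handler.

```python
import logging
import sqlite3


class DatabaseLogHandler(logging.Handler):
    """Writes log records to a table; raises if the database is unreachable."""

    def __init__(self, connect):
        super().__init__()
        # Hypothetical callable that returns a new DB-API connection.
        self._connect = connect

    def emit(self, record):
        conn = self._connect()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO logs (level, message) VALUES (?, ?)",
                    (record.levelname, record.getMessage()),
                )
        finally:
            conn.close()


class FallbackHandler(logging.Handler):
    """Tries the primary handler first and uses the local one if it fails."""

    def __init__(self, primary, fallback):
        super().__init__()
        self._primary = primary
        self._fallback = fallback

    def emit(self, record):
        try:
            self._primary.emit(record)
        except Exception:
            # The database could not be reached, so keep the record locally.
            self._fallback.emit(record)
```

Typical wiring would be something like `FallbackHandler(DatabaseLogHandler(connect), logging.FileHandler("app.log"))`, attached to the application's logger.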
Redbeard 0x0A
+2  A: 

A small point: 99% of the time it's a terrible idea to store configuration in the DB. Config is too important to lose to a DB connection gone south: it needs to be 100% bulletproof.

annakata
Application start-up config should be stored local to the application, but I like to store all other config in the DB (such as user settings). That way users can swap PCs and still get a consistent experience with your app.
Mitch Wheat
If your application depends on the database, then you've got an issue if it goes down anyway. Also DB config is typically cached in memory for a period, so if your config framework is built well it will handle the DB being offline and continue to use the old config.
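As a rough sketch of such a cache (the class name, the `load` callable and the 60-second default are assumptions for illustration, not any particular framework's API):

```python
import time


class CachedConfig:
    """Keeps the last successfully loaded settings and serves them if a reload fails."""

    def __init__(self, load, max_age_seconds=60.0, clock=time.monotonic):
        self._load = load  # hypothetical callable that reads settings from the DB
        self._max_age = max_age_seconds
        self._clock = clock
        self._values = None
        self._loaded_at = None

    def get(self, key, default=None):
        self._refresh_if_stale()
        return self._values.get(key, default)

    def _refresh_if_stale(self):
        now = self._clock()
        if self._loaded_at is not None and now - self._loaded_at < self._max_age:
            return  # cached values are still fresh
        try:
            self._values = dict(self._load())
            self._loaded_at = now
        except Exception:
            if self._values is None:
                raise  # nothing cached yet, so there is no old config to fall back on
            # Database offline: keep serving the stale values and retry on the next call.
```

Note that the very first load still needs the database, which is the case the start-up config argument above is about.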
Greg Beech
@mitchwheat: perhaps I misspoke; I consider user config to be just more user data, unrelated to application config. @gregbeech: If the DB goes down I expect graceful degradation, which is unlikely without access to config settings. The cache argument is true, but it varies by app. Still: what's the plus?
annakata
+3  A: 

We have stored everything in the database on the last few projects and it really helps when moving from development to production as there is very little to configure in the application itself.

Logging to the database can be useful (with Log4j, for example) as it allows widespread access to the logs for the testers and analysts.

I guess it depends on your situation. Everything that is stored in the database adds a level of complexity to the system. It is easier to read a file than to access a database to get the same information from code. A reasonable rule of thumb would be that the larger the system, the more of it should be stored in the database.

LenW
+4  A: 

Application logs

Although it is often a good idea to limit the data in the database to a specific time range (e.g. dump, archive or condense into statistics everything that's older than 3 months), having the logs in the database allows very fast and easy analysis of the data. Need to see what a specific user has done? "SELECT * FROM logs WHERE User = 'bla'". Need to find out why the system crashed at a specific time? "SELECT * FROM logs WHERE Timestamp BETWEEN failure - 1 hour AND failure + 5 minutes".
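The two queries above are shorthand rather than exact SQL. One way to make them concrete, sketched here with Python's built-in `sqlite3` module, assumes a `logs` table with `timestamp`, `user` and `message` columns and timestamps stored as `YYYY-MM-DD HH:MM:SS` text (the table layout and function names are assumptions):

```python
import sqlite3
from datetime import datetime, timedelta


def events_for_user(conn, user):
    """Return every log row written for one user, oldest first."""
    return conn.execute(
        "SELECT timestamp, user, message FROM logs WHERE user = ? ORDER BY timestamp",
        (user,),
    ).fetchall()


def events_around(conn, failure, before=timedelta(hours=1), after=timedelta(minutes=5)):
    """Return log rows from `before` the failure time up to `after` it."""
    # Text timestamps in this format sort the same way as the times they represent.
    start = (failure - before).isoformat(sep=" ", timespec="seconds")
    end = (failure + after).isoformat(sep=" ", timespec="seconds")
    return conn.execute(
        "SELECT timestamp, user, message FROM logs "
        "WHERE timestamp BETWEEN ? AND ? ORDER BY timestamp",
        (start, end),
    ).fetchall()
```

Passing the values as parameters instead of pasting them into the SQL string also avoids quoting problems with user names.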

Configuration details (like server IP addresses etc.)

That depends on the configuration details. Some yes, some no. Everything that applies to applications running on more than one client (e.g. websites) and that is likely to change quite often (e.g. user settings) should go in the database. For more or less static options, I prefer to use a config file.

System information (e.g., names of shell scripts, scheduling information for batch jobs, batch jobs status etc.)

I guess that's almost the same as config details. If it changes: database. If it's static: config file. Shell scripts will usually be static. Scheduling information and status will change over time.

BlaM
+2  A: 

RE: Config data

It might be a good idea to keep config data in the database to make it easier to edit and to keep track of the changes, but then write it out to a config file for the actual program to read.

  • Why should Apache have to know anything about your database information to be able to get to its configuration?

  • Why should your FTP server stop working when the database is down?
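A small sketch of the "edit in the database, write out to a file" step. The `settings` table with `key` and `value` columns and the `key=value` output format are assumptions; a real server would need its own config syntax.

```python
import os
import sqlite3
import tempfile


def export_config(conn, path):
    """Write every row of the settings table to `path` as key=value lines."""
    rows = conn.execute("SELECT key, value FROM settings ORDER BY key").fetchall()
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first, then rename it into place,
    # so the program reading the config never sees a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            for key, value in rows:
                f.write(f"{key}={value}\n")
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

The export runs whenever the settings change, and the program itself only ever reads the file, so it keeps working while the database is down.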

RE: Application logs

As stated earlier, a database can make log analysis a lot easier, but I urge you to consider the log-to-file-and-batch-import-later pattern.

Performance issues

Databases are great for getting random bits of data out and putting random bits of data in. Log data is mostly not written randomly but in a continuous stream that is perfect for putting in a file one line after another. You can't beat the performance of a flat file when it comes to writing the data. There are not a lot of things that can break with a flat file either. This also lets the database concentrate on doing the actual business work.

Then later on you can collect all the logged data from the file, parse it, do any required post-processing (like looking up host names from IP addresses) and put it into a database table. You do this as often as you find necessary. For my website I really don't need to be able to view the visitor stats change from one minute to the next, so I run the log batch at night. If you need up-to-date info you can just as well run the batch import every 60 seconds, but this will still be better than doing one extra INSERT statement for every actual business transaction (depending on how much you log, of course).
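The batch step could look roughly like this. It is a sketch that assumes tab-separated log lines (`timestamp`, `level`, `message`) and a matching `logs` table; the function name and the offset bookkeeping are illustrative, and any post-processing would go where the line is parsed.

```python
import sqlite3


def import_log_file(conn, path, offset=0):
    """Load lines appended to `path` since `offset` into the logs table.

    Returns the new offset so the next run can continue where this one stopped.
    """
    rows = []
    with open(path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # partial line still being written; pick it up next run
            offset += len(raw)
            parts = raw.decode("utf-8").rstrip("\r\n").split("\t", 2)
            if len(parts) == 3:  # skip lines that do not match the expected format
                rows.append(tuple(parts))
    with conn:  # one transaction for the whole batch instead of one per log line
        conn.executemany(
            "INSERT INTO logs (timestamp, level, message) VALUES (?, ?, ?)", rows
        )
    return offset
```

The caller stores the returned offset somewhere (a small state file, for instance) and passes it back on the next run, whether that is nightly or every 60 seconds.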

Security

  • How do you log a failed database connection if the database is your log engine?

  • How do you investigate why a system crashed if the database went down early during the events involved in the crash?

So I think you should consider when you need the log data in the database and why you need it in there.

Per Wiklander
A: 

Focus on ease of use and maintenance. The only logs I store in a database are put there by triggers that error out because that's easiest. But for everything else, searching and parsing text logs is faster and easier. If your app crashes, looking at a text config file is easier than looking in the db, especially for new maintainers. It's much, much easier for a new person to come along and see an app.properties file in the config/ directory than to know to look in a table in the database.

In addition, you can more easily store config files in source control if they're text files than if they're in the database. And this is massively important, believe me. You do not want to debug an app where you've lost the config file settings that caused the error. If you have a database crash or corruption, you could lose the logs and config settings which might make finding the problem impossible.

MattGrommes
+1  A: 

One thing that hasn't been mentioned yet is that if you shove things like app configuration in the database, you can't put it under version control as easily.

For example, some CMSs like to shove HTML templates into the database instead of storing them as files. I personally think this is poor design. You can't version any of the changes you make to the templates and, worse, all you ever do is copy & paste from a real text editor into the wimpy text editor in the browser.

Bottom line? Ask yourself if this is something you want versioned. If yes, keep it out of the database. If no, sure, put it in the database.

Cory R. King