views:

1090

answers:

11

I've been working on a server and I'm starting to implement logging. However, I'm not sure whether I should use the db for logging, or just a plaintext file.

I'm planning on logging some basic information for every request (what type of request, ip address of request, session tracking). For some requests there will be extended information present (details on what type of request was made), and if there are any errors I will log those, too.

On the one hand, putting the logs into the db means I could run queries on the logged data. On the other hand, I'm not sure if this would be putting unnecessary strain on the db. Of course, I could also use both the db and a log file for logging. What are people's thoughts on proper logging?

(If it makes a difference, I'm using mod_python on an Apache server with a MySQL db. So I'd either be using the logging library or just creating some logging tables in the db.)

+1  A: 

We've always logged data to a separate database.

This lets us query without impacting the application database. It also simplifies things if we realize that we need to disable logging or change the amount of what we log.

But most modern logging libraries support embedding the logging into your application and choosing the destination by configuration - file, database, whatever.

Logger gives you lots of ways to manage your logging, and although the default package doesn't have a database logger, it wouldn't be hard to write such an event handler.
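As a rough illustration of choosing the destination by configuration, here is a minimal sketch using the standard library's `logging.config.dictConfig`. The logger name `myserver`, the file name, and the rotation limits are assumptions made up for the example, not anything from the original setup.

```python
import logging
import logging.config
import os
import tempfile


def build_config(log_path, level="INFO"):
    """Return a dictConfig dictionary that routes records to a rotating file."""
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "plain": {"format": "%(asctime)s %(levelname)s %(name)s %(message)s"},
        },
        "handlers": {
            # Swapping or adding destinations only means editing this section.
            "file": {
                "class": "logging.handlers.RotatingFileHandler",
                "filename": log_path,
                "maxBytes": 1000000,  # hypothetical size limit per file
                "backupCount": 5,     # hypothetical number of rotated files kept
                "formatter": "plain",
            },
        },
        "loggers": {
            # "myserver" is a hypothetical logger name.
            "myserver": {"handlers": ["file"], "level": level, "propagate": False},
        },
    }


def demo():
    """Configure logging, write two records, and return the file contents."""
    log_path = os.path.join(tempfile.mkdtemp(), "server.log")
    logging.config.dictConfig(build_config(log_path))
    log = logging.getLogger("myserver")
    log.info("GET /index from 203.0.113.7")
    log.debug("not written at INFO level")
    with open(log_path) as f:
        return f.read()


if __name__ == "__main__":
    print(demo())
```

The application code only ever calls `logging.getLogger(...)`; whether records end up in a file, a database handler, or both is decided in the configuration dictionary.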

lavinio
+1  A: 

A mix of file.log and db would be best. Log to the db the information that you might eventually need to analyse, for example the average number of users per day. Use file.log to store debug information.

zdmytriv
+1  A: 

If you decide on a log file format that is parseable, then you can log to a file and then have an external process (perhaps run by cron) that processes your log files and inserts the details into your database. This can be arranged to happen at a time when your application and database load is low.
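A minimal sketch of such an import job is shown below. It assumes a hypothetical tab-separated line format (timestamp, IP address, request type, session id) and uses `sqlite3` as a stand-in for MySQL so the example is self-contained; the table and column names are invented for illustration.

```python
import sqlite3


def parse_line(line):
    """Split one tab-separated log line into (ts, ip, request_type, session_id)."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 4:
        return None  # skip malformed lines rather than abort the whole import
    return tuple(parts)


def load_log(lines, conn):
    """Insert all well-formed lines into the request_log table in one transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS request_log "
        "(ts TEXT, ip TEXT, request_type TEXT, session_id TEXT)"
    )
    rows = [row for row in map(parse_line, lines) if row is not None]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO request_log VALUES (?, ?, ?, ?)", rows)
    return len(rows)


if __name__ == "__main__":
    sample = [
        "2009-06-01T12:00:00\t203.0.113.7\tGET\tabc123\n",
        "this line is malformed\n",
    ]
    connection = sqlite3.connect(":memory:")
    print(load_log(sample, connection), "rows imported")
```

A cron entry could run a script like this against the previous day's rotated file, so the inserts happen in one batch while the application is quiet.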

I always worry about what happens if the database becomes unavailable: would this prevent your application from running, or degrade it in any way? Logging to the filesystem avoids having to deal with that issue, but you'd still need to worry about disks filling up and log file rotation.

mhawke
+1  A: 

Log to the DB only if it generates revenue.

For example, for one site, we logged all advertisements placed in a web site to a database. It generated revenue. No reason to be parsing log files for something that important.

Everything else goes to the file system.

Log to the file system for debugging. It's generally private stuff. Implementation details. Not to be shared.

Apache logs a mountain of stuff to the filesystem. Do not duplicate this.

Access control logs go to the file system. You'll rarely want to look at these in detail.

User activity may have to be summarized into a database. This is marketing and usability information that you'll want to study to improve your site. However, detailed activity information is too voluminous to record in the database. Put it on the file system and digest it to a marketing/product improvement/usability analysis database.

S.Lott
+7  A: 

First, use a logging library like SLF4J/Logback that allows you to make this decision dynamically. Then you can tweak a configuration file and route some or all of your log messages to each of several different destinations.

Be very careful before logging to your application database: you can easily overwhelm it if you're logging a lot and the volume gets high. And if your application is running close to full capacity or in a failure mode, the log messages may be inaccessible and you'll be flying blind. Probably the only messages that should go to your application database are high-level application-oriented events (a type of application data).

It's much better to "log to the file system" (which for a large production environment includes logging to a multicast address read by redundant log aggregation servers).

Log files can be read into special analytics databases where you could use, e.g., Hadoop to do map/reduce analyses of log data.

Jim Ferrans
Log to a syslog server like splunk; it supports many log formats, and you can send the database log there as well as the http server log, then cross-reference them from a nice usable GUI. Make sure you are using async logging (log4j and I bet many others have that kind of appender).
feniix
SLF4J/Logback are Java-based solutions. Python has an extensive logging module built in.
John Mee
@John: That's wonderful. Java's logging is quite fragmented between three main contenders (java.util.logging, Log4J, Jakarta Commons Logging). SLF4J is an attempt to integrate all of these coherently. The Python team was very wise to build logging in.
Jim Ferrans
A: 

In case you are considering tweaking the standard Python logger to log to a database, this recipe might give you a head start: Logging to a Jabber account.

wr
A: 

I would primarily use filesystem logging, just as most other answers recommend. With Python's logging package, you can easily create a database handler, by adapting the suggestion made here. You can also create a custom Filter instance and attach it to your database handler - this will allow you to determine at run-time exactly which events you actually log to the database. In line with other answers, I would say it's only really worth logging some types of event to the database for later analysis.
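To make the handler-plus-filter idea concrete, here is a minimal sketch, not the linked suggestion itself. It uses `sqlite3` in place of MySQL to stay self-contained, and the class names, the `app_log` table, and the `analytics` marker attribute are all hypothetical.

```python
import logging
import sqlite3


class SQLiteHandler(logging.Handler):
    """Hypothetical handler that writes each record to an app_log table.

    Assumes the connection is only used from the thread that created it.
    """

    def __init__(self, conn):
        super().__init__()
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS app_log "
            "(created REAL, level TEXT, logger TEXT, message TEXT)"
        )

    def emit(self, record):
        try:
            with self.conn:  # one transaction per record
                self.conn.execute(
                    "INSERT INTO app_log VALUES (?, ?, ?, ?)",
                    (record.created, record.levelname, record.name,
                     record.getMessage()),
                )
        except Exception:
            # Never let a database failure take the application down.
            self.handleError(record)


class AnalyticsOnly(logging.Filter):
    """Pass only records explicitly marked with extra={'analytics': True}."""

    def filter(self, record):
        return bool(getattr(record, "analytics", False))


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    db_handler = SQLiteHandler(connection)
    db_handler.addFilter(AnalyticsOnly())

    log = logging.getLogger("myserver")  # hypothetical logger name
    log.setLevel(logging.INFO)
    log.addHandler(db_handler)

    log.info("debug-style detail, not stored in the database")
    log.info("advertisement served", extra={"analytics": True})
    print(connection.execute("SELECT message FROM app_log").fetchall())
```

Because the filter is attached to the handler rather than the logger, a file handler on the same logger would still receive every record.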

I would concur with the recommendation to log to a separate database (on a separate server) if your main application is high-throughput.

Vinay Sajip
A: 

The type of logging depends upon what you're going to do with the data and how you are going to do it. Logging to a db is advantageous if you are going to build a reporting system on top of the log db. Otherwise, you can log in a specific format that you can parse later if you want to use the data for some analysis. For example, from the file log you can parse only the required information and generate CSVs as and when required. If you're planning to use a db logger, as already suggested, keep it separate from your application db.

Secondly, you can consider making the logger independent of your main application. Either spawn a thread which does the logging, or run a logger on a specific port/socket and pass the log messages to it, or collect all log messages together and flush them to the log at the end of each cycle.
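The thread-based option can be sketched with the standard library's `QueueHandler` and `QueueListener` (available in Python 3.2 and later, so not in the mod_python era). The helper name `start_background_logging` is made up for this example.

```python
import logging
import logging.handlers
import queue
import sys


def start_background_logging(logger, target_handler):
    """Route the logger's records through a queue to a background thread.

    The request-handling code only pays for a queue put; the slow handler
    (file, database, network) runs in the listener's own thread.
    """
    log_queue = queue.Queue(-1)  # unbounded queue
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    listener = logging.handlers.QueueListener(log_queue, target_handler)
    listener.start()
    return listener


if __name__ == "__main__":
    log = logging.getLogger("myserver")  # hypothetical logger name
    log.setLevel(logging.INFO)
    listener = start_background_logging(log, logging.StreamHandler(sys.stdout))
    log.info("handled request %s", "GET /index")
    listener.stop()  # drains the records already queued, then joins the thread
```

Calling `listener.stop()` at shutdown matters: without it, records still sitting in the queue may never reach the target handler.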

Technofreak
A: 

We do both.

We log operational information/progress/etc. to the logfile. Standard logfile stuff.

In the database, we log statuses of operations. E.g. each item that's processed, so we can do queries on throughput/elapsed time/etc. This data is particularly useful when trending and detecting anomalies (system is "too quiet" etc.) that are potentially indicative of other issues.

Joe
A: 

Indeed, it seems important that you can later switch between DB and file logging. Database logging seems to be much slower than plain text file logging, which may become important with high log traffic. I wrote a library (which can act standalone or as a handler) when I had the same requirement. It logs into a database and/or files, and allows archiving critical messages (the archive may, for example, be a database while everything else goes into text files). It may save you from coding another one from scratch ... See: The rrlog library

Rabe
A: 

It looks like many of you are logging some of the events to a database. I am doing the same, but it's adding a bit of delay. Do any of you log to the database through a message queue? If so, what do you use for queuing and what is your logging architecture like? I am using Java/J2EE.

Langali