views: 347
answers: 10

Okay, here's the scenario. I have a utility that processes tons of records and writes information to the database accordingly.

It works on these records in multi-threaded batches, and each batch writes to the same log file to create a workflow trace for each record. Potentially, we could be making close to a million log writes a day.

Should this log be made into a database residing on another server? Considerations:

  1. The obvious disadvantage of multiple threads writing to the same log file is that the log messages get shuffled amongst each other. In a database, they can be grouped by batch id.
  2. Performance: which would slow down the batch processing more, writing to a local file or sending log data to a database on another server on the same network? Theoretically, the log file is faster, but is there a gotcha here?

Are there any optimizations that can be done on either approach?

Thanks.

A: 

I think it depends greatly on what you are doing with the log files afterwards.

Of the two operations, writing to the log file will be faster, especially as you are suggesting writing to a database on another server.

However, if you are then trying to process and search the log files on a regular basis, the best place to do this would be a database.

If you use a logging framework like log4net, it often provides simple config-file-based ways of redirecting output to a file or a database.

samjudson
+1  A: 

Database - since you mentioned multiple threads. Synchronization and filtered retrieval are the reasons for my answer.
That said, see if you actually have a performance problem before deciding to switch to files.
"Knuth: Premature optimization is the root of all evil." I didn't get any further in that book... :)

Gishu
A: 

There are ways you can work around the limitations of file logging.

You can always start each log entry with a thread id of some kind and grep out the individual thread ids, or use a different log file for each thread.

I've logged to database in the past, in a separate thread at a lower priority. I must say, queryability is very valuable when you're trying to figure out what went wrong.

Josh
+2  A: 

I second the other answers here: it depends on what you are doing with the data.

We have two scenarios here:

  1. The majority of the logging is to a DB since admin users for the products we build need to be able to view them in their nice little app with all the bells and whistles.

  2. We log all of our diagnostics and debug info to file. We have no need to really "prettify" it and, TBH, we often don't even need it, so we mostly just log and archive.

I would say if the user is doing anything with it, then log to the DB; if it's just for you, then a file will probably suffice.

Rob Cooper
+1  A: 

How about logging to a database file, say a SQLite database? I think it can handle multi-threaded writes, although that may also have its own performance overheads.
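A minimal sketch of this, using Python's built-in `sqlite3` module (file name and schema are invented for illustration). SQLite serializes writers itself, so concurrent threads briefly block on the write lock rather than corrupting the file; note that in real multi-threaded use each thread should open its own connection:

```python
import sqlite3

# One local database file holds the whole trace; entries can later be
# grouped or filtered by batch_id with plain SQL.
conn = sqlite3.connect("trace.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS log ("
    " batch_id INTEGER, thread TEXT, message TEXT,"
    " ts DATETIME DEFAULT CURRENT_TIMESTAMP)"
)

def log(batch_id, thread, message):
    with conn:  # each 'with' block commits one transaction
        conn.execute(
            "INSERT INTO log (batch_id, thread, message) VALUES (?, ?, ?)",
            (batch_id, thread, message),
        )

log(1, "worker-0", "record 42 processed")
rows = conn.execute(
    "SELECT message FROM log WHERE batch_id = ?", (1,)
).fetchall()
```

The trade-off the answer hints at is real: every committed transaction is an fsync, so per-message commits are much slower than appending to a flat file.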

+3  A: 

One thing that comes to mind is that you could have each thread writing to its own log file and then do a daily batch run to combine them.

If you are logging to a database, you will probably need to do some tuning and optimization, especially if the DB is across the network. At the very least you will need to be reusing the DB connections.
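The batching part of that tuning can be sketched as follows (Python, with an in-memory SQLite database standing in for whatever remote DB driver is actually used; the flush threshold is an arbitrary example value). One open connection is reused, and entries are flushed in groups instead of paying one round-trip per message:

```python
import sqlite3

# Reuse a single open connection and flush log entries in batches.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (batch_id INTEGER, message TEXT)")

buffer = []
FLUSH_AT = 100  # illustrative threshold; tune for the real workload

def log(batch_id, message):
    buffer.append((batch_id, message))
    if len(buffer) >= FLUSH_AT:
        flush()

def flush():
    with conn:  # one transaction per batch of inserts
        conn.executemany("INSERT INTO log VALUES (?, ?)", buffer)
    buffer.clear()

for i in range(250):
    log(1, f"record {i} done")
flush()  # don't forget the partial final batch

count = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
```

The same shape works with any driver that offers an `executemany`-style bulk insert.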

Furthermore, do you have any specific need to have the log in a database? If all you need is a "grep", then I don't think you gain much by logging into a database.

Rowan
+5  A: 

The interesting question, should you decide to log to the database, is where do you log database connection errors?

If I'm logging to a database, I always have a secondary log location (file, event log, etc) in case there are communication errors. It really does make it easier to diagnose issues later on.

ZombieSheep
+2  A: 

Not sure if it helps, but there's also a utility called Microsoft LogParser that you can supposedly use to parse text-based log files and use them as if they were a database. From the website:

Log parser is a powerful, versatile tool that provides universal query access to text-based data such as log files, XML files and CSV files, as well as key data sources on the Windows® operating system such as the Event Log, the Registry, the file system, and Active Directory®. You tell Log Parser what information you need and how you want it processed. The results of your query can be custom-formatted in text based output, or they can be persisted to more specialty targets like SQL, SYSLOG, or a chart. Most software is designed to accomplish a limited number of specific tasks. Log Parser is different... the number of ways it can be used is limited only by the needs and imagination of the user. The world is your database with Log Parser.

I haven't used the program myself, but it seems quite interesting!

onnodb
+2  A: 

Or how about logging to a queue? That way you can swap out pollers whenever you like to log to different things. It makes things like rolling over and archiving log files very easy. It's also nice because you can add pollers that log to different targets, for example:

  • a poller that looks for error messages and posts them to your FogBugz account
  • a poller that looks for access violations ('x tried to access /foo/y/bar.html') to a 'hacking attempts' file
  • etc.
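The queue idea can be sketched like this (Python; the in-memory lists are toy stand-ins for a real file sink and a FogBugz-style error poller). Producers only ever put messages on a thread-safe queue, and a single poller thread drains it and dispatches to any number of sinks:

```python
import queue
import threading

log_queue = queue.Queue()
errors = []   # stand-in for an error-reporting sink
trace = []    # stand-in for the ordinary log file

def poller():
    while True:
        msg = log_queue.get()
        if msg is None:  # sentinel: shut down
            break
        if msg.startswith("ERROR"):
            errors.append(msg)  # route errors to a second sink
        trace.append(msg)       # everything goes to the main trace

t = threading.Thread(target=poller)
t.start()
log_queue.put("record 1 ok")
log_queue.put("ERROR record 2 failed")
log_queue.put(None)
t.join()
```

Because worker threads only enqueue, slow sinks (a remote DB, an HTTP call) never stall batch processing.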
James A. Rosen
A: 

I like Gaius' answer. Put all the log statements in a thread-safe queue and then process them from there. For a DB you could batch them up, say 100 log statements per batch, and for a file you could just stream them into the file as they come off the queue.

File or DB? As many others say, it depends on what you need the log for.

noocyte