A large number of threads run in parallel, continuously (let's assume the continuous part). Every thread wants to log some application data, basically a set of values.

  1. What would be the best approach to log this data? A single file or multiple files?
  2. What would be the best approach to back up this log?
  3. What would be the approach to read data from the backup file and convert it into something useful?

Several threads like this and this suggest log4net and log4j, but I want to understand the actual process. How do multiple threads write to the same log file? Is a file-level lock required for each thread? How does all this work?

Any pointers toward understanding the details would be appreciated.

Thanks.

+1  A: 

Regarding point 1, I usually log everything (feature-related) to the same file, but every log line includes some context information that lets me track (via grep or something else) the flow of the context/request.

Example (a scenario with calls):

DEBUG|CallID#12: Establishing new AUDIO call from AA to BB
DEBUG|CallID#34: Call accepted by ZZ at ...
DEBUG|CallID#99: Call terminated by callee (SS)

This way, if someone asks "what happened to the call from AA to BB at 12:34 today?", I just grep for either "AA to BB" or the time it happened. Once I have the call ID, getting the full details of the call is just a matter of grepping again with that ID.

Other stuff like chat, presence, etc. would go in its own file (it wouldn't make much sense to mix all this info in a single monolithic file).

If you want per-thread (instead of per-action/request) tracking, just log the name of the thread that's performing the action.
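
As a rough illustration of putting context into every line, here is a minimal sketch using Python's standard `logging` module rather than log4j. The logger name `calls`, the `call_id` field, and the thread name `worker-1` are made up for the example; the line layout just mirrors the sample lines above with the thread name added.

```python
import io
import logging
import threading

# An in-memory stream stands in for the shared log file in this sketch.
call_stream = io.StringIO()
call_handler = logging.StreamHandler(call_stream)
# Every line carries the level, the thread name, and a (hypothetical) call id.
call_handler.setFormatter(
    logging.Formatter("%(levelname)s|%(threadName)s|CallID#%(call_id)s: %(message)s")
)

call_log = logging.getLogger("calls")
call_log.setLevel(logging.DEBUG)
call_log.addHandler(call_handler)
call_log.propagate = False


def handle_call(call_id, caller, callee):
    # `extra` attaches the call id to the record so the formatter can print it.
    call_log.debug(
        "Establishing new AUDIO call from %s to %s",
        caller,
        callee,
        extra={"call_id": call_id},
    )


worker = threading.Thread(target=handle_call, args=(12, "AA", "BB"), name="worker-1")
worker.start()
worker.join()

call_output = call_stream.getvalue()
```

With a layout like this, grepping for `CallID#12` or for `worker-1` pulls out one call or one thread from the shared file.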

Regarding point 2, daily rotation with log4j.
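
In log4j 1.2 daily rotation is typically done with `DailyRollingFileAppender`. The same idea, sketched with Python's standard library instead, looks roughly like this; the file name `app.log`, the temporary directory, and the 30-file retention are assumptions for the example.

```python
import logging
import logging.handlers
import os
import tempfile

# Hypothetical location; a real application would use its own log directory.
rot_dir = tempfile.mkdtemp()
rot_path = os.path.join(rot_dir, "app.log")

# Roll the file over at midnight and keep at most 30 rotated files.
rot_handler = logging.handlers.TimedRotatingFileHandler(
    rot_path, when="midnight", backupCount=30
)
rot_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

rot_log = logging.getLogger("rotating-demo")
rot_log.setLevel(logging.INFO)
rot_log.addHandler(rot_handler)
rot_log.propagate = False

rot_log.info("service started")
rot_handler.flush()
```

Rotated files get a date suffix appended to the base name, so a backup job only has to pick up everything except the active file.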

Not sure I understood point 3... Maybe you mean parse a log file to retrieve some patterns? Any tool that supports regex will do the trick (grep being the most handy).
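
If grep is not enough, the same pattern matching can be scripted. The sketch below assumes the `LEVEL|CallID#N: message` layout from the example lines above; the helper names `parse` and `find_call_ids` are made up for illustration.

```python
import re

# Matches lines such as "DEBUG|CallID#12: Establishing new AUDIO call from AA to BB".
LINE_RE = re.compile(r"^(?P<level>[A-Z]+)\|CallID#(?P<call_id>\d+): (?P<message>.*)$")

sample_lines = [
    "DEBUG|CallID#12: Establishing new AUDIO call from AA to BB",
    "DEBUG|CallID#34: Call accepted by ZZ at ...",
    "DEBUG|CallID#99: Call terminated by callee (SS)",
]


def parse(lines):
    """Turn raw log lines into dictionaries, skipping lines that do not match."""
    records = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            records.append(
                {
                    "level": m["level"],
                    "call_id": int(m["call_id"]),
                    "message": m["message"],
                }
            )
    return records


def find_call_ids(records, text):
    """Return the call ids of all records whose message contains `text`."""
    return [r["call_id"] for r in records if text in r["message"]]


parsed = parse(sample_lines)
matching_ids = find_call_ids(parsed, "from AA to BB")
```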

brunodecarvalho
http://logging.apache.org/log4j/1.2/manual.html Scroll down to "Performance". Should give you some hints.
brunodecarvalho
+1  A: 

As the comments above already say, logging frameworks exist precisely to free you from worrying about such low-level details. Log4J or its successors like Logback can handle logging by multiple threads safely and effectively. You just tell the logging framework what to log and where, and it all works (usually :-)

For logging thread-specific data, you may consider using a Diagnostic Context. This earlier answer of mine explains this with an example for Log4J. In Logback, it has been renamed to Mapped Diagnostic Context.
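
To show the idea behind a diagnostic context without depending on Log4J or Logback, here is a minimal sketch in Python: each thread stores its own value in thread-local storage, and a filter copies it onto every record. The names `_context`, `ContextFilter`, and `request_id` are invented for the example and are not part of any of those libraries.

```python
import io
import logging
import threading

# Per-thread storage, playing the role of the diagnostic context.
_context = threading.local()


class ContextFilter(logging.Filter):
    """Copy the current thread's context value onto each log record."""

    def filter(self, record):
        record.request_id = getattr(_context, "request_id", "-")
        return True


mdc_stream = io.StringIO()
mdc_handler = logging.StreamHandler(mdc_stream)
mdc_handler.setFormatter(logging.Formatter("%(levelname)s|req=%(request_id)s: %(message)s"))
mdc_handler.addFilter(ContextFilter())

mdc_log = logging.getLogger("mdc-demo")
mdc_log.setLevel(logging.INFO)
mdc_log.addHandler(mdc_handler)
mdc_log.propagate = False


def mdc_worker(request_id):
    _context.request_id = request_id  # set once per thread
    mdc_log.info("processing")        # the id is added without passing it around


mdc_threads = [threading.Thread(target=mdc_worker, args=(rid,)) for rid in ("r1", "r2")]
for t in mdc_threads:
    t.start()
for t in mdc_threads:
    t.join()

mdc_lines = sorted(mdc_stream.getvalue().splitlines())
```

The calling code never mentions the request id when logging; it is set once per thread and picked up automatically.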

As for backups and post-processing, it all depends on your actual goals. Typically simple scripts or single commands like gzip and grep are all you need. It is hard to say more without concrete information.

Péter Török
+5  A: 

A library like log4j can be configured for your needs.

  1. Splitting into too many files will make it difficult to debug some issues, but having one monolithic file leaves a soup of mixed processes. I would have a file for each atomic process; for example, a mail manager might use its own log file. Extra debug information for JDBC might have its own log file, but errors and major events would still be reported in the main application log.

  2. Major logging libraries support log splitting and rotation. For a heavily used web application, I prefer to have a new log file for each day, and also to split once a file exceeds a certain size. You can set up a cron job to zip older logs and, depending on the application, you may want to keep the backups for a few months or indefinitely.

  3. As for usefulness in debugging, you can grep for certain strings such as "Exception" and report on them. If you are looking for statistics, you should make a log for that specific purpose in addition to your process log.
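
The "zip older logs" step from point 2 could be a small script run from cron. This is only a sketch: the function name `compress_rotated_logs`, the `app.log` base name, and the dated file names are assumptions made for the example.

```python
import gzip
import os
import shutil
import tempfile


def compress_rotated_logs(directory, active_name="app.log"):
    """Gzip every rotated log in `directory`, leaving the active file alone."""
    compressed = []
    for name in sorted(os.listdir(directory)):
        # Skip the file still being written, unrelated files, and finished archives.
        if name == active_name or not name.startswith(active_name) or name.endswith(".gz"):
            continue
        src = os.path.join(directory, name)
        with open(src, "rb") as f_in, gzip.open(src + ".gz", "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
        os.remove(src)  # remove the original only after the archive is written
        compressed.append(name + ".gz")
    return compressed


# Demo with made-up file names in a temporary directory.
demo_dir = tempfile.mkdtemp()
for demo_name in ("app.log", "app.log.2010-06-01", "app.log.2010-06-02"):
    with open(os.path.join(demo_dir, demo_name), "w") as f:
        f.write("ERROR Exception in mail manager\n")

archived = compress_rotated_logs(demo_dir)
```

The compressed files can still be searched without unpacking them first, for example with `zgrep Exception app.log.*.gz`.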

Logs can be synchronous or asynchronous, and the latter is usually best for performance. In general, a queue of messages is built and then written by a separate thread. So multiple threads can write to that one queue or buffer in memory, and a single thread locks and writes the file. It's pretty much in the background and you don't have to think about it unless you are writing a huge amount of data.
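
The queue-plus-writer-thread arrangement can be sketched with Python's standard `QueueHandler` and `QueueListener`. This illustrates the general pattern, not how log4j implements it; the logger name, thread names, and message text are made up, and an in-memory stream stands in for the log file.

```python
import io
import logging
import logging.handlers
import queue
import threading

log_queue = queue.Queue()  # in-memory buffer shared by all producer threads
sink = io.StringIO()       # stands in for the log file in this sketch
sink_handler = logging.StreamHandler(sink)
sink_handler.setFormatter(logging.Formatter("%(threadName)s %(message)s"))

# The listener runs one background thread that is the only writer to the sink.
listener = logging.handlers.QueueListener(log_queue, sink_handler)
listener.start()

async_log = logging.getLogger("async-demo")
async_log.setLevel(logging.INFO)
# Producer threads only enqueue records; they never touch the sink directly.
async_log.addHandler(logging.handlers.QueueHandler(log_queue))
async_log.propagate = False


def produce(count):
    for i in range(count):
        async_log.info("value=%d", i)


producers = [
    threading.Thread(target=produce, args=(100,), name=f"t{k}") for k in range(4)
]
for t in producers:
    t.start()
for t in producers:
    t.join()

listener.stop()  # processes everything still queued, then joins the writer thread
written = sink.getvalue().splitlines()
```

Because only the listener thread writes, the producers need no file-level lock of their own; the queue is the single point of synchronization.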

Peter DeWeese
+1 for mentioning asynchronous logs
Tadeusz Kopec