Is there a Grand Unified Theory of logging? Shall we develop one? Question (just to show this is not a discussion :), how can I improve on the following? (Note that I live mainly in the embedded world, but non-embedded suggestions are also welcome.)

How do you log, when do you log, what do you log, what do you do with log files?

How do you log - I generally have macros, #ifdef TESTING, sort of thing. They write to RAM and a low priority process writes them out when the system is idle (using UDP, since I do embedded systems)
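
To make the macro-plus-RAM-buffer idea concrete, here is a minimal sketch in C. Everything named in it (`LOG`, `log_store`, `log_drain`, `send_stdout`, the buffer sizes) is hypothetical and chosen for illustration, not taken from my actual code. It assumes a single producer and no interrupts, so there is no locking, and it drops messages when the buffer is full instead of blocking.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define LOG_SLOTS   8    /* hypothetical: messages held in RAM (power of two) */
#define LOG_MSG_LEN 64   /* hypothetical: maximum bytes per message */

static char     log_ring[LOG_SLOTS][LOG_MSG_LEN];
static unsigned log_head;   /* count of messages stored so far */
static unsigned log_tail;   /* count of messages drained so far */

/* Format one message into RAM. Returns 1 if stored, 0 if dropped (buffer full). */
static int log_store(const char *file, int line, const char *fmt, ...)
{
    if (log_head - log_tail >= LOG_SLOTS)
        return 0;                              /* full: drop rather than block */

    char *slot = log_ring[log_head % LOG_SLOTS];
    int n = snprintf(slot, LOG_MSG_LEN, "%s:%d ", file, line);
    if (n < 0 || n >= LOG_MSG_LEN)
        n = LOG_MSG_LEN - 1;                   /* prefix truncated: no room left */

    va_list ap;
    va_start(ap, fmt);
    vsnprintf(slot + n, (size_t)(LOG_MSG_LEN - n), fmt, ap);
    va_end(ap);

    log_head++;
    return 1;
}

/* Compiled in only for test builds; otherwise the arguments are never evaluated. */
#ifdef TESTING
#define LOG(...) log_store(__FILE__, __LINE__, __VA_ARGS__)
#else
#define LOG(...) ((void)0)
#endif

/* Example sink; a real system might send each message over UDP instead. */
static void send_stdout(const char *msg)
{
    puts(msg);
}

/* Called when the system is idle: pass every pending message to `send`. */
static unsigned log_drain(void (*send)(const char *msg))
{
    assert(send != NULL);
    unsigned sent = 0;
    while (log_tail != log_head) {
        send(log_ring[log_tail % LOG_SLOTS]);
        log_tail++;
        sent++;
    }
    return sent;
}
```

Application code would call something like `LOG("event %d", id);` and the idle task would call `log_drain(send_stdout)`, or a UDP sender in its place.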

When do you log - same as voting, early and often. At every (in)significant program event, I log at varying levels. Events received, transaction succeed/fail, data updated, etc

What do you log - Fatal/Error/Warning/Info/Debug/Trace is covered in http://stackoverflow.com/questions/2031163/when-to-use-log-level-warn-vs-error

What do you do with log files - 1) keep them (in CVS), both pass and fail 2) capture everything and filter later in case I can't repeat a problem. I have tools to filter the log by "level" (Fatal/Error/etc), process, file, etc. And to draw message sequence charts, dump data structures, draw histograms of memory usage - what am I missing?
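
For anyone wondering what "filter by level" looks like, a rough sketch in C follows. It assumes each log line starts with the level name followed by a space, which is an assumption for the example and not a fixed format. The names `parse_level` and `keep_line` are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Ordered from most to least severe, matching the levels named above. */
enum level { LV_FATAL, LV_ERROR, LV_WARNING, LV_INFO, LV_DEBUG, LV_TRACE, LV_UNKNOWN };

static const char *const level_names[] = {
    "Fatal", "Error", "Warning", "Info", "Debug", "Trace"
};

/* Read the level from the start of a line; LV_UNKNOWN if no level name is found. */
static enum level parse_level(const char *line)
{
    assert(line != NULL);
    for (int i = 0; i < 6; i++) {
        size_t n = strlen(level_names[i]);
        /* The name must be a whole word: followed by a space or end of line. */
        if (strncmp(line, level_names[i], n) == 0 &&
            (line[n] == ' ' || line[n] == '\0'))
            return (enum level)i;
    }
    return LV_UNKNOWN;
}

/* Keep a line if its level is at least as severe as `max`. */
static int keep_line(const char *line, enum level max)
{
    enum level lv = parse_level(line);
    return lv != LV_UNKNOWN && lv <= max;
}
```

With `max` set to `LV_WARNING`, a line such as `Error net.c:42 timeout` is kept and `Debug net.c:50 retry` is filtered out.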

Hmmm, binary or ASCII log file format? ASCII is bulkier, but binary requires more processing. I have done both; currently I use ASCII.

Question - did I miss anything, and how can I improve on this?

+2  A: 

did I miss anything, and how can I improve on this?

  • Asynchronous logging.
  • Using multiple log files for the same process for different logging abstractions. e.g. the process' activities are logged in a normal log file. And the process' stats (periodic statistics that you might be interested in) are logged in a separate stats log file.
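
A small sketch of the separate-files idea, in C since the question is about embedded work. The channel names and the functions `channel_attach` and `channel_log` are hypothetical. The caller opens each stream itself (for example with `fopen(path, "a")`), so no file names are assumed here.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* One channel per logging abstraction: normal activity and periodic stats. */
enum channel { CH_ACTIVITY, CH_STATS, CH_COUNT };

static FILE *channel_out[CH_COUNT];   /* NULL until a stream is attached */

/* Attach an already-open stream to a channel. */
static void channel_attach(enum channel ch, FILE *out)
{
    assert(ch < CH_COUNT);
    channel_out[ch] = out;
}

/* Write one line to a channel. Returns characters written, or -1 if unattached. */
static int channel_log(enum channel ch, const char *fmt, ...)
{
    assert(ch < CH_COUNT);
    FILE *out = channel_out[ch];
    if (out == NULL)
        return -1;

    va_list ap;
    va_start(ap, fmt);
    int n = vfprintf(out, fmt, ap);
    va_end(ap);

    if (n >= 0 && fputc('\n', out) != EOF)
        n++;                          /* count the newline as well */
    return n;
}
```

Activity messages then go through `channel_log(CH_ACTIVITY, ...)` and periodic statistics through `channel_log(CH_STATS, ...)`, each landing in its own file.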

Hmmm, binary or ASCII log file format? ASCII is bulkier, but binary requires more processing. I have done both; currently I use ASCII.

ASCII is good. More often than not, logs are meant to be used for debugging purposes. A human-readable form eases and speeds this up. However, if your logs are used mostly to record information which is used later on for analysis and generation of reports (e.g. stats or latencies etc.), a binary format would be preferred. You can go one step further and use a custom format along with a database service which does index-based sorting, where the index can be a tuple of time and event type.
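
To illustrate the size and processing trade-off, here is a sketch of a fixed-size binary record in C with a decoder to text and a comparison on the (time, event type) tuple. The record layout, field names, and function names are all assumptions made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define REC_SIZE 8   /* bytes per record on disk or on the wire */

/* Hypothetical record: timestamp, event type, and one value. */
struct record {
    uint32_t time_ms;
    uint16_t event;
    uint16_t value;
};

/* Serialize byte by byte (little-endian) so the format does not depend on the CPU. */
static void rec_encode(const struct record *r, uint8_t out[REC_SIZE])
{
    out[0] = (uint8_t)(r->time_ms);
    out[1] = (uint8_t)(r->time_ms >> 8);
    out[2] = (uint8_t)(r->time_ms >> 16);
    out[3] = (uint8_t)(r->time_ms >> 24);
    out[4] = (uint8_t)(r->event);
    out[5] = (uint8_t)(r->event >> 8);
    out[6] = (uint8_t)(r->value);
    out[7] = (uint8_t)(r->value >> 8);
}

/* Rebuild a record from its 8 bytes. */
static void rec_decode(const uint8_t in[REC_SIZE], struct record *r)
{
    r->time_ms = (uint32_t)in[0] | (uint32_t)in[1] << 8 |
                 (uint32_t)in[2] << 16 | (uint32_t)in[3] << 24;
    r->event = (uint16_t)(in[4] | in[5] << 8);
    r->value = (uint16_t)(in[6] | in[7] << 8);
}

/* Human-readable form; returns the text length (as snprintf does). */
static int rec_format(const struct record *r, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%lu ms event=%u value=%u",
                    (unsigned long)r->time_ms, (unsigned)r->event,
                    (unsigned)r->value);
}

/* Order records by the tuple (time, event type), usable as a sort or index key. */
static int rec_compare(const struct record *a, const struct record *b)
{
    if (a->time_ms != b->time_ms)
        return a->time_ms < b->time_ms ? -1 : 1;
    if (a->event != b->event)
        return a->event < b->event ? -1 : 1;
    return 0;
}
```

The same record that takes 8 bytes in binary takes a few dozen characters as text, which is the bulk the question mentions. The price is that nobody can read the binary form without running something like `rec_decode` and `rec_format` first.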

--

baskin
Baskin, what do you mean by "Asynchronous logging"? Different files might work, but I tend to just gather everything into one big messy heap and post-process it, filtering as desired. Log as much as you can; you never know what you might need, and you may not be able to repeat a logging session. That's my motto.
Mawg
Asynchronous logging: for example, in Java I perform the actual I/O involved in logging in a background thread and not the application thread.
baskin
Different files help me debug my application more quickly. I can separate out the content which is helpful in debugging from that which is not.
baskin
+1  A: 

You could "instrument" your code in many different ways, everything from start-up/shut-down events to individual machine instruction execution (using a processor emulator). Of all the possibilities, what's worth doing? Don't just do it for the sake of completeness; have a specific goal in mind. A business case if you like, with a benefit you expect to receive. E.g.:

  • Insight into CPU task execution times/patterns to enable optimisation (if you need to improve performance).
  • Insight into other systems to resolve system integration issues (e.g. what messages is your VoIP box sending and receiving when it connects to a particular peer?)
  • Insight into the nature of errors (for field diagnostics)
  • Aid in development
  • Aid in validation testing

I imagine that there's no grand unified theory of logging, because what you do would depend on many details:

  • Quantity of data
  • Type of data
    • Events
    • Streamed audio/video
  • Available storage
    • Storage speed
    • Storage capacity
  • Available channels to extract data
    • Bandwidth
    • Cost
    • Availability
      • Internet connected 24×7
      • Site visit required
      • Need to unlock a rusty gate, climb a ladder onto a roof, to plug in a cable, after filling out OHS documentation
      • Need to wait until the Antarctic winter is over and the ice sheets thaw
  • Random access vs linear access (e.g. if you compress it, do you need to read from the start to decompress and access some random point?)
  • Need to survive error conditions
    • Watchdog reboots
    • Possible data corruption
      • Due to failing power supply
      • Due to unreliable storage media
      • Need to survive a plane crash

As for ASCII vs binary, I usually prefer to keep the logging simple, and put any nice presentation in a PC application that decodes the data. It's usually easier to create a user-friendly presentation in PC software (written in e.g. Python) rather than in the embedded system itself.

Craig McQueen
Craig, "Don't just do it for the sake of completeness" - semi (dis)-agree. I like to dump everything I can think of as I never know when I might need it, and may not be able to reproduce. But I do take your point about considering what might be useful. No grand unified theory - agreed, but there are some universal approaches, common to all, or most. Post-process on the PC, I agree 100%.
Mawg
+1  A: 

Here's my $0.02.

I only log when I'm having a problem and need to track down the source. Usually this has to do with a customer's environment, so I can't just attach the debugger. My solution is to enable the Telnet port and use that to print out statements as to where the program is and values of variables.

I do ASCII only because it's over telnet.

Another aspect of telnet is that it is pretty simple. It's a TCP port with text being thrown out. Very little processing other than the normal TCP headaches.

The log files are dumped as soon as I get them because I have not tried to capture and save a telnet session. I guess I could with Wireshark, but I don't need a history of that session. I just need to find the problem and verify a fix.

Robert
Robert, Telnet is good - if the customer is happy with you poking around in "his" program. You say "I only log when I'm having a problem" and then talk of logging in the field. Don't you routinely log unit and integration tests? That's actually more what I was thinking of. But I do like Telnet. The alternative, in an embedded system, is to have a cyclically over-writable log file (which can lose some info), which can be dumped when the user brings the device to the service centre.
Mawg
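
The cyclically over-writable log mentioned in the last comment could be sketched in C roughly as below. This version lives in RAM; a real device would presumably keep it in flash or a file. The sizes and the names `cyc_put`, `cyc_get`, `cyc_lost` and `cyc_dump` are invented for the example.

```c
#include <assert.h>
#include <stdio.h>

#define CYC_SLOTS 4    /* hypothetical: entries kept before overwriting */
#define CYC_LEN   32   /* hypothetical: maximum bytes per entry */

struct cyc_log {
    char          slot[CYC_SLOTS][CYC_LEN];
    unsigned long written;   /* total entries ever written */
};

/* Store a message, overwriting the oldest entry once the log is full. */
static void cyc_put(struct cyc_log *lg, const char *msg)
{
    assert(lg != NULL && msg != NULL);
    snprintf(lg->slot[lg->written % CYC_SLOTS], CYC_LEN, "%s", msg);
    lg->written++;
}

/* Return the i-th oldest surviving entry, or NULL if i is out of range. */
static const char *cyc_get(const struct cyc_log *lg, unsigned long i)
{
    unsigned long kept = lg->written < CYC_SLOTS ? lg->written : CYC_SLOTS;
    if (i >= kept)
        return NULL;
    unsigned long first = lg->written - kept;   /* index of oldest survivor */
    return lg->slot[(first + i) % CYC_SLOTS];
}

/* How many entries have been overwritten and are gone for good. */
static unsigned long cyc_lost(const struct cyc_log *lg)
{
    return lg->written > CYC_SLOTS ? lg->written - CYC_SLOTS : 0;
}

/* Dump surviving entries, oldest first (e.g. at the service centre). */
static unsigned long cyc_dump(const struct cyc_log *lg, FILE *out)
{
    unsigned long i = 0;
    const char *entry;
    while ((entry = cyc_get(lg, i)) != NULL) {
        fprintf(out, "%s\n", entry);
        i++;
    }
    return i;
}
```

Reporting `cyc_lost` alongside the dump at least tells whoever reads it how much history was overwritten before the device came in.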