When using a log facility, what are the common "rules of thumb"? E.g.

  • Rate limit message X to Y messages per unit of time Z?
  • Wait for a recent success message of type T before logging a "new" failure message of the same type?
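A minimal sketch of the first rule using Python's standard `logging` module, assuming a simple fixed time window. The hypothetical `RateLimitFilter` lets at most `max_messages` records with the same logger name and message template through per `period` seconds and drops the rest. The `clock` parameter exists only so the window can be tested.

```python
import logging
import time


class RateLimitFilter(logging.Filter):
    """Hypothetical helper: at most max_messages records per period seconds.

    Uses a fixed window per (logger name, message template), so each
    message "type" X gets its own budget of Y messages per Z seconds.
    """

    def __init__(self, max_messages, period, clock=time.monotonic):
        super().__init__()
        self.max_messages = max_messages
        self.period = period
        self.clock = clock
        self._windows = {}  # key -> (window_start, count)

    def filter(self, record):
        key = (record.name, str(record.msg))  # template, not formatted text
        now = self.clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.period:
            start, count = now, 0  # window expired: start a new one
        if count >= self.max_messages:
            return False  # over budget: drop the record
        self._windows[key] = (start, count + 1)
        return True


if __name__ == "__main__":
    demo = logging.getLogger("demo")
    demo.addFilter(RateLimitFilter(max_messages=3, period=60.0))
    for _ in range(10):
        demo.warning("disk %s is almost full", "/dev/sda1")  # 3 of 10 get through
```

A fixed window is the simplest choice; a token bucket would smooth out bursts at window boundaries. Note that a filter attached to a logger does not see records propagated up from child loggers; attach it to a handler instead if it should cover everything that handler writes.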
A:

If you have to discard messages, discard the unimportant ones.

If you're displaying an important message, don't bury it in a flood of unimportant ones.

Make it very cheap to not display a message when that level of messaging is disabled/not needed.
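In Python's `logging`, for example, a call below the logger's level returns before any record is built, and `%`-style arguments are only formatted if a handler actually emits the record. The argument expressions themselves are still evaluated, so genuinely costly work is worth guarding with `isEnabledFor`. A sketch, where `expensive_summary` and `process` are hypothetical stand-ins:

```python
import logging

logger = logging.getLogger("app.report")
logger.setLevel(logging.INFO)  # DEBUG is disabled for this logger


def expensive_summary(rows):
    # Hypothetical stand-in for something costly to compute.
    return ", ".join(sorted(str(r) for r in rows))


def process(rows, summarize=expensive_summary):
    # Cheap when disabled: no record is created and "%d" is never formatted.
    logger.debug("processing %d rows", len(rows))

    # Arguments are evaluated eagerly, so guard work that is expensive.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("rows: %s", summarize(rows))
    return len(rows)
```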

Make it possible to discover the current state of the system without having to read every old message.

Manage the size of your log files (e.g. several files instead of one file of infinite size), and beware of filling the disk.
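One way to bound disk usage, sketched with Python's standard `logging.handlers.RotatingFileHandler`: once the current file would exceed `maxBytes`, it is renamed to `.1` (older files shift to `.2`, and so on) and files beyond `backupCount` are deleted. The `make_rotating_logger` helper and its default sizes are illustrative assumptions.

```python
import logging
import logging.handlers


def make_rotating_logger(name, path, max_bytes=1_000_000, backups=4):
    """Hypothetical helper: at most backups + 1 files of about max_bytes each.

    With the defaults that is app.log plus app.log.1 .. app.log.4, so the
    total stays around 5 MB instead of growing without limit.
    """
    logger = logging.getLogger(name)
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups, encoding="utf-8"
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Usage would look like `make_rotating_logger("myapp", "/var/log/myapp/app.log")`, where the path is only a placeholder.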

Consider using a standard output format/medium (for example SNMP, or the NT event log), which you can view and manage using fully-featured 3rd-party tools.

ChrisW
A:
  • Print as much context on failure as you can, including the fullest error message possible. Include the exact location in the program or in the workflow (e.g. "error processing line 10029 of input file" vs. "error processing input file").

  • When a DB query fails, consider printing the query text, nicely formatted (e.g. Sybase errors usually contain only a mangled, partial query).

  • Use a log facility that has nice formatting, including the ability to tag each message with its level (INFO/WARN/ERROR), for easy grepping.

  • Use a log facility with decent timestamp support.

  • As you noted, consider volume. Throttle or bundle messages.
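A small sketch of several of these points with Python's standard `logging`: every line carries a timestamp and a level tag (so `grep ERROR` works), and the failure message names the exact line and file being processed. `log.exception` logs at ERROR level and attaches the traceback. The `import_lines` function and its one-integer-per-line input are hypothetical.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    # Timestamp, level tag and logger name on every line.
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
importer_log = logging.getLogger("importer")


def import_lines(path, lines):
    """Hypothetical importer: sums one integer per input line."""
    total = 0
    for lineno, text in enumerate(lines, start=1):
        try:
            total += int(text)
        except ValueError:
            # Say exactly where in the workflow it failed; exception()
            # logs at ERROR level and includes the traceback.
            importer_log.exception(
                "error processing line %d of input file %s: %r", lineno, path, text
            )
            raise
    return total
```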

DVK
This is fine for developer-visible logs; customer-visible logs should be more demure. In particular, filenames and line numbers should not be customer-visible. At least, not in our (embedded, security-sensitive) environment.
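A sketch of keeping the two audiences apart with Python's standard `logging`, assuming both logs are streams you control: attach two handlers to one logger, and let only the developer handler's format include `%(filename)s:%(lineno)d`. The `io.StringIO` streams stand in for real destinations, and the customer handler's WARNING threshold is an illustrative choice.

```python
import io
import logging

dev_stream = io.StringIO()       # stand-in for a developer-only log file
customer_stream = io.StringIO()  # stand-in for the customer-visible log

dev = logging.StreamHandler(dev_stream)
dev.setFormatter(logging.Formatter(
    "%(asctime)s %(levelname)s %(filename)s:%(lineno)d %(message)s"))

customer = logging.StreamHandler(customer_stream)
customer.setLevel(logging.WARNING)
# No file names, line numbers or module paths in the customer format.
customer.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

device_log = logging.getLogger("device")
device_log.setLevel(logging.DEBUG)
device_log.addHandler(dev)
device_log.addHandler(customer)
device_log.propagate = False  # keep records away from any root handlers
```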
A:

We rate-limit duplicate messages. We use a syslog-like category & priority hierarchy and, by default, only log messages that indicate warnings and above.

If things go south, we can crank up the logging for that component until we've resolved it.
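Python's `logging` offers a comparable hierarchy: dot-separated logger names act as categories, and a logger with no level of its own inherits its parent's. A sketch under that assumption, with hypothetical component names `myapp.net` and `myapp.db`: the root allows WARNING and above, and one component can be turned up without touching the others.

```python
import logging

# By default, only warnings and above.
logging.getLogger().setLevel(logging.WARNING)

# One logger per component; names form a category hierarchy.
net_log = logging.getLogger("myapp.net")
db_log = logging.getLogger("myapp.db")


def crank_up(component, level=logging.DEBUG):
    """Raise verbosity for one component (and its children) only."""
    logging.getLogger(component).setLevel(level)


def restore(component):
    # NOTSET means "inherit the level from the parent logger" again.
    logging.getLogger(component).setLevel(logging.NOTSET)
```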

A:

I agree with Jonathon: more context would be helpful. Some things to think about:

  1. Will you allow the event to happen if logging the event fails? If yes, you have many more options. If no, then you need to make the log write part of the same transaction that persists the event.
  2. Will the logs be cleaned up, or will they persist for the life of the system? If they are cleaned up, once again you have many options. If they persist, then you'll want to put them in a database.
  3. How much data will be in the logs? Consider indexing and/or partitioning the table. Also think about how you're going to access the logs. Log events on a parent object instead of on each child. For example, take an Accounting Journal with many subjournals that list transactions: instead of logging against the transaction ID that a transaction has been approved, log against the Journal that the transaction was approved.
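Points 1 and 3 can be sketched together with Python's built-in `sqlite3`, assuming a hypothetical schema: the status change and its log row are written in one transaction (`with conn:` commits on success and rolls back on an exception), and the log is keyed and indexed on the parent journal rather than on each transaction row. The table, column and function names are illustrative only.

```python
import sqlite3

# Hypothetical schema: transactions belong to a parent journal, and the
# event log is keyed (and indexed) on the journal, not on each child row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE journal_txn (
        id INTEGER PRIMARY KEY,
        journal_id INTEGER NOT NULL,
        status TEXT NOT NULL
    );
    CREATE TABLE event_log (
        id INTEGER PRIMARY KEY,
        journal_id INTEGER NOT NULL,
        message TEXT NOT NULL,
        logged_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    CREATE INDEX event_log_by_journal ON event_log (journal_id);
    INSERT INTO journal_txn (id, journal_id, status) VALUES (1, 7, 'pending');
""")
conn.commit()


def approve(conn, txn_id):
    """Approve a transaction and log it atomically: both rows or neither."""
    with conn:  # commit on success, roll back if anything below raises
        row = conn.execute(
            "SELECT journal_id FROM journal_txn WHERE id = ?", (txn_id,)
        ).fetchone()
        if row is None:
            raise KeyError(txn_id)
        conn.execute("UPDATE journal_txn SET status = 'approved' WHERE id = ?", (txn_id,))
        # If this insert fails, the UPDATE above is rolled back too.
        conn.execute(
            "INSERT INTO event_log (journal_id, message) VALUES (?, ?)",
            (row[0], f"txn {txn_id} approved"),
        )
```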

These are just a few questions to think about.

Jay
A: 
  • Don't log passwords!
  • If possible, avoid repeating the same message when a fault is logged many times before the system recovers from it. Just log something like "error of type x, occurred n times from timestamp1 to timestamp2".
  • Don't keep your logs forever; implement rotation policies (based on file sizes or time periods).
  • Use different log levels for different situations, and use them consistently.
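The repeat-summary idea from the second point, sketched in Python on top of the standard `logging` module. The hypothetical `RepeatSummarizer` logs the first occurrence of each error type in full, only counts later repeats, and writes the "occurred n times from timestamp1 to timestamp2" line when the caller reports recovery. Formatting timestamps in UTC is an assumption made to keep output host-independent.

```python
import logging
import time


class RepeatSummarizer:
    """Hypothetical helper: collapse repeats of one error type into a summary.

    The first occurrence is logged at once; repeats are counted until
    recovered() is called, which logs how often and over what span.
    """

    def __init__(self, logger, clock=time.time):
        self.logger = logger
        self.clock = clock
        self._open = {}  # error type -> [count, first_ts, last_ts]

    @staticmethod
    def _fmt(ts):
        # UTC keeps the output independent of the host's time zone.
        return time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(ts))

    def error(self, kind, msg):
        now = self.clock()
        entry = self._open.get(kind)
        if entry is None:
            self._open[kind] = [1, now, now]
            self.logger.error("%s: %s", kind, msg)  # first one: log in full
        else:
            entry[0] += 1  # repeat: just count it
            entry[2] = now

    def recovered(self, kind):
        count, first, last = self._open.pop(kind, (0, None, None))
        if count > 1:
            self.logger.warning(
                "error of type %s occurred %d times from %s to %s",
                kind, count, self._fmt(first), self._fmt(last))
        if count:
            self.logger.info("%s: recovered", kind)
```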
JuanZe