views:

420

answers:

11

Given you have these levels with increasing severity at hand from your favorite logging tool:

TRACE < DEBUG < INFO < WARN < ERROR < FATAL

How do you decide when to use WARN vs ERROR? I can never decide which is appropriate. Do you have a good heuristic to base that decision on?

+2  A: 

Warnings you can recover from. Errors you can't. That's my heuristic, others may have other ideas.

For example, let's say you enter the last name of a European person that contains an umlaut. Your code may be English only (though it probably shouldn't be in this day and age) and could warn that all "funny" characters had been converted to regular English characters.

Contrast that with trying to write that information to the database and getting back a network down message for 60 seconds straight. That's more of an error than a warning.
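A minimal sketch of that contrast using Python's standard `logging` module. The logger name, the `to_ascii` and `save` helpers, and the `write` callable (assumed to raise `ConnectionError` when the network is down) are all hypothetical, made up for illustration:

```python
import logging
import unicodedata

log = logging.getLogger("names")  # hypothetical logger name


def to_ascii(name: str) -> str:
    """Strip diacritics; we recovered, so a changed name is only a WARNING."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_name = decomposed.encode("ascii", "ignore").decode("ascii")
    if ascii_name != name:
        log.warning("converted %r to %r", name, ascii_name)
    return ascii_name


def save(name: str, write) -> bool:
    """`write` is a hypothetical callable that raises ConnectionError on failure."""
    try:
        write(name)
        return True
    except ConnectionError:
        # Nothing was stored and we cannot fix that ourselves: ERROR.
        log.error("could not write %r: network down", name)
        return False
```

The conversion still produces a usable value, so it warns; the failed write leaves the operation unfinished, so it logs an error.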

paxdiablo
+1 for stating the simple heuristic that I've worked with for ages! (-:
Rob Wells
+4  A: 

If you can recover from the problem then it's a warning. If it prevents continuing execution then it's an error.

Ignacio Vazquez-Abrams
But then, what is the difference between error and fatal error?
An error is something that you do (e.g. read a non-existent file), a fatal error is something that is done to you (e.g. run out of memory).
Ignacio Vazquez-Abrams
A: 

I've built systems before that use the following:

  1. ERROR - means something is seriously wrong and that particular thread/process/sequence can't carry on. Some user/admin intervention is required.
  2. WARNING - something is not right, but the process can carry on as before (e.g. one job in a set of 100 has failed, but the remainder can be processed)

In the systems I've built admins were under instruction to react to ERRORs. On the other hand we would watch for WARNINGS and determine for each case whether any system changes, reconfigurations etc. were required.
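As a rough sketch of the "one job in a set of 100" case in Python's standard `logging` module: a single failed job is a WARNING and the rest are still processed. The `run_batch` helper is hypothetical, and escalating to ERROR when every job fails is an assumption added here for contrast, not something stated above:

```python
import logging

log = logging.getLogger("batch")  # hypothetical logger name


def run_batch(jobs):
    """Run every job in the list; return how many failed."""
    failed = 0
    for i, job in enumerate(jobs):
        try:
            job()
        except Exception as exc:
            # One job failed, but the remainder can still be processed.
            failed += 1
            log.warning("job %d failed: %s", i, exc)
    if jobs and failed == len(jobs):
        # Assumption: if nothing succeeded, the sequence could not carry on.
        log.error("all %d jobs failed; intervention required", failed)
    return failed
```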

Brian Agnew
+12  A: 

Would you want the message to get a sysadmin out of bed in the middle of the night?

  • yes -> error
  • no -> warn
pm100
Best description of the situation ever.
Crast
Except most people don't care if they get people out of bed at night. We've had customers raise severity-1 dockets (meant for 100% outage, i.e., national) because one site couldn't do their work (their reasoning was that it's 100% of that site). We've since "educated" them on that score.
paxdiablo
+2  A: 

An error is something that is wrong, plain wrong, no way around it, it needs to be fixed.

A warning is a sign of a pattern that might be wrong, but then also might not be.

Having said that, I cannot come up with a good example of a warning that isn't also an error. What I mean by that is that if you go to the trouble of logging a warning, you might as well fix the underlying issue.

However, things like "sql execution takes too long" might be a warning, while "sql execution deadlocks" is an error, so perhaps there are some cases after all.

Lasse V. Karlsen
A good example of a warning is that in MySQL, by default, if you try to insert more characters in a `varchar` than it is defined for, it warns you that the value was truncated, but still inserts it. But one person's warning may be another's error: in my case, this is an error; it means I made an error in my validation code by defining a length incongruous with the database. And I wouldn't be terribly surprised if another DB engine considered this an error, and I'd have no real right to be indignant; after all, it is erroneous.
Crast
I too would consider that an error. In some cases, the content is "text" (not in the datatype meaning), which means that *perhaps* it is OK to truncate it. In another case it's a code, where chopping bits off it will corrupt it or change its meaning, which is not OK. In my opinion, it's not up to the software to try to guess what I meant. If I try to force a 200 character string into a column that only takes 150 characters, that's a problem I'd like to know about. I do, however, like the distinction made by others here, that if you can recover, it's a warning, but then... do you need to log?
Lasse V. Karlsen
+12  A: 

I generally subscribe to the following convention:

  • Trace - Only when I would be "tracing" the code and trying to find one part of a function specifically
  • Debug - Information that is diagnostically helpful to people more than just developers (IT, sysadmins, etc)
  • Info - Generally useful information to log (service start/stop, configuration assumptions, etc). Info I want to always have available but usually don't care about under normal circumstances. This is my out-of-the-box config level
  • Warn - Anything that can potentially cause application oddities, but for which I am automatically recovering (such as switching from a primary to backup server, retrying an operation, missing secondary data, etc)
  • Error - Any error which is fatal to the operation but not the service or application (can't open a required file, missing data, etc). These errors will force user (administrator, or direct user) intervention. These are usually reserved (in my apps) for incorrect connection strings, missing services, etc.
  • Fatal - Any error that is forcing a shutdown of the service or application to prevent data loss (or further data loss). I reserve these only for the most heinous errors and situations where there is guaranteed to have been data corruption or loss.
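The Warn/Error boundary in this convention could be sketched as follows with Python's standard `logging` module. The `fetch` helper and its `primary`/`backup` callables are hypothetical; the only assumption about them is that they raise `OSError` when a server is unreachable:

```python
import logging

log = logging.getLogger("fetch")  # hypothetical logger name


def fetch(primary, backup):
    """Try the primary source, then the backup; return None if both fail."""
    try:
        return primary()
    except OSError as exc:
        # We recover automatically by switching servers: Warn.
        log.warning("primary failed (%s); switching to backup", exc)
    try:
        return backup()
    except OSError as exc:
        # Fatal to this operation, but the application keeps running: Error.
        log.error("backup failed too (%s); operation aborted", exc)
        return None
```

A Fatal-level message would be reserved for the caller that decides the whole service has to stop; Python's `logging` exposes that level as `CRITICAL` (with `FATAL` as an alias).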
GrayWizardx
A: 

I've always considered warning the first log level that for sure means there is a problem (for example, perhaps a config file isn't where it should be and we're going to have to run with default settings). An error implies, to me, something that means the main goal of the software is now impossible and we're going to try to shut down cleanly.
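The config-file case might look like this in Python; a sketch only, where the `DEFAULTS` values, the JSON format, and the `load_config` name are assumptions chosen for illustration:

```python
import json
import logging
from pathlib import Path

log = logging.getLogger("config")  # hypothetical logger name
DEFAULTS = {"port": 8080}  # hypothetical default settings


def load_config(path):
    """Load JSON settings; fall back to defaults with a WARNING if missing."""
    try:
        return json.loads(Path(path).read_text())
    except FileNotFoundError:
        # There is a problem, but the software can still do its job.
        log.warning("config %s not found; running with defaults", path)
        return dict(DEFAULTS)
```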

dicroce
A: 

As others have said, errors are problems; warnings are potential problems.

In development, I frequently use warnings where I might put the equivalent of an assertion failure but the application can continue working; this enables me to find out if that case ever actually happens, or if it's my imagination.
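One way to sketch that "assertion as a warning" idea in Python (the `soft_assert` helper is hypothetical, not a standard library function):

```python
import logging

log = logging.getLogger("invariants")  # hypothetical logger name


def soft_assert(condition, message):
    """Log a WARNING instead of raising when an expected condition fails."""
    if not condition:
        # stack_info=True records where the check was made, like a traceback.
        log.warning("assumption violated: %s", message, stack_info=True)
    return bool(condition)
```

Calling `soft_assert(items, "expected at least one item")` lets the application carry on while the log shows whether the case ever actually happens.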

But yes, it gets down to the recoverability and actuality aspects. If you can recover, it's probably a warning; if it causes something to actually fail, it's an error.

Michael E
A: 

G'day,

As a corollary to this question, communicate your interpretations of the log levels and make sure that all people on a project are aligned in their interpretation of the levels.

It's painful to see a vast variety of log messages where the severities and the selected log levels are inconsistent.

Provide examples if possible of the different logging levels. And be consistent in the info to be logged in a message.

HTH

Rob Wells
A: 

I totally agree with the others, and think that GrayWizardx said it best.

All that I can add is that these levels generally correspond to their dictionary definitions, so it can't be that hard. If in doubt, treat it like a puzzle. For your particular project, think of everything that you might want to log.

Now, can you figure out what might be fatal? You know what fatal means, don't you? So, which items on your list are fatal?

Ok, that's fatal dealt with, now let's look at errors ... rinse and repeat.

Below Fatal, or maybe Error, I would suggest that more information is always better than less, so err "upwards". Not sure if it's Info or Warning? Then make it a warning.

I do think that Fatal and error ought to be clear to all of us. The others might be fuzzier, but it is arguably less vital to get them right.

Here are some examples:

  • Fatal - can't allocate memory, database, etc - can't continue
  • Error - no reply to message, transaction aborted, can't save file, etc
  • Warning - resource allocation reaches X% (say 80%) - that is a sign that you might want to re-dimension your
  • Info - user logged in/out, new transaction, file created, new d/b field, or field deleted
  • Debug - dump of internal data structure, Anything Trace level with file name & line number
  • Trace - action succeeded/failed, d/b updated
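The Warning example (resource allocation reaching, say, 80%) could be sketched like this in Python. The `check_usage` name and the `warn_at` parameter are assumptions for illustration:

```python
import logging

log = logging.getLogger("resources")  # hypothetical logger name


def check_usage(used, capacity, warn_at=0.80):
    """Return the usage ratio, logging a WARNING once it reaches `warn_at`."""
    ratio = used / capacity
    if ratio >= warn_at:
        # Still working, but a sign that capacity may need revisiting.
        log.warning("resource usage at %.0f%% of capacity", ratio * 100)
    return ratio
```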

Mawg
A: 

Btw, I am a great fan of capturing everything and filtering the information later.

What would happen if you were capturing at Warning level and want some Debug info related to the warning, but were unable to recreate the warning?

Capture everything and filter later!
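With Python's standard `logging` module, one way to sketch this is to let the logger accept everything and put the filtering on the handlers: one handler keeps every record, another shows only WARNING and above. The `build_logger` helper and the logger name are hypothetical, and in practice the "full" stream would more likely be a log file:

```python
import io
import logging


def build_logger(full_stream, filtered_stream):
    """Capture every record in one stream, only WARNING+ in the other."""
    logger = logging.getLogger("capture_all")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)  # the logger itself drops nothing
    logger.propagate = False
    logger.handlers.clear()

    full = logging.StreamHandler(full_stream)
    full.setLevel(logging.DEBUG)  # everything is kept here

    filtered = logging.StreamHandler(filtered_stream)
    filtered.setLevel(logging.WARNING)  # the filtered view

    fmt = logging.Formatter("%(levelname)s %(message)s")
    for handler in (full, filtered):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger


full_log, console = io.StringIO(), io.StringIO()
logger = build_logger(full_log, console)
logger.debug("queue state: %s", [1, 2, 3])
logger.warning("queue nearly full")
```

After this runs, `full_log` holds both messages while `console` holds only the warning, so the Debug context around the warning is still available afterwards.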

This holds true even for embedded software unless you find that your processor can't keep up, in which case you might want to re-design your tracing to make it more efficient, or the tracing is interfering with timing (you might consider debugging on a more powerful processor, but that opens up a whole nother can of worms).

Capture everything and filter later!!

(btw, capture everything is also good because it lets you develop tools to do more than just show debug trace (I draw Message Sequence Charts from mine, and histograms of memory usage). It also gives you a basis for comparison if something goes wrong in future (keep all logs, whether pass or fail, and be sure to include build number in the log file)).

Mawg