views:

4163

answers:

11

I want to have a diagnostic log that is produced by several tasks managing data. These tasks may be in multiple threads. Each task needs to write an element (possibly with subelements) to the log; get in and get out quickly. If this were a single-task situation I'd use XMLStreamWriter as it seems like the best match for simplicity/functionality without having to hold a ballooning XML document in memory.

But it's not a single-task situation, and I'm not sure how to best make sure this is "threadsafe", where "threadsafe" in this application means that each log element should be written to the log correctly and serially (one after the other and not interleaved in any way).

Any suggestions? I have a vague intuition that the way to go is to use a queue of log elements (with each one able to be produced quickly: my application is busy doing real work that's performance-sensitive), and have a separate thread which handles the log elements and sends them to a file so the logging doesn't interrupt the producers.

The logging doesn't necessarily have to be XML, but I do want it to be structured and machine-readable.

edit: I put "threadsafe" in quotes. Log4j seems to be the obvious choice (new to me but old to the community), why reinvent the wheel...

+14  A: 

Use a logging framework, such as Log4j.

Avi
thanks! reading the docs (http://logging.apache.org/log4j/1.2/manual.html) now...
Jason S
ensure to use the '%t' expression in the log4j ConversionPattern to log the thread name
chburd
jbu
Log4J is definitely threadsafe in the sense that each log message will be written atomically even when written from different threads. I'm not sure why you would need any other sense of threadsafeness, but you could externally synchronize, as you say.
Avi
+2  A: 

You could use synchronization mechanisms (like a monitor or a semaphor) to make sure, that one log request is processed before accepting the next. This could all be hidden from the code calling the logging routines.

Smasher
I would love to see downvoters comment on their reasons. Of course using a logging framework is perhaps the best choice, but if you want to do it on your own, you have to use synchronization...
Smasher
I didn't downvote you, but I can guess at the reasoning of any that have. Someone who is "not sure how to best make sure this is threadsafe" shouldn't be told to "use a monitor or a semaphor". He should simply use a logging framework.
Adam Jaskiewicz
Besides, the best way to do this in Java if you *were* doing it yourself would probably be to have the various threads doing an offer or put on a BlockingQueue, and a logger thread that does a take in a loop and writes them out to the log file as fast as it can. No need to mess with 'synchronized'.
Adam Jaskiewicz
I didn't downvote either; give Smasher a break: he gave an honest attempt at being helpful.
Jason S
+4  A: 

Use a logging framework, such as Log4.

and if you are not happy with the output you can write your own Appender, Filter, whatever to tweak it just write. So you could do even some caching to rearrange the entries, although I am not saying this is a good idea.

Jens Schauder
+3  A: 

log4j is and has been the standard for java logging for many years. But if you don't fancy an external dependency then the java.util.logging package provides a perfectly good solution.

Gareth Davis
Sure JUL is an alternative, but I'm not sure I'd call it perfectly good. It's best avoided in my opinion, and yes I learnt that the hard way, from being forced to use it.
David Roussel
+3  A: 

Use a logging framework that implements some form of the NDC pattern, like Log4J.

eljenso
+1 for NDC - and don't forget the MDC also, which personally I find more useful than NDC. In a webapp, I like to add the SessionID to the MDC so each log line can have the user's session ID.
matt b
A: 

Developing this yourself in a thread-safe way is not trivial, so you should really use an existing logging framework that is thread-safe. The most commonly used one is Log4J, which is thread-safe (see the FAQ).

Christian Berg
+5  A: 

I think you are on the wrong path. You say “threadsafe” but you actually mean “serialized”. Threadsafe means that one thread will not interfere with data from other thread. Most of the time, threading issues are resolved beforehand and you should not worry about it just for logging sake. For example, if your write:

myVariableSum = 0 + myVariable;
//here comes other thread - Not very likely!
logger.info("Log some INFO; myVariable has value" + myVariable.toString());


You have to make sure that myVariable has not been changed by some other thread from the moment calculation (first line) was performed but before logging method was called. If this happens, you will log dirty value that was not used to perform the operation but value that was assigned by some other thread. This is generally taken care of; for example local (method level) variable can not be changed by other thread. Anyway, if you have to worry about this when logging, than 99% that your program has serious threading issues already.
All major logging frameworks are by themselves “threadsafe” meaning they can be deployed in multithreaded environments and will not display problems similar to one described above internally.
Getting traces to appear in log in order they happen is actually usually called “serialization” of calls. Serializing log writes will be a major performance bottleneck on any multithreaded app. If you use logging framework, like log4j, traces from all threads will appear in single place more or less in order they happen. However, one column is generally Thread name, so you can easily filter your log data by thread; each thread will log its data in chronological order. Check out this link: http://logging.apache.org/log4j/1.2/faq.html#1.7
Finally, if serializing log writes is what you really need, then you could use some kind of structure, like java.util.concurrent.BlockingQueue to route your messages.

Dan
As you say, I meant serialized (was clear on the concept but not the vocabulary). Any threading issues are going to be independent of logging -- if I want to make sure I'm logging the right values I'll make the appropriate copies of volatile data at the right time.
Jason S
Nice job on your explanation of thread safety vs. serialization.
Julie
+2  A: 

I tend to use SLF4J on top of Log4J. The parameterized logging functionality is especially attractive if you are going to have a lot of logging statements that may well get switched off in a production environment.

It can also run over the top of java.util.logging or use it's own simple output.

Evan
+1  A: 

Use logback-classic. It is a newer and better implementation of log4j.

Ceki
A: 

If you had to, you could roll your own .. using single-writer/single-reader FIFO or queues.

JustJeff
A: 

You should try multidimensional analytics. http://www.amazon.com/Java-logging-Reengineered-Edgar-changed/dp/145156399X/ref=pd_rhf_p_t_1

Dimitri
Wow, did you sign up and post here just to promote your own book?
David Roussel