Suppose you have a file that you are programmatically logging information into regarding a process. It's kind of like your typical debug Console.WriteLine, but due to the nature of the code you're testing, you don't have a console to write to, so you have to write it somewhere like a file. My current program uses System.IO.StreamWriter for this task.

My question is about the approach to using the StreamWriter. Is it better to open just one StreamWriter instance, do all of the writes, and close it when the entire process is done? Or is it better to open a new StreamWriter instance to write a line into the file, close it immediately, and repeat this every time something needs to be written? In the latter approach, this would probably be facilitated by a method that does just that for a given message, rather than bloating the main process code with an excessive number of lines. But having a method to aid the implementation doesn't necessarily make it the better choice. Are there significant advantages to picking one approach or the other? Or are they functionally equivalent, leaving the choice on the shoulders of the programmer?
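To make the comparison concrete, here's roughly what I mean by the two approaches (the file name and messages are just placeholders):

    using System.IO;

    class LoggingApproaches
    {
        const string LogPath = "process.log"; // placeholder name

        // Approach 1: one writer held open for the whole run.
        static void RunWholeProcess()
        {
            using (var writer = new StreamWriter(LogPath, true)) // true = append
            {
                writer.WriteLine("step 1 done");
                // ... the rest of the process writes to this same instance ...
                writer.WriteLine("step 2 done");
            } // closed once, when the entire process is finished
        }

        // Approach 2: a helper method that opens and closes a writer per message.
        static void Log(string message)
        {
            using (var writer = new StreamWriter(LogPath, true)) // true = append
            {
                writer.WriteLine(message);
            } // closed immediately after each write
        }
    }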

+4  A: 

Look at pre-rolled logging implementations; they may save you lots of headaches. Obviously, keeping the stream open means you might lose some of the end data if the app crashes, but you may gain performance from buffering the IO more. Some implementations also offer features such as asynchronous logging from a spooler/queue.
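For illustration, here is a very rough sketch of the spooler/queue idea, just to show the shape of it; a real logging library adds error handling, log levels, configurable targets, and so on (the type name and path are made up for the example):

    using System;
    using System.Collections.Concurrent;
    using System.IO;
    using System.Threading.Tasks;

    // Sketch of a spooler: callers enqueue messages and return immediately;
    // one background task owns the stream and does all the actual writing.
    class SpoolingLogger : IDisposable
    {
        private readonly BlockingCollection<string> _queue =
            new BlockingCollection<string>();
        private readonly Task _pump;

        public SpoolingLogger(string path)
        {
            _pump = Task.Factory.StartNew(() =>
            {
                using (var writer = new StreamWriter(path, true)) // true = append
                {
                    foreach (var line in _queue.GetConsumingEnumerable())
                        writer.WriteLine(line);
                }
            }, TaskCreationOptions.LongRunning);
        }

        public void Log(string message)
        {
            _queue.Add(message); // non-blocking from the caller's perspective
        }

        public void Dispose()
        {
            _queue.CompleteAdding(); // drain whatever is queued, then close the file
            _pump.Wait();
        }
    }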

Marc Gravell
@Marc: Buffering should be a separate issue. An open stream can be an AutoFlush (or potentially unbuffered) stream. I'd guess that System.Console.Error uses AutoFlush, as that is the tradition for the "stderr" handle.
mrjoltcola
@Marc Not to be lazy, but do you have any off-the-top-of-your-head recommendations for particularly good logging solutions? I have looked at the suggestion from @mrjoltcola, but I figure it's always good to spread out the research. And to that end, even though I'll be doing some more research on my own, any immediate suggestions to look into would be convenient.
ccomet
A: 

Repeatedly opening and closing a new StreamWriter for each write will generate a lot of garbage for the GC, plus impose overhead on the app from the file-system lookup on every open operation. On the other hand, holding a single stream open will hold a lock on the file. So it depends.

You don't want your logging mechanism to become a performance bottleneck, so write to a single stream. Make it unbuffered or AutoFlush for critical debugging (but be advised that this also has a performance impact).

I'd follow the model of log4net: create a static log stream and write to that singleton. Look into log4net anyway, so you don't roll your own. http://logging.apache.org/log4net/index.html
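As a bare-bones sketch of that model (not a replacement for log4net; the file name is a placeholder):

    using System.IO;

    // One stream for the application's lifetime; AutoFlush pushes each
    // entry through to the OS as soon as it is written.
    static class Log
    {
        private static readonly StreamWriter Writer =
            new StreamWriter("debug.log", true) { AutoFlush = true }; // true = append

        public static void Write(string message)
        {
            lock (Writer) // serialize writes if multiple threads log
            {
                Writer.WriteLine(message);
            }
        }
    }

Calling code then just does Log.Write("...") from anywhere, and the file stays open (and locked) until the process exits.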

mrjoltcola
OregonGhost's comment actually resolves my situation. But it isn't an answer... so I can't accept it. This answer, given the more general nature of the question, explicitly explains the differences between the approaches, so I shall accept this one.
ccomet
+1  A: 

I would buffer/queue the data, write it out to the file once the queue reaches a threshold, and flush the whole queue when the application closes/exits normally.

The only issue is that in the case of an application crash, you might lose the items still in the queue...
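Something along these lines, purely as an illustration (the path and threshold are arbitrary):

    using System.Collections.Generic;
    using System.IO;

    // Buffer entries in memory, write them out in a batch once the
    // threshold is hit, and flush the remainder on normal shutdown.
    class BatchingLogger
    {
        private readonly List<string> _buffer = new List<string>();
        private readonly string _path;
        private readonly int _threshold;

        public BatchingLogger(string path, int threshold) // e.g. 100 live, 1-2 while debugging
        {
            _path = path;
            _threshold = threshold;
        }

        public void Log(string message)
        {
            _buffer.Add(message);
            if (_buffer.Count >= _threshold)
                Flush();
        }

        public void Flush() // also call this when the application exits normally
        {
            using (var writer = new StreamWriter(_path, true)) // true = append
            {
                foreach (var line in _buffer)
                    writer.WriteLine(line);
            }
            _buffer.Clear();
        }
    }

Note that this also only holds the file open while a batch is being written, rather than locking it for the application's whole lifetime.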

HTH.

Sunny
I think losing data on application crash might be critical if you are using the log to debug such a crash, rather than logging for the user. In the latter case, the buffering solution is surely better, especially since your solution won't lock the file all the time, so it's a +1.
OregonGhost
@OregonGhost, if I am debugging crashes, I will set my threshold to write out the file for every 1-2 entries in the queue. In normal application running, I might set it to write for every 100 entries. I don't think performance of the application is a key factor at the time of debugging, but it is important during a live run.
Sunny
@Sunny: If I look at large desktop applications in my company, logging, even immediately, was never relevant to performance (and yes, I profiled (; ). The operating system will cache write data as well, and unless the system crashes, I'll at least have all error messages including stack traces from the very first crash on. We had crashes that were hard to reproduce (depending on external factors, mostly), and this was really helpful in a lot of cases. I understand your point, though, especially if you log a lot of things. I guess it's the difference between a pure error log and a general log.
OregonGhost
@OregonGhost @Sunny For this particular situation, it is general logging which will only be done during the development and testing stages. For actual deployment, we would instead be using a pure error log. Don't know how much this info will impact your answer(s).
ccomet
@ccomet: In that case, I think you should write immediately: during development/debugging, as Sunny already said, performance does not matter (and it will help debug crashes etc.), and if you only intend to log errors in the live build, it's better not to miss a single error, since the performance impact will be negligible.
OregonGhost
@OregonGhost: In reply to "If I look at large desktop applications in my company ... logging, even immediately, was never relevant to performance (and yes, I profiled)..." - You are talking desktop apps. Public facing Web apps can generate 5-10 orders of magnitude more logging depending on volume of page hits, so it may be more of a factor. Not sure if the asker was specific about the type of app. Cheers.
mrjoltcola
@mrjoltcola: Though I doubt 10 orders of magnitude can be reached (if I log 100KB per day, you'd log a petabyte? Even if you do, who wants to read all that? (; ), I understand your point. The question is whether an application should try to be more clever than the operating system, which can do a lot of caching by itself, given enough resources. Anyway, I don't have experience with large-scale web applications, and I'm happy with small-scale web on embedded devices, or desktop applications :)
OregonGhost
@OregonGhost: Exactly. You just made my point. A single user, distributed app has a disk and CPU per user. A high volume website has many users generating log traffic in parallel, so performance of logging is relevant there, where it may not be in a single user app.
mrjoltcola