I have an application that I've been tasked with cleaning up after. The application itself is relatively simple - it runs a SQL query, consumes a web service, and spews the results to a log file. My job is to archive the files to our NAS after the application is done with them. It locks the files exclusively until it's done with them, which adds a small bit of complexity. I'm also not allowed to touch the application, just the logs. Anyway, my application is fairly simple:

  1. Check if the file can be opened (catch IOException) and mark it off as accessible in a bool[] if no exception is thrown.
  2. Going through the array of files marked true, read each line of the file with a StreamReader using the ReadLine method. Because the application occasionally hiccups and doesn't finish, I can't simply use the IOException to tell whether the file is completed - I have to actually parse the text.
  3. If the text indicating completion is found, zip the file, load the archived file onto the NAS, and delete the original.
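For context, the three steps above roughly correspond to something like the following sketch. The file paths, the completion text, and the choice of GZipStream for the zipping step are my assumptions for illustration, not the asker's actual code:

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class LogArchiver
{
    // Step 1: a file counts as "accessible" if it can be opened for
    // reading, i.e. the producing application has released its lock.
    public static bool CanOpen(string path)
    {
        try
        {
            using (File.Open(path, FileMode.Open, FileAccess.Read, FileShare.Read)) { }
            return true;
        }
        catch (IOException) { return false; }
    }

    // Step 2: scan the file line by line for the completion text.
    // This is the slow part for a 500 MB file.
    public static bool IsComplete(string path, string completionText)
    {
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                if (line.Contains(completionText)) return true;
        }
        return false;
    }

    // Step 3: compress the log onto the NAS share, then delete the original.
    public static void Archive(string path, string nasDir)
    {
        string dest = Path.Combine(nasDir, Path.GetFileName(path) + ".gz");
        using (var src = File.OpenRead(path))
        using (var outFs = File.Create(dest))
        using (var gz = new GZipStream(outFs, CompressionMode.Compress))
            src.CopyTo(gz);
        File.Delete(path);
    }
}
```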

My code works, it's just very time consuming (the log files are each around 500 MB). My thoughts on improvement involve starting my search from the bottom of the file instead of from the top, but the StreamReader doesn't support such a method. I can't use the ReadToEnd method and then reverse read because that just throws an out of memory exception. Any thoughts on a way I could speed up the parsing of the log file?

+5  A: 

I assume you look for a single marker at the end of the file to determine if it is finished? If so, I also assume the marker is of a known length - for example, a single byte or a sequence of 3 bytes, etc.

If the above assumptions are correct, you can open the FileStream, Seek to the end of the file minus the expected marker length, read the bytes, and if the marker is present and complete you know you can process the file.

Seeking to 3 bytes before the end can be done with code like the following:

// Seek -3 bytes starting from the end of the file
fileStream.Seek(-3, SeekOrigin.End);
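Fleshing that out, a minimal sketch of the whole check might look like this. The marker value used in the test is an assumption; substitute whatever completion text the application actually writes:

```csharp
using System;
using System.IO;

static class LogChecker
{
    // Returns true if the last marker.Length bytes of the file match
    // the marker. Only those bytes are read, so this costs the same
    // for a 500 MB file as for a 1 KB file.
    public static bool EndsWithMarker(string path, byte[] marker)
    {
        using (var fs = new FileStream(path, FileMode.Open,
                                       FileAccess.Read, FileShare.Read))
        {
            if (fs.Length < marker.Length)
                return false;

            // Seek backwards from the end of the file.
            fs.Seek(-marker.Length, SeekOrigin.End);

            // Read may return fewer bytes than requested, so loop.
            var tail = new byte[marker.Length];
            int read = 0;
            while (read < tail.Length)
            {
                int n = fs.Read(tail, read, tail.Length - read);
                if (n == 0) break;
                read += n;
            }

            for (int i = 0; i < marker.Length; i++)
                if (tail[i] != marker[i]) return false;
            return true;
        }
    }
}
```

Note that opening the FileStream will still throw IOException while the producing application holds its exclusive lock, so this also covers the accessibility check from step 1.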
Chris Taylor
Seeking can be a costlier operation than sequential read and doing multiple seeks can be quite slow.
josephj1989
It's something I haven't tried yet though so it's worth a shot. I'll try implementing the seek and see if that speeds things up or not. Thanks all.
monkeyninja
@josephj1989, are you saying it is quicker to read a 500 MB file line by line, or in memory-friendly chunks until the end, than it is to simply seek directly to the end? And why multiple seeks? My stated assumption is that the marker is at the end of the file, so only a single seek is needed.
Chris Taylor