Hi folks,

Yes, this might sound like a newbie question but there's a TWIST! (And I've done an SO search already...)

I'm trying to read in multiple files, one at a time ... while each file is possibly getting new data APPENDED to the end of it.

I always know the last character position I was at.

So when I get to a file, I'm thinking I want to read from my saved start position up to whatever the end position is at the moment that line of code executes?

Then parse all that data... and go to the next file. While I was parsing, it's possible NEW data was appended to the end of the file... which I won't get to until I come back around and start parsing that file again.

I have around 300-400 files to parse, which is why I don't particularly want to spin up loads of threads, each with its own DB connection, etc.

Lastly, there's not THAT much new data per second, so it's not like I'll never 'catch up'.

So... any ideas on how best to do this?

I'm also parsing each 'line', where a line is a bit of text terminated by a '\n', etc.

Thoughts?

Updates / Re-edits:

  • Don't worry about the fact I've got many files and want to store the data in a database. I only mentioned that so people wouldn't start suggesting multiple threads per file, file watchers, etc.
  • Each log file has the same structure, just different data/content. And no, I don't want to detect when they have been updated; I just want to manually parse each one, from the last stream position to the current end position (rough sketch below).
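
Rough sketch of the loop I have in mind, per file (untested, and all the names here are made up):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Per-file bookkeeping: where we stopped reading last time round.
var lastPositions = new Dictionary<string, long>();

void ParseNewData(string path)
{
    long start;
    if (!lastPositions.TryGetValue(path, out start))
        start = 0;

    using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    {
        long end = stream.Length;   // snapshot "now"; later appends wait for the next pass
        if (end <= start)
            return;                 // nothing new since last time

        stream.Seek(start, SeekOrigin.Begin);

        // Read exactly the window [start, end); Read can return short counts, so loop.
        var buffer = new byte[end - start];
        int total = 0;
        while (total < buffer.Length)
        {
            int n = stream.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }

        string chunk = Encoding.UTF8.GetString(buffer, 0, total);
        // ... split 'chunk' on '\n' and parse each line; NB the window can
        // end mid-line, so the last piece may be a partial line ...

        lastPositions[path] = start + total;
    }
}
```
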
A: 

Having the last character position doesn't help you much. What you want is the last stream position (i.e. in bytes). That way you can just open a FileStream to the file, seek to the right place, and then create a StreamReader wrapping the stream.

If you use a StreamReader when initially reading, you should be able to get the final position by just checking the position of the base stream once you've reached the end. Normally you'd have to worry about the StreamReader having buffered extra data (and thus having read more from the file than you've consumed), but if you've reached the end of the file, it couldn't have read any more anyway :)
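
For example, something like this untested sketch, where `path` and `lastStreamPosition` are whatever you're tracking per file:

```csharp
using System.IO;
using System.Text;

// Open without locking out the writer, and resume from the saved position.
using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    stream.Seek(lastStreamPosition, SeekOrigin.Begin);

    using (var reader = new StreamReader(stream, Encoding.UTF8))
    {
        string newData = reader.ReadToEnd();   // everything appended since last time
        // ... parse newData ...

        // At EOF the reader can't have buffered past the end of the file,
        // so the base stream's position is the right place to resume from.
        lastStreamPosition = stream.Position;
    }
}
```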

Jon Skeet
@Jon Skeet - ah sorry, I meant StreamReader position - correct :) Currently I've got a big long StringBuilder that appends everything while there's data: `while ((bytesRead = stream.Read(buffer, 0, 4096)) > 0) { /* append */ }` ... but this kills me for the large files I process :( So I was thinking there could be a better way to just parse each line, instead of reading in ALL the data and then parsing it...?
Pure.Krome
@Pure.Krome: You mean like `TextReader.ReadLine()`? It's not clear whether your lines are actually terminated with a linefeed or a backslash followed by an n. Oh, and reading a text file *fully* is a lot simpler than you're making it: `File.ReadAllText`...
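
i.e. something like this (assuming an actual linefeed; `path` and `Parse` stand in for your own bits):

```csharp
using System.IO;

// Read and parse a line at a time instead of buffering everything yourself:
using (var reader = new StreamReader(path))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        Parse(line);   // your per-line handling
    }
}

// Or, if you really do want the whole file in one go:
string everything = File.ReadAllText(path);
```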
Jon Skeet
@Jon will that allow data to be appended while I read each line... then parse... then read again? What happens if the file keeps getting more and more stuff (and the parsing takes a bit of time)... will it be possible that I never get to the end, thus neglecting the other files?
Pure.Krome
@Jon Skeet - what I meant was: if I use a `StreamReader` over a stream opened with `FileShare.ReadWrite`... is it possible I'll never get to the end of the file if new data is constantly appended and my `CustomParse` method *takes a while*? Instead of having a fixed end stream point and parsing up to that...?
Pure.Krome
@Pure.Krome: Yes - `File.ReadAllText` won't help you much with incremental reading. As for whether you'd never get to the end of the stream: if that happens, then data is obviously being created faster than you can parse it, and without increasing your parse speed (e.g. by using multiple threads) you'll never catch up. Whether or not that's a real concern is something I can't answer.
Jon Skeet
@Jon Skeet - so there's no way I could loop through a stream but define the `start` and `end` stream locations, instead of just saying: loop through the stream till *the end*?
Pure.Krome
@Pure.Krome: Not with the built-in streams. You could do it with your own stream wrapper which enforces some maximum length... but you'd need to bear in mind that you could end up reading part of a line at the end. In fact, that's something you'll need to be aware of whatever you do. Your life would be a lot simpler if you could take exclusive access of the file - if your "writing" program could try to use the normal file, but create a secondary file if the primary one was unavailable.
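
Something along these lines (illustrative only - a read-only wrapper that fakes EOF at a snapshotted length; the name is made up):

```csharp
using System;
using System.IO;

public sealed class LengthCappedStream : Stream
{
    private readonly Stream _inner;     // assumed seekable, e.g. a FileStream
    private readonly long _maxLength;   // snapshot of the file's length

    public LengthCappedStream(Stream inner, long maxLength)
    {
        _inner = inner;
        _maxLength = maxLength;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        long remaining = _maxLength - _inner.Position;
        if (remaining <= 0)
            return 0;                   // pretend we've hit the end of the file
        return _inner.Read(buffer, offset, (int)Math.Min(count, remaining));
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { return _maxLength; } }
    public override long Position
    {
        get { return _inner.Position; }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}
```

Usage would be: open the `FileStream`, snapshot `stream.Length`, seek to your saved position, wrap the stream in this and hand the wrapper to a `StreamReader` - `ReadLine()` will then stop at the snapshotted end, though as above the final "line" may be a partial one.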
Jon Skeet
@Jon Skeet - cheers mate :) Much appreciated :)
Pure.Krome