I have a text file that I want to read line by line and record the position in the text file as I go. After reading any line of the file the program can exit, and I need to resume reading the file at the next line when it resumes.

Here is some sample code:

using (FileStream fileStream = new FileStream("Sample.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    fileStream.Seek(GetLastPositionInFile(), SeekOrigin.Begin);
    using (StreamReader streamReader = new StreamReader(fileStream))
    {
        while (!streamReader.EndOfStream)
        {
            string line = streamReader.ReadLine();
            DoSomethingInteresting(line);
            SaveLastPositionInFile(fileStream.Position);

            if (CheckSomeCondition())
            {
                break;
            }
        }
    }
}

When I run this code, the value of fileStream.Position does not change after each line is read; it only advances after a couple of lines, and when it does change, it increases in multiples of 1024. I assume there is some buffering going on under the covers, but how can I record the exact position in the file?

+5  A: 

It's not FileStream that's responsible - it's StreamReader. It's reading 1K at a time for efficiency.

Keeping track of the effective position of the stream as far as the StreamReader is concerned is tricky... particularly as ReadLine will discard the line ending, so you can't accurately reconstruct the original data (it could have ended with "\n" or "\r\n"). It would be nice if StreamReader exposed something to make this easier (I'm pretty sure it could do so without too much difficulty) but I don't think there's anything in the current API to help you :(
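To illustrate the line-ending problem, here is a minimal sketch (the class and method names are made up for this example). It feeds the same two lines through StreamReader.ReadLine with "\n" and with "\r\n" endings: the returned strings are identical, so their lengths alone cannot tell you how many bytes were consumed.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineEndingDemo
{
    // Sums line.Length over every line that ReadLine returns.
    public static int SumOfLineLengths(byte[] data)
    {
        int total = 0;
        using (var reader = new StreamReader(new MemoryStream(data), Encoding.ASCII))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                total += line.Length; // terminator already stripped
            }
        }
        return total;
    }

    public static void Main()
    {
        byte[] unix = Encoding.ASCII.GetBytes("ab\ncd\n");
        byte[] windows = Encoding.ASCII.GetBytes("ab\r\ncd\r\n");

        Console.WriteLine(SumOfLineLengths(unix));    // 4
        Console.WriteLine(SumOfLineLengths(windows)); // 4
        Console.WriteLine(unix.Length);               // 6
        Console.WriteLine(windows.Length);            // 8
    }
}
```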

By the way, I would suggest that instead of using EndOfStream, you keep reading until ReadLine returns null. It just feels simpler to me:

string line;
while ((line = reader.ReadLine()) != null)
{
    // Process the line
}
Jon Skeet
"For efficiency" is right! My initial implementation of reading 1 byte at a time directly from the FileStream is horribly inefficient! I think I might have to implement my own buffering solution.
Stefan Moser
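One possible shape for such a "own buffering" solution, as a rough sketch rather than a finished implementation: wrap the stream in a BufferedStream so that ReadByte is cheap, split lines on the raw bytes, and count the bytes consumed. PositionTrackingLineReader is a hypothetical helper, not a .NET class. It assumes an ASCII-compatible encoding such as UTF-8, no byte order mark, and "\n" or "\r\n" terminators (a lone "\r" is not treated as a line break).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: reads lines and tracks the exact byte offset.
public sealed class PositionTrackingLineReader : IDisposable
{
    private readonly Stream _stream;
    private readonly Encoding _encoding;

    // Byte offset of the first byte that has not been consumed yet.
    public long Position { get; private set; }

    public PositionTrackingLineReader(Stream source, long startPosition, Encoding encoding)
    {
        source.Seek(startPosition, SeekOrigin.Begin);
        _stream = new BufferedStream(source, 4096); // buffering keeps ReadByte cheap
        _encoding = encoding;
        Position = startPosition;
    }

    // Returns the next line without its terminator, or null at end of stream.
    public string ReadLine()
    {
        var bytes = new List<byte>();
        bool sawAnyByte = false;
        bool terminated = false;
        int b;
        while ((b = _stream.ReadByte()) != -1)
        {
            sawAnyByte = true;
            Position++; // count every byte, including the terminator
            if (b == '\n')
            {
                terminated = true;
                break;
            }
            bytes.Add((byte)b);
        }
        if (!sawAnyByte)
        {
            return null;
        }
        // Drop the '\r' of a "\r\n" terminator.
        if (terminated && bytes.Count > 0 && bytes[bytes.Count - 1] == (byte)'\r')
        {
            bytes.RemoveAt(bytes.Count - 1);
        }
        return _encoding.GetString(bytes.ToArray());
    }

    public void Dispose()
    {
        _stream.Dispose();
    }
}

public static class Program
{
    public static void Main()
    {
        byte[] data = Encoding.UTF8.GetBytes("first\r\nsecond\nthird");
        using (var reader = new PositionTrackingLineReader(new MemoryStream(data), 0, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Prints: first -> 7, second -> 14, third -> 19
                Console.WriteLine(line + " -> " + reader.Position);
            }
        }
    }
}
```

Saving reader.Position after each line and passing it back as startPosition on the next run would resume at the start of the following line.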
+1  A: 

I would agree with Stefan M.: it is probably the buffering that is causing the Position to be incorrect. If it is just the number of characters you have read that you want to track, then I suggest you do it yourself, as in:

        using(FileStream fileStream = new FileStream("Sample.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite)) {
            // Start counting from the saved offset so the stored value stays relative to the start of the file.
            long position = GetLastPositionInFile();
            fileStream.Seek(position, SeekOrigin.Begin);
            using(StreamReader streamReader = new StreamReader(fileStream)) {
                while(!streamReader.EndOfStream) {
                    string line = streamReader.ReadLine();
                    // Counts characters only; the line terminator is not part of line.Length.
                    position += line.Length;
                    DoSomethingInteresting(line);
                    SaveLastPositionInFile(position);

                    if(CheckSomeCondition()) {
                        break;
                    }
                }
            }
        }
Steve Ellinger
This is a great suggestion, but I'm not sure it is entirely viable given that line.Length may not be the number of bytes that were read. As Jon mentioned, StreamReader will drop characters like \r and \n.
Stefan Moser
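A small sketch of the character-versus-byte mismatch (the class name is invented for the example): for any text outside plain ASCII, line.Length and the number of bytes on disk diverge, and the gap depends on the file's encoding.

```csharp
using System;
using System.Text;

public static class ByteCountDemo
{
    public static void Main()
    {
        string line = "h\u00e9llo"; // "héllo": five characters

        Console.WriteLine(line.Length);                          // 5
        Console.WriteLine(Encoding.UTF8.GetByteCount(line));     // 6
        Console.WriteLine(Encoding.Unicode.GetByteCount(line));  // 10 (UTF-16)
    }
}
```

If the encoding is known, Encoding.GetByteCount(line) gives the byte length of the text itself, but the length of the dropped line terminator still has to be determined some other way.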
A: 

Provided that your file is not too big, why not read the whole thing in big chunks and then manipulate the string? That is probably faster than stop-and-go I/O.

For example,

            // Load the entire file in 32K-character blocks.
            StringBuilder sbFileContents = new StringBuilder();
            char[] acBuffer = new char[32768];
            using (StreamReader srFile = new StreamReader(strFileName))
            {
                int nCharsRead;
                while ((nCharsRead = srFile.ReadBlock(acBuffer, 0, acBuffer.Length)) > 0)
                {
                    // Append only the characters actually read; the last block is usually partial.
                    sbFileContents.Append(acBuffer, 0, nCharsRead);
                }
            }
bigtang