I'm trying to read a (small-ish) file in chunks of a few lines at a time, and I need to return to the beginning of particular chunks.

The problem is, after the very first call to

streamReader.ReadLine();

the streamReader.BaseStream.Position property is set to the end of the file! I assume some caching is done behind the scenes, but I was expecting this property to reflect the number of bytes I had actually consumed from the file. And yes, the file has more than one line :-)

For instance, calling ReadLine() again will (naturally) return the next line in the file, which does not start at the position previously reported by streamReader.BaseStream.Position.

My question is, how can I find the actual position where the 1st line ends, so I can return there later?

I can only think of manually doing the bookkeeping, by adding the lengths of the strings returned by ReadLine(), but even here there are a couple of caveats:

  • ReadLine() strips the new-line character(s), which may have a variable length (is it '\n'? is it "\r\n"? etc.)
  • I'm not sure whether this would work with variable-length (multibyte) characters

...so right now it seems like my only option is to rethink how I parse the file, so I don't have to rewind.

If it helps, I open my file like this:

using (var reader = new StreamReader(
        new FileStream(
                       m_path, 
                       FileMode.Open, 
                       FileAccess.Read, 
                       FileShare.ReadWrite)))
{...}

Any suggestions?

+2  A: 

StreamReader isn't designed for this kind of usage, so if this is what you need I suspect that you'll have to write your own wrapper for FileStream.

JSBangs
+3  A: 

If you need to read lines, and you need to go back to previous chunks, why not store the lines you read in a List<string>? That should be easy enough.

You should not depend on calculating a length in bytes based on the length of the string - for the reasons you mention yourself: Multibyte characters, Newline characters, etc.
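A minimal sketch of that idea, assuming the file is small enough to hold in memory. The `LineCache` class and its `Load` and `Chunk` methods are hypothetical names used only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LineCache
{
    // Reads every line once; afterwards any chunk can be revisited by index.
    public static List<string> Load(string path)
    {
        var lines = new List<string>();
        using (var reader = new StreamReader(
            new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);
        }
        return lines;
    }

    // Returns up to `count` lines starting at zero-based line `start`.
    public static List<string> Chunk(List<string> lines, int start, int count)
    {
        if (start < 0 || start >= lines.Count || count <= 0)
            return new List<string>();
        return lines.GetRange(start, Math.Min(count, lines.Count - start));
    }
}
```

"Rewinding" then becomes a matter of remembering a line index rather than a byte position, so newline length and multibyte characters stop mattering.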

driis
+1  A: 

I have implemented something similar, where I needed fast access to the n-th line of an extremely big text file.

streamReader.BaseStream.Position points to the end of the file because StreamReader has a built-in buffer, as you suspected: it reads a whole block from the underlying stream up front, and a small file can fit entirely in that block.

Bookkeeping by counting the bytes consumed by each ReadLine() call will work for most plain text files. However, I have encountered files with unprintable control characters mixed into the text. The byte count came out wrong, so my program could not seek to the correct location afterwards.

My final solution was to implement the line reader myself. It has worked well so far. This should give an idea of what it looks like:

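using (FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
{
    int ch;
    int currentLine = 1;
    long offset = 0; // long, because a very big file can exceed int.MaxValue bytes

    while ((ch = fs.ReadByte()) >= 0)
    {
        offset++;

        // 10 is '\n', which ends both "\r\n" and plain "\n" (UNIX) lines;
        // bare "\r" line endings are not detected
        if (ch == 10)
        {
            currentLine++;

            // offset is now the byte position where line currentLine starts
            // ... do something such as log the current offset with the line number
        }
    }
}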

And to go back to a logged offset:

using (FileStream fs = new FileStream(filePath, FileMode.Open))
{
    fs.Seek(yourOffset, SeekOrigin.Begin);
    TextReader tr = new StreamReader(fs);

    string line = tr.ReadLine();

}

Also note that FileStream already has a buffering mechanism built in, so calling ReadByte() in a loop does not hit the disk for every byte.
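Putting the two snippets together, here is a rough sketch of a reusable line index. It assumes the file is UTF-8 (or another encoding in which the byte 0x0A only ever means '\n'), and it skips a leading UTF-8 byte order mark so the first offset points at real text. The `LineIndex` class and its `Build` and `ReadLineAt` methods are hypothetical names, not part of the framework:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class LineIndex
{
    // Returns the byte offset at which each line starts.
    // Assumption: UTF-8 text, where byte 0x0A never occurs inside a multibyte character.
    public static List<long> Build(string path)
    {
        var offsets = new List<long>();
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            long length = fs.Length;

            // Skip a UTF-8 byte order mark (EF BB BF) if the file starts with one.
            int b0 = fs.ReadByte(), b1 = fs.ReadByte(), b2 = fs.ReadByte();
            long offset = (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) ? 3 : 0;
            fs.Seek(offset, SeekOrigin.Begin);

            if (length > offset)
                offsets.Add(offset); // start of the first line

            int ch;
            while ((ch = fs.ReadByte()) >= 0)
            {
                offset++;
                // A '\n' ends the current line; the next line starts right after it,
                // unless the '\n' was the last byte of the file.
                if (ch == 10 && offset < length)
                    offsets.Add(offset);
            }
        }
        return offsets;
    }

    // Reads the single line that starts at a previously recorded byte offset.
    public static string ReadLineAt(string path, long offset)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            fs.Seek(offset, SeekOrigin.Begin);
            // Explicit encoding with BOM detection off: there is no BOM in the middle of a file.
            using (var reader = new StreamReader(fs, new UTF8Encoding(false), false))
                return reader.ReadLine();
        }
    }
}
```

This will not work as written for UTF-16 or UTF-32 files, where a line feed is encoded as more than one byte; those would need the scan to step in code units rather than bytes.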

m3rLinEz
There are problems. Dealing with the BOM is a biggie.
Hans Passant