Hi.

My simple requirement: reading a huge (more than a million lines) text file (for this example, assume it's a CSV of some sort) and keeping a reference to the beginning of each line for faster lookups later (read a line starting at position X).

I tried the naive and easy way first, using a StreamReader and accessing the underlying BaseStream.Position. Unfortunately that doesn't work as I intended:

Given a file containing the following

Foo
Bar
Baz
Bla
Fasel

and this very simple code

using (var sr = new StreamReader(@"C:\Temp\LineTest.txt")) {
  string line;
  long pos = sr.BaseStream.Position;
  while ((line = sr.ReadLine()) != null) {
    Console.Write("{0:d3} ", pos);
    Console.WriteLine(line);
    pos = sr.BaseStream.Position;
  }
}

the output is:

000 Foo
025 Bar
025 Baz
025 Bla
025 Fasel

I can imagine that the reader is trying to be helpful/efficient and reads ahead in (big) chunks whenever new data is needed, so BaseStream.Position points at the end of the buffered block rather than at the line just returned. For me this is bad.

The question, finally: is there any way to get the (byte or char) offset while reading a file line by line, without dropping down to a raw Stream and dealing with \r, \n, \r\n and string encoding manually? Not a big deal, really; I just don't like to build things that might already exist.
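For reference, the manual route I'd like to avoid would look roughly like the sketch below. The class name, the ASCII-compatible-encoding assumption and the missing BOM handling are just illustrative, not something I've verified:

using System.Collections.Generic;
using System.IO;
using System.Text;

static class ByteOffsetLineReader
{
    // Yields (byte offset of line start, line text) pairs, splitting on
    // \n, \r or \r\n. Assumes an ASCII-compatible encoding such as UTF-8,
    // where the newline bytes never occur inside a multi-byte sequence;
    // a BOM, if present, is not skipped here.
    public static IEnumerable<KeyValuePair<long, string>> ReadLines(
        string path, Encoding encoding)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            var lineBytes = new List<byte>();
            long lineStart = 0;
            int b;
            while ((b = fs.ReadByte()) != -1)
            {
                if (b == '\n' || b == '\r')
                {
                    if (b == '\r')
                    {
                        // Swallow the '\n' of a "\r\n" pair, if there is one.
                        int next = fs.ReadByte();
                        if (next != '\n' && next != -1)
                            fs.Position -= 1; // not part of the pair, put it back
                    }
                    yield return new KeyValuePair<long, string>(
                        lineStart, encoding.GetString(lineBytes.ToArray()));
                    lineBytes.Clear();
                    lineStart = fs.Position;
                }
                else
                {
                    lineBytes.Add((byte)b);
                }
            }
            if (lineBytes.Count > 0) // last line without a trailing newline
                yield return new KeyValuePair<long, string>(
                    lineStart, encoding.GetString(lineBytes.ToArray()));
        }
    }
}

Run against the sample file above, each Key would be the byte offset that BaseStream.Position fails to give me. It's exactly the kind of boilerplate I was hoping to skip.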

A: 

Would this work:

using (var sr = new StreamReader(@"C:\Temp\LineTest.txt")) {
  string line;
  long pos = 0;
  while ((line = sr.ReadLine()) != null) {
    Console.Write("{0:d3} ", pos);
    Console.WriteLine(line);
    pos += line.Length;
  }
}
Sani Huttunen
Unfortunately not, because I have to accept different types of newlines (\n, \r\n, \r), so the number would be skewed. This might work if I insisted on a _consistent_ newline separator (it could very well be mixed in practice) and probed for it first to know the real per-line overhead. So I'm trying to avoid going down that route.
Benjamin Podszun
@Benjamin: Darn - I just posted a similar answer which explicitly relied on a consistent newline separator...
Jon Skeet
Then I think you'd be better off doing it manually with StreamReader.Read().
Sani Huttunen
@Jon: Hehe. As I said, that _might_ be the way, instead of using a plain Stream. If these are the only two options, I'll have to roll a die and live with the consequences: either assume consistent separators (bad for files that were processed on more than one platform, copy/pasted in bad editors, etc.) or go low-level with the Stream (boring line parsing and string-encoding mess, a lot of boilerplate for a seemingly low return). (A sketch of what the consistent-separator variant would look like follows this thread.)
Benjamin Podszun
@Sani: That wouldn't help much. I'd have to ditch the whole `StreamReader`. Even `Read()` on it leads to a block read on the underlying stream and moves `BaseStream.Position` to 25 for my sample, after _one char_.
Benjamin Podszun
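For completeness, the consistent-separator variant discussed above would look roughly like the sketch below. The fixed separator length and the one-character-equals-one-byte assumption are exactly the assumptions that can't be made here:

// Sketch only: assumes every line ends with "\r\n" and a single-byte encoding,
// so that line.Length + 2 really is the number of bytes consumed per line.
const int separatorLength = 2;

using (var sr = new StreamReader(@"C:\Temp\LineTest.txt")) {
  string line;
  long pos = 0;
  while ((line = sr.ReadLine()) != null) {
    Console.Write("{0:d3} ", pos);
    Console.WriteLine(line);
    pos += line.Length + separatorLength;
  }
}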
+3  A: 

You could create a TextReader wrapper, which would track the current position in the base TextReader:

public class TrackingTextReader : TextReader
{
    private TextReader _baseReader;
    private int _position;

    public TrackingTextReader(TextReader baseReader)
    {
        _baseReader = baseReader;
    }

    // TextReader.ReadLine() falls back to Read(), so counting here
    // keeps Position accurate for line-by-line reading as well.
    public override int Read()
    {
        int c = _baseReader.Read();
        if (c != -1)
            _position++; // don't count the end-of-stream marker
        return c;
    }

    public override int Peek()
    {
        return _baseReader.Peek();
    }

    public int Position
    {
        get { return _position; }
    }
}

You could then use it as follows:

string text = @"Foo
Bar
Baz
Bla
Fasel";

using (var reader = new StringReader(text))
using (var trackingReader = new TrackingTextReader(reader))
{
    string line;
    while ((line = trackingReader.ReadLine()) != null)
    {
        Console.WriteLine("{0:d3} {1}", trackingReader.Position, line);
    }
}
Thomas Levesque
Seems to work. That somehow seems so obvious now... Thanks a lot.
Benjamin Podszun
This solution is fine as long as you want the character position rather than the byte position. If the underlying file has a Byte Order Mark (BOM), the position will be offset, and if it uses multi-byte characters, the 1:1 correspondence between characters and bytes no longer holds.
FrederikB
Agreed, this only works for single-byte encodings, e.g. ASCII. If, for instance, your underlying file is Unicode, each character is encoded as 2 or 4 bytes. The implementation above works on a character stream, not a byte stream, so you get character offsets that will not map onto the actual byte positions. For example, the second character will be reported at index 1, but its byte position will actually be 2 or 4. A BOM (Byte Order Mark), if present, adds further bytes to the true underlying byte position.
chibacity
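To get byte positions out of the same wrapper idea, one option is sketched below, under the assumption that the file's encoding is known: feed every character read through a stateful Encoder and add up the bytes it produces. The class name and this counting approach are illustrative, not part of the answer above, and a BOM still has to be accounted for separately:

// A variant of the wrapper above that tracks a byte position instead of a
// character position, assuming the file's encoding is known.
public class ByteTrackingTextReader : TextReader
{
    private readonly TextReader _baseReader;
    private readonly Encoder _encoder;
    private readonly char[] _oneChar = new char[1];
    private readonly byte[] _scratch;
    private long _bytePosition;

    public ByteTrackingTextReader(TextReader baseReader, Encoding encoding)
    {
        _baseReader = baseReader;
        _encoder = encoding.GetEncoder();
        _scratch = new byte[encoding.GetMaxByteCount(2)];
        // If the file starts with a BOM, initialise _bytePosition with
        // encoding.GetPreamble().Length yourself, since StreamReader
        // swallows the BOM before any character reaches this wrapper.
    }

    public override int Read()
    {
        int c = _baseReader.Read();
        if (c != -1)
        {
            _oneChar[0] = (char)c;
            // flush: false lets the encoder hold on to a high surrogate until
            // its partner arrives, so surrogate pairs are counted correctly.
            _bytePosition += _encoder.GetBytes(_oneChar, 0, 1, _scratch, 0, false);
        }
        return c;
    }

    public override int Peek()
    {
        return _baseReader.Peek();
    }

    public long BytePosition
    {
        get { return _bytePosition; }
    }
}

Wrapped around a StreamReader created with the same encoding, BytePosition sampled before each ReadLine call should correspond to the byte offset of that line in the file (plus the BOM length, if any), which is what you would later pass to Stream.Seek.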