
Hi!

I've got a string that I want to read line by line, but I also need the line delimiter characters, which StringReader.ReadLine unfortunately trims (unlike in Ruby, where they are kept). What is the fastest and most robust way to accomplish this?

Alternatives I've been thinking about:

  • Reading the input character by character and checking for the line delimiter each time
  • Using Regex.Split with a positive lookahead

Actually, I only care about the line delimiter because I need to know the actual position in the string, and the delimiter can be either one or two characters long. So being able to get the current position of the cursor within the string would also be good, but StringReader doesn't have this feature.

EDIT: here is my current implementation. End-of-file is designated by returning an empty string.

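// Reads one line from _input (e.g. a StringReader), keeping its delimiter:
// "\n", "\r" or "\r\n". Returns "" at end of input.
// (Illustrative method name; the snippet itself is the method body.)
private string ReadLineWithDelimiter()
{
  StringBuilder line = new StringBuilder();
  int r = _input.Read();
  while (r >= 0)
  {
    char c = (char)r;
    line.Append(c);
    if (c == '\n') break;                            // "\n", or the end of "\r\n"
    if (c == '\r' && _input.Peek() != '\n') break;   // lone "\r", including at end of input
    r = _input.Read();
  }
  return line.ToString();
}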
A:

Are you concerned about inconsistencies between files (i.e. coming from Unix/Mac vs. Windows), or within files?

One very easy optimization if you know that individual files are consistent with themselves would be to only read the first line character-by-character and figure out what the delimiter is. Then determining the exact position of any other line would be simple math.
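A minimal sketch of that optimization, assuming every line in the string uses the same delimiter as the first one (if that assumption does not hold, the offsets drift). ConsistentDelimiterLines.LineStarts is a hypothetical name; it only combines StringReader.ReadLine with a one-off check of the first line break:

```csharp
using System.Collections.Generic;
using System.IO;

public static class ConsistentDelimiterLines
{
    // Hypothetical helper: returns the start offset of every line in text,
    // ASSUMING all lines use the same delimiter as the first line.
    public static List<int> LineStarts(string text)
    {
        // Work out the delimiter length from the first line break only.
        int delimiterLength = 0;
        int firstBreak = text.IndexOfAny(new char[] { '\r', '\n' });
        if (firstBreak >= 0)
        {
            bool crlf = text[firstBreak] == '\r'
                && firstBreak + 1 < text.Length
                && text[firstBreak + 1] == '\n';
            delimiterLength = crlf ? 2 : 1;
        }

        List<int> starts = new List<int>();
        int position = 0;
        using (StringReader reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                starts.Add(position);
                // Simple math: this line's length plus the assumed delimiter length.
                position += line.Length + delimiterLength;
            }
        }
        return starts;
    }
}
```

For "ab\r\ncd\r\nef" this returns 0, 4 and 8.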

Failing that, I think I would go the character-by-character route. A regex seems too "clever." This sounds like a complex function and I think the most important thing would be to make it easy to write, read, understand, and most importantly debug.
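For comparison, the regex route from the question might look like the sketch below. It uses a lookbehind rather than a lookahead: splitting right after each delimiter keeps the delimiter attached to the end of its own line, whereas a lookahead split would move it to the start of the next piece. RegexLineSplitter and SplitKeepingDelimiters are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexLineSplitter
{
    // Zero-width split points: right after "\n", or right after a "\r"
    // that is not followed by "\n" (so "\r\n" stays together).
    private static readonly Regex LineEnd = new Regex(@"(?<=\n)|(?<=\r)(?!\n)");

    // Hypothetical helper: splits s into lines, each keeping its delimiter.
    public static List<string> SplitKeepingDelimiters(string s)
    {
        List<string> lines = new List<string>();
        foreach (string piece in LineEnd.Split(s))
        {
            // Split yields a trailing "" when s ends with a delimiter (or s is empty).
            if (piece.Length > 0) lines.Add(piece);
        }
        return lines;
    }
}
```

SplitKeepingDelimiters("a\r\nb\nc") gives "a\r\n", "b\n" and "c". It is short, but the pattern is exactly the kind of cleverness that is harder to debug than a plain loop.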


There's another way to do this, which would be more efficient if your data source were a stream. Unfortunately, as your comment says, it isn't, so you would have to create one first; I'll include the solution anyway, since it might give you some inspiration:

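// Requires: using System.Collections.Generic; using System.IO; using System.Text;
public IEnumerable<long> GetLineStartIndices(string s)
{
    yield return 0;
    byte[] bytes = Encoding.UTF8.GetBytes(s);
    using (MemoryStream stream = new MemoryStream(bytes))
    using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
    {
        while (reader.ReadLine() != null)
        {
            // stream.Position is a long byte offset, and it counts everything
            // StreamReader has pulled into its buffer, not just the line returned.
            yield return stream.Position;
        }
    }
}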

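This is meant to give you back the start position of each new line, and you can obviously tweak it to do whatever else you need, e.g. do something else with the actual lines you read. One caveat: StreamReader reads ahead into an internal buffer, so stream.Position after ReadLine() reflects how much the reader has buffered rather than where the line just read ended, and the values are UTF-8 byte offsets, not char indices into the original string.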

Just note that this has to make a copy of the string to create the byte array, so it's really not suitable for very large strings. It's a bit nicer than the char-by-char approach though, less bug-prone, so perhaps worth considering if the strings are not megabytes-long.
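Since the input is already a string, the line-start indices can also be computed directly on it, which avoids both the byte copy and StreamReader's read-ahead buffering, and gives char indices rather than UTF-8 byte offsets. A minimal sketch (LineStartScanner is a hypothetical name); it does not report an extra start index for a delimiter at the very end of the string:

```csharp
using System.Collections.Generic;

public static class LineStartScanner
{
    // Hypothetical helper: char index at which each line of s starts.
    // Handles "\n", "\r" and "\r\n" without assuming they are consistent.
    public static List<int> GetLineStartIndices(string s)
    {
        List<int> starts = new List<int>();
        starts.Add(0);
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (c == '\r' && i + 1 < s.Length && s[i + 1] == '\n')
                i++; // "\r\n" counts as a single delimiter
            else if (c != '\r' && c != '\n')
                continue; // ordinary character, keep scanning
            // The line ends at i (inclusive); the next one starts right after it,
            // unless the string ends here.
            if (i + 1 < s.Length) starts.Add(i + 1);
        }
        return starts;
    }
}
```

For "ab\r\ncd\ne\rf" it returns 0, 4, 7 and 9.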

Aaronaught
This is part of a library that is designed to be compatible with Mono and .NET 2.0. It must be fail-safe, so no assumptions can be made.
SztupY
A: 

If you only care about the position: ReadLine() moves you to the next line. If you store the .Position of the underlying stream, you can compare it to the .Position after the following ReadLine(). The difference is the length of the string you just read plus the delimiter, so the length of the delimiter is currentPosition - previousPosition - line.Length.

That way you could easily find out if it was 1 or 2 bytes (without knowing the details, but you said you care only about the positions anyway).
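StringReader has no underlying stream, but the same bookkeeping can be done in characters. A minimal sketch, assuming a hypothetical PositionTrackingReader wrapper that forwards Read() and Peek() to an inner TextReader and counts the characters consumed. It relies on the default TextReader.ReadLine being implemented in terms of Read() and Peek() (as in the .NET reference source), which is why ReadLine is deliberately not overridden:

```csharp
using System.IO;

// Hypothetical wrapper: forwards to another TextReader and counts how many
// characters have been consumed, so Position is the index of the next unread char.
public sealed class PositionTrackingReader : TextReader
{
    private readonly TextReader _inner;
    private int _position;

    public PositionTrackingReader(TextReader inner)
    {
        _inner = inner;
    }

    public int Position
    {
        get { return _position; }
    }

    // Peek does not consume anything, so the position is unchanged.
    public override int Peek()
    {
        return _inner.Peek();
    }

    public override int Read()
    {
        int c = _inner.Read();
        if (c != -1) _position++;
        return c;
    }

    // ReadLine is not overridden: the base TextReader.ReadLine calls Read()
    // and Peek(), so every consumed char, delimiter included, is counted above.
}
```

After each ReadLine(), Position minus the previous Position minus line.Length is the delimiter length: 2 for "\r\n", 1 for "\n" or "\r", and 0 for a final line without one.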

Benjamin Podszun
How can you get the stream out of a StringReader in .NET? I don't see an appropriate function for that in the documentation.
SztupY
Urgs. It doesn't. Pardon, I missed the "String" part of the reader and assumed that you'd pass a stream to a StreamReader. If you can do that, my suggestion might work and do what you want. If you cannot do that, then this is useless crap and I could just delete it.
Benjamin Podszun
See Aaronaught's answer for a way to get the positions, and look at my suggestion to understand how that might help you. Should (tm) do the trick.
Benjamin Podszun
A: 

File.ReadAllText will get you all of the file contents. Yup. All. So you'd better check the file size before using it.

EDIT:

Read it all in, then create an iterator that yields line by line.

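foreach (string line in Read("some.file"))
{ ... }

// Needs: using System.Collections.Generic; using System.IO;
// Reads the whole file, then yields it one line at a time, each line
// keeping its delimiter ("\n", "\r" or "\r\n").
private IEnumerable<string> Read(string file)
{
  string buffer = File.ReadAllText(file);
  int start = 0;
  for (int index = 0; index < buffer.Length; index++)
  {
    char c = buffer[index];
    if (c == '\r' && index + 1 < buffer.Length && buffer[index + 1] == '\n')
      index++; // treat "\r\n" as a single delimiter
    else if (c != '\r' && c != '\n')
      continue; // not a line break yet
    yield return buffer.Substring(start, index + 1 - start);
    start = index + 1;
  }
  if (start < buffer.Length)
    yield return buffer.Substring(start); // last line has no delimiter
}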
No Refunds No Returns
He says the input is already a string, so presumably it fits in memory.
Aaronaught
And I need to process it line by line, so reading it all is a no-go.
SztupY
A: 
// Reads the whole file as bytes and keeps every byte except '|'.
using (FileStream fs = new FileStream("E:\\hh.txt", FileMode.Open, FileAccess.Read))
using (BinaryReader read = new BinaryReader(fs))
{
    byte[] ch = read.ReadBytes((int)fs.Length);
    byte[] che = new byte[ch.Length];
    int j = 0;
    for (int i = 0; i < ch.Length; i++)
    {
        if (ch[i] != '|')
        {
            che[j] = ch[i];
            j++;
        }
    }
    // Decode only the j bytes that were copied; the rest of che is unused.
    richTextBox1.Text = Encoding.ASCII.GetString(che, 0, j);
}
habtamu