What's the best way in C# to determine the line endings used in a text file (Unix, Windows, Mac)?
There is Environment.NewLine, though that only tells you what the current system uses and won't help with reading files from various sources.
When reading, I usually look for \n (edit: apparently some files use only \r) and assume that the line ends there.
I would imagine you can't know for sure; you would have to set this in the editor. You could use some AI; the algorithm would be:
- Search for each type of line ending, that is, for those specific characters.
- Measure the distances between them.
- If one type tends to repeat then you assume that's the type. Count the repeats and use some measure of dispersion.
So, for example, if you had repeats of CRLF at 38, 40, 45, and that was within tolerance you'd default to assuming the line end was CRLF.
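The distance-based idea above could be sketched roughly as follows. This is only one possible reading of it: "repeats" is taken to mean the gaps between consecutive occurrences, the measure of dispersion is assumed to be the standard deviation of those gaps, and the names LineEndingSpacing, GapDeviation, Guess and the tolerance parameter are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineEndingSpacing
{
    // Returns the offsets at which each kind of line ending starts.
    public static Dictionary<string, List<int>> FindPositions(string text)
    {
        var positions = new Dictionary<string, List<int>>
        {
            ["CRLF"] = new List<int>(),
            ["LF"] = new List<int>(),
            ["CR"] = new List<int>(),
        };
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '\r' && i + 1 < text.Length && text[i + 1] == '\n')
            {
                positions["CRLF"].Add(i);
                i++; // skip the LF that belongs to this CRLF
            }
            else if (text[i] == '\n') positions["LF"].Add(i);
            else if (text[i] == '\r') positions["CR"].Add(i);
        }
        return positions;
    }

    // Standard deviation of the distances between consecutive occurrences.
    // Assumption: at least two gaps are needed before the spread means anything.
    public static double GapDeviation(List<int> offsets)
    {
        if (offsets.Count < 3) return double.PositiveInfinity;
        var gaps = offsets.Zip(offsets.Skip(1), (a, b) => (double)(b - a)).ToList();
        double mean = gaps.Average();
        return Math.Sqrt(gaps.Average(g => (g - mean) * (g - mean)));
    }

    // Picks the most frequent ending whose gap deviation is within tolerance,
    // or "Unknown" when no ending qualifies.
    public static string Guess(string text, double tolerance)
    {
        return FindPositions(text)
            .Where(p => GapDeviation(p.Value) <= tolerance)
            .OrderByDescending(p => p.Value.Count)
            .Select(p => p.Key)
            .DefaultIfEmpty("Unknown")
            .First();
    }
}
```

A sensible tolerance depends on how much line lengths vary in the files you expect, so treat any fixed value as a guess.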
If it were me, I'd just read the file one char at a time until I came across the first \r or \n. This assumes you have sensible input.
I'd just search the file for the first \r or \n. If it's a \r, I'd look at the next character to see if it's a \n; if so, the ending is \r\n, otherwise it's whichever character was found.
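A minimal sketch of the "first line ending wins" approach, assuming the whole file is already in a string and that the first ending is representative. The class and method names are invented for the example. Since a \r\n pair is always reached at its \r first, the check here looks at the character that follows a \r.

```csharp
using System;

public static class FirstLineEnding
{
    // Returns "\r\n", "\n" or "\r", or an empty string if the text has no line ending.
    public static string Detect(string text)
    {
        int i = text.IndexOfAny(new[] { '\r', '\n' });
        if (i < 0) return "";               // no line ending at all
        if (text[i] == '\n') return "\n";   // a bare LF came first
        bool followedByLf = i + 1 < text.Length && text[i + 1] == '\n';
        return followedByLf ? "\r\n" : "\r";
    }
}
```

For a real file you might pass in the result of File.ReadAllText, or read only the first few kilobytes if the files can be large.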
When reading most textual formats, I usually look for \n and then Trim() the whole string (whitespace at the beginning and end is often redundant).
Here is some advanced guesswork: read the file, count the CRs and LFs, then:
if (CR > LF*2) then "Mac"
else if (LF > CR*2) then "Unix"
else "Windows"
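In C# the counting guess might look like the sketch below. The thresholds are exactly the ones from the pseudocode; the class name and the returned labels are just illustrative.

```csharp
using System.Linq;

public static class LineEndingCounter
{
    // Guesses the convention from the raw CR and LF counts.
    public static string Guess(string text)
    {
        int cr = text.Count(ch => ch == '\r');
        int lf = text.Count(ch => ch == '\n');
        if (cr > lf * 2) return "Mac";    // mostly lone CRs
        if (lf > cr * 2) return "Unix";   // mostly lone LFs
        return "Windows";                 // roughly equal counts, as with CRLF pairs
    }
}
```

Note that a text with no line endings at all falls through to "Windows", since neither count exceeds the other.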
Also note that newer Macs (Mac OS X) use Unix line endings.
Note that text files may have inconsistent line endings. Your program should not choke on that. Using ReadLine on a StreamReader (and similar methods) will take care of any possible line ending automatically.
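For illustration, here is a small sketch that lets StreamReader.ReadLine do the work. The helper name ReadAllLines and the choice of UTF-8 are assumptions made for the example; ReadLine itself treats \n, \r and \r\n as line terminators.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ReadLineDemo
{
    // Reads every line from the stream, whatever mix of endings it contains.
    public static List<string> ReadAllLines(Stream stream)
    {
        var lines = new List<string>();
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                lines.Add(line); // the terminator is not included in the line
        }
        return lines;
    }
}
```

Passing it a stream containing "one\r\ntwo\nthree\rfour" should give four separate lines even though the endings are mixed.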
If you manually read lines from a file, make sure to accept any line endings, even if inconsistent. In practice, this is quite easy using the following algorithm:
- Scan ahead until you find either CR or LF.
- If you read CR, peek ahead at the next character;
- If the next character is LF, consume it (otherwise, put it back).
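The steps above can be sketched as a small iterator. This is a rough illustration rather than a definitive implementation: the names are invented, and "put it back" is done by peeking before consuming, which assumes the reader's Peek reliably reports the next character (as StringReader's does).

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ManualLineReader
{
    // Yields lines terminated by CR, LF or CRLF, in any mixture.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        var current = new StringBuilder();
        int c;
        while ((c = reader.Read()) != -1)
        {
            if (c == '\r' || c == '\n')
            {
                // After a CR, swallow a directly following LF so CRLF counts once.
                if (c == '\r' && reader.Peek() == '\n')
                    reader.Read();
                yield return current.ToString();
                current.Clear();
            }
            else
            {
                current.Append((char)c);
            }
        }
        // Emit a final line that has no terminator.
        if (current.Length > 0)
            yield return current.ToString();
    }
}
```

For example, feeding it a StringReader over "a\r\nb\nc\rd" yields a, b, c and d as four lines.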