(Sorry if this is a dupe)

I've just spent a long time trying to read a text file correctly.

I started with File.ReadAllText(path) and got garbled characters. I then tried several variants of File.ReadAllText(path, Encoding), and after that got bogged down analysing my input files to work out which byte was the problem.

In desperation I tried File.ReadAllText(path, Encoding.Default), which worked!

I'm now struggling to understand why the default value is apparently only the default value if you specify it.

(My cut-down test string was +4433ç. I saved it in Notepad as ANSI, though with Swiss French regional settings...)

+2  A: 

Encoding.Default is the system's ANSI codepage.

What File.ReadAllText does if you don't specify an encoding is this:

  • First it checks whether there's a byte order mark (UTF-8, UTF-16 or UTF-32). If there is, it uses the encoding specified in the byte order mark.
  • Otherwise, it uses UTF-8.

So the only way to get the system's ANSI codepage is to explicitly specify Encoding.Default.
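To see the difference in isolation, here is a minimal sketch rather than a definitive repro. It assumes the file holds the question's test string in a single-byte Western European encoding, where ç is the byte 0xE7. ISO-8859-1 stands in for the ANSI codepage so the code does not depend on the machine's regional settings (on .NET Core and later, Encoding.Default is UTF-8 rather than the ANSI codepage). The trailing CRLF and the class and variable names are my own additions.

```csharp
using System;
using System.IO;
using System.Text;

class AnsiVersusUtf8
{
    static void Main()
    {
        // "+4433ç" followed by CRLF, as single-byte "ANSI" text.
        // 0xE7 is 'ç' in ISO-8859-1 and Windows-1252, but it is not
        // a valid UTF-8 sequence when followed by 0x0D.
        // Assumption: the CRLF is added here so 0xE7 is not the last byte.
        byte[] ansiBytes = { 0x2B, 0x34, 0x34, 0x33, 0x33, 0xE7, 0x0D, 0x0A };
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, ansiBytes);

        // No BOM and no encoding argument: the bytes are decoded as UTF-8,
        // so the 0xE7 byte cannot be mapped to 'ç'.
        string asUtf8 = File.ReadAllText(path);

        // Explicit single-byte encoding (a stand-in for the ANSI codepage).
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string asLatin1 = File.ReadAllText(path, latin1);

        Console.WriteLine("UTF-8 read contains U+FFFD: " + asUtf8.Contains("\uFFFD"));
        Console.WriteLine("Latin-1 read contains 'ç': " + asLatin1.Contains("\u00E7"));

        File.Delete(path);
    }
}
```

On .NET Framework, passing Encoding.Default in place of latin1 is what made the original call work, because the file was written with the same ANSI codepage it was read with.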

Daniel
File.ReadAllText doesn't check for a byte order mark. It will always use UTF-8 if you don't specify an encoding. This is confirmed by both Reflector and the .NET reference source.
Jivko Petiov
Jivko, I don't think your comment is correct. ReadAllText without an encoding calls ReadAllText(path, Encoding.UTF8), but the internal stream used by ReadAllText will read the BOM if present and replace the Encoding.UTF8 with the detected encoding. This is because detectEncodingFromByteOrderMarks is set to true in the StreamReader constructor.
Simon D
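
The byte order mark behaviour discussed in these comments can be checked with a small sketch like the one below. It is an illustration only: the temp file and the class and variable names are made up for the example. It writes the test string as UTF-16 with a BOM, reads it back without an encoding argument, and then repeats the read through a StreamReader with detectEncodingFromByteOrderMarks set to true.

```csharp
using System;
using System.IO;
using System.Text;

class BomDetection
{
    static void Main()
    {
        const string text = "+4433\u00E7"; // "+4433ç"
        string path = Path.GetTempFileName();

        // Encoding.Unicode is UTF-16 little-endian; File.WriteAllText
        // writes its preamble (the BOM bytes FF FE) before the text.
        File.WriteAllText(path, text, Encoding.Unicode);

        // No encoding argument: UTF-8 is only the starting assumption,
        // the BOM decides which encoding is actually used.
        string noEncoding = File.ReadAllText(path);

        // The same detection done explicitly with a StreamReader.
        string viaReader;
        string detectedName;
        using (var reader = new StreamReader(path, Encoding.UTF8, true))
        {
            viaReader = reader.ReadToEnd();
            // CurrentEncoding reflects the detected encoding after a read.
            detectedName = reader.CurrentEncoding.WebName;
        }

        Console.WriteLine("ReadAllText round-trips: " + (noEncoding == text));
        Console.WriteLine("StreamReader round-trips: " + (viaReader == text));
        Console.WriteLine("Detected encoding: " + detectedName);

        File.Delete(path);
    }
}
```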