Encoding.Default is not the same as no encoding in File.ReadAllText?

(Sorry if this is a dupe)

I've just spent a long time trying to read a text file correctly.

Having started with File.ReadAllText(path) and getting screwed-up characters, I tried several variants of File.ReadAllText(path, Encoding), after which I got bogged down trying to analyse my input files to work out which byte was the problem.

In desperation I tried File.ReadAllText(path, Encoding.Default), which worked!

I'm now struggling to understand why the default value is apparently only the default value if you specify it.

(My cut-down test string was +4433ç. I saved it in Notepad as ANSI, though with Swiss French regional settings.)

+2 A:

Encoding.Default is the system's ANSI codepage.

What File.ReadAllText does if you don't specify an encoding is this:

First it checks whether there's a byte order mark (UTF-8, UTF-16 or UTF-32). If there is, it uses the encoding specified in the byte order mark.
Otherwise, it uses UTF-8.

So the only way to get the system's ANSI codepage is to explicitly specify Encoding.Default.
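To make the difference concrete, here is a minimal sketch that reproduces the question's test string. It rests on some assumptions: ISO-8859-1 stands in for the system ANSI code page (it encodes ç as the same single byte, 0xE7, as Windows-1252), and WriteThenRead is a hypothetical helper written for this illustration, not part of the framework. The stand-in is used because Encoding.Default only maps to the ANSI code page on .NET Framework; on .NET Core and later it returns UTF-8.

```csharp
using System;
using System.IO;
using System.Text;

public static class AnsiVersusUtf8Demo
{
    // Hypothetical helper: writes raw bytes (no BOM) to a temp file and reads them back.
    // With no encoding argument it calls the one-parameter File.ReadAllText overload.
    public static string WriteThenRead(byte[] bytes, Encoding encoding = null)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, bytes);
            return encoding == null
                ? File.ReadAllText(path)
                : File.ReadAllText(path, encoding);
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        // Assumption: ISO-8859-1 plays the role of the ANSI code page here.
        Encoding ansiLike = Encoding.GetEncoding("iso-8859-1");

        // "+4433ç" becomes 2B 34 34 33 33 E7; the last byte is not valid UTF-8 on its own.
        byte[] bytes = ansiLike.GetBytes("+4433\u00E7");

        string implicitRead = WriteThenRead(bytes);           // no BOM, so decoded as UTF-8
        string explicitRead = WriteThenRead(bytes, ansiLike); // decoded with the matching code page

        // The stray 0xE7 byte is replaced with U+FFFD when decoded as UTF-8.
        Console.WriteLine(implicitRead.IndexOf('\uFFFD') >= 0); // True
        Console.WriteLine(explicitRead == "+4433\u00E7");       // True
    }
}
```

On .NET Framework, passing Encoding.Default in place of ansiLike should behave the same way for a file saved as ANSI on the same machine.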

Daniel 2009-08-20 11:13:40

File.ReadAllText doesn't check for a byte order mark. It will always use UTF-8 if you don't specify an encoding. This is confirmed by both Reflector and the .NET reference source.

Jivko Petiov 2010-01-16 21:32:28

Jivko, I don't think your comment is correct. ReadAllText without an encoding calls ReadAllText(path, Encoding.UTF8), but the internal stream used by ReadAllText will read the BOM if present and replace the Encoding.UTF8 with the detected encoding. This is because detectEncodingFromByteOrderMarks is set to true in the StreamReader constructor.
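The BOM behaviour is easy to check empirically. The sketch below writes a UTF-16 file (File.WriteAllText with Encoding.Unicode emits the FF FE byte order mark) and reads it back both without an encoding and with Encoding.UTF8 requested explicitly. The class and method names are made up for this illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDetectionDemo
{
    // Hypothetical helper: writes text as UTF-16 LE with a BOM, then reads it back.
    // When requestUtf8 is true, Encoding.UTF8 is passed explicitly to ReadAllText.
    public static string WriteUtf16ThenRead(string text, bool requestUtf8)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, text, Encoding.Unicode);
            return requestUtf8
                ? File.ReadAllText(path, Encoding.UTF8)
                : File.ReadAllText(path);
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        string original = "+4433\u00E7";

        // If the BOM were ignored, the UTF-16 bytes would be decoded as UTF-8
        // and neither comparison would hold.
        Console.WriteLine(WriteUtf16ThenRead(original, requestUtf8: false) == original); // True
        Console.WriteLine(WriteUtf16ThenRead(original, requestUtf8: true) == original);  // True
    }
}
```

Both reads return the original string, which matches the explanation above: the encoding passed to ReadAllText is only a fallback for files without a recognised BOM.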

Simon D 2010-06-21 13:37:35
