I noticed a strange behaviour of File.Copy()
in .NET 3.5SP1. I don't know if that's a bug or a feature. But I know it's driving me crazy. We use File.Copy()
in a custom build step, and it screws up the character encoding.
When I copy an ASCII encoding text file over a UTF-8 encoded text file, the destination file still is UTF-8 encoded, but has the content of the new file plus the 3 prefix characters for UTF-8. That's fine for ASCII characters, but incorrect for the remaining characters (128-255) of the ANSI code page.
Here's the code to reproduce. I first copy a UTF-8 file to the destination, then I copy an ANSI file to the same destination. Notice the output of the second console output: Content of copy.txt : this is ASCII encoded: / Encoding: utf-8
File.WriteAllText("ANSI.txt", "this is ANSI encoded: é", Encoding.GetEncoding(0));
File.WriteAllText("UTF8.txt", "this is UTF8 encoded: é", Encoding.UTF8);
File.Copy("UTF8.txt", "copy.txt", true);
using (StreamReader reader = new StreamReader("copy.txt", true))
{
Console.WriteLine("Content of copy.txt : " + reader.ReadToEnd() + " / Encoding: " +
reader.CurrentEncoding.BodyName);
}
File.Copy("ANSI.txt", "copy.txt", true);
using (StreamReader reader = new StreamReader("copy.txt", true))
{
Console.WriteLine("Content of copy.txt : " + reader.ReadToEnd() + " / Encoding: " +
reader.CurrentEncoding.BodyName);
}
Any ideas why this happens? Is there a mistake in my code? Any ideas how to fix this (my current idea is to delete the file before if it exists)
EDIT: correct ANSI/ASCII confusion