Hello.

Yes, this is a frequently asked question, but the topic is still vague to me since I don't know much about it.

I would like a very precise way to find a file's encoding, as precise as Notepad++ is.

Thanks.

A: 

I'd try the following steps:

1) Check if there is a Byte Order Mark

2) Check if the file is valid UTF-8

3) Use the local "ANSI" codepage (ANSI as Microsoft defines it)

Step 2 works because most non-ASCII byte sequences produced by codepages other than UTF-8 are not valid UTF-8, so a file that decodes cleanly as UTF-8 almost certainly is UTF-8.
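
The three steps above could be sketched roughly as follows. This is only an illustration, not a framework API: `EncodingGuess`, `Guess` and `StartsWith` are hypothetical names, and the fallback assumes `Encoding.Default` is the system ANSI codepage, which holds on .NET Framework but not on .NET Core and later (where it is UTF-8).

```csharp
using System;
using System.Text;

static class EncodingGuess
{
    // Hypothetical helper: guesses the encoding of raw file bytes.
    public static Encoding Guess(byte[] bytes)
    {
        // Step 1: look for a byte order mark. UTF-32 LE is tested before
        // UTF-16 LE because its BOM begins with the same two bytes.
        if (StartsWith(bytes, 0xEF, 0xBB, 0xBF)) return new UTF8Encoding(true);
        if (StartsWith(bytes, 0xFF, 0xFE, 0x00, 0x00)) return new UTF32Encoding(false, true);
        if (StartsWith(bytes, 0x00, 0x00, 0xFE, 0xFF)) return new UTF32Encoding(true, true);
        if (StartsWith(bytes, 0xFF, 0xFE)) return Encoding.Unicode;
        if (StartsWith(bytes, 0xFE, 0xFF)) return Encoding.BigEndianUnicode;

        // Step 2: a strict decoder throws on any invalid UTF-8 sequence.
        try
        {
            new UTF8Encoding(false, true).GetString(bytes);
            return new UTF8Encoding(false);
        }
        catch (DecoderFallbackException)
        {
            // Step 3: fall back to the local "ANSI" codepage
            // (assumption: Encoding.Default on .NET Framework).
            return Encoding.Default;
        }
    }

    static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
            if (data[i] != prefix[i]) return false;
        return true;
    }
}
```

It would be called as `EncodingGuess.Guess(File.ReadAllBytes(fileName))`. A pure ASCII file passes step 2 and is reported as UTF-8, which is harmless because ASCII is a subset of UTF-8.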

CodeInChaos
+4  A: 

Since you reference Notepad++, I'll assume you mean text files.

There is built-in support in the framework:

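// detectEncodingFromByteOrderMarks = true
using (var r = new StreamReader(fileName, true))
{
    // Detection only happens on the first read, so peek (or read) before
    // querying CurrentEncoding. Without a BOM it stays at the UTF-8 default.
    r.Peek();
    var e = r.CurrentEncoding;
}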
Henk Holterman
@Henk Holterman: I'm not sure, but I think I've read somewhere that StreamReader changes the encoding of the file as it is loaded into the StreamReader. Is this correct?
Fábio Antunes
@: I meant finding the encoding of any file with the same precision as Notepad++ does: text files, images, zip, anything.
Fábio Antunes
@: I've used Notepad++ to generate files with ANSI, UTF-8 and UCS-2 encoding, but the StreamReader always reported UTF-8 for them.
Fábio Antunes
@Fábio, a StreamReader cannot change the file (it only reads), but it produces strings, and in-memory strings are always Unicode (UTF-16). Zip and image files are not encoded text but entirely different data. Detection for those starts with the file extension; detecting markers has been asked and answered here before.
Henk Holterman
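
To illustrate what "detecting markers" means for non-text files, here is a minimal sketch that checks the first bytes against two well-known signatures (ZIP local file header and PNG). `FileSignature` and `Identify` are hypothetical names, and real detectors know many more formats than these two.

```csharp
using System;

static class FileSignature
{
    // Well-known leading bytes ("magic numbers") of two binary formats.
    static readonly byte[] Zip = { 0x50, 0x4B, 0x03, 0x04 };                         // "PK\x03\x04"
    static readonly byte[] Png = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }; // "\x89PNG\r\n\x1A\n"

    // Hypothetical helper: returns a format name, or null if no signature matches.
    public static string Identify(byte[] header)
    {
        if (Matches(header, Zip)) return "zip";
        if (Matches(header, Png)) return "png";
        return null; // unknown: possibly text, possibly some other binary format
    }

    static bool Matches(byte[] data, byte[] signature)
    {
        if (data.Length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
            if (data[i] != signature[i]) return false;
        return true;
    }
}
```

Such files have a format, not a text encoding, so asking for their "encoding" only makes sense for the text they may contain internally.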
@Henk Holterman: Sorry for my ignorance about this topic. But if the StreamReader produces strings that are stored in memory, and in-memory strings are always Unicode (UTF-16), what is the reason to use a StreamReader to find the encoding of the file? The bottom line is that after loading my test text files, each saved with a different encoding, the CurrentEncoding property always stated UTF-8 for all of them.
Fábio Antunes
@Fabio: If you read a UTF-8 file but enforce ANSI encoding, you will get the wrong (Unicode) strings. I may have been wrong about the CurrentEncoding property though.
Henk Holterman
@Henk Holterman: I don't know how I can enforce it. I used the StreamReader the same way you showed, and Notepad++ has options to convert from one encoding to another.
Fábio Antunes
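
As a rough sketch of what "enforcing" an encoding could look like: pass the encoding to the reader explicitly and switch off BOM detection. `EnforcedRead`, `ReadAs` and `DecodeAs` are hypothetical names, and ISO-8859-1 is used here only as a stand-in for an "ANSI" codepage.

```csharp
using System;
using System.IO;
using System.Text;

static class EnforcedRead
{
    // Reads a file with exactly the given encoding; the third argument
    // disables BOM detection so the reader cannot override the choice.
    public static string ReadAs(string path, Encoding encoding)
    {
        using (var reader = new StreamReader(path, encoding, false))
        {
            return reader.ReadToEnd();
        }
    }

    // Same idea for bytes already in memory.
    public static string DecodeAs(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes);
    }
}
```

Decoding the UTF-8 bytes of "é" (0xC3 0xA9) with `Encoding.GetEncoding("iso-8859-1")` yields the two characters "Ã©" instead of "é", which is the kind of wrong string that results from enforcing the wrong encoding.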