Hi,

Is there any way to determine a string's encoding in C#?

Say I have a filename string, but I don't know whether it is encoded in Unicode UTF-16 or the system-default encoding. How do I find out?

Thanks,

kreb

+6  A: 

It depends where the string 'came from'. A .NET string is always Unicode (UTF-16). The only way the encoding could be different is if you, say, read the data from a database into a byte array.

This CodeProject article might be of interest: Detect Encoding for in- and outgoing text

Jon Skeet's Strings in .NET and C# is an excellent explanation of .NET strings.
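
To illustrate the point about byte arrays, here is a minimal sketch (not taken from either linked article). The class name `EncodingDemo`, the helper `Decode`, and the sample bytes are made up for illustration. It shows that the encoding question only exists while the data is still a `byte[]`: the same bytes produce different strings depending on which `Encoding` is used to decode them.

```csharp
using System;
using System.Text;

static class EncodingDemo
{
    // Hypothetical helper: turn raw bytes into a .NET (UTF-16) string
    // using whichever encoding the caller believes the bytes are in.
    public static string Decode(byte[] raw, Encoding encoding)
    {
        return encoding.GetString(raw);
    }

    static void Main()
    {
        // The UTF-8 bytes for "café" (the é is the two bytes C3 A9).
        byte[] raw = { 0x63, 0x61, 0x66, 0xC3, 0xA9 };

        // Correct guess: the string value is "café" (4 chars).
        string asUtf8 = Decode(raw, Encoding.UTF8);

        // Wrong guess: each byte becomes one char, giving "cafÃ©" (5 chars).
        string asLatin1 = Decode(raw, Encoding.GetEncoding("iso-8859-1"));

        Console.WriteLine(asUtf8.Length);   // 4
        Console.WriteLine(asLatin1.Length); // 5
    }
}
```

Once the bytes have been decoded, both results are ordinary UTF-16 strings, so nothing in the string itself records which encoding was assumed.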

Mitch Wheat
It came from a non-Unicode C++ app. The CodeProject article seems a bit too complex, but it does seem to do what I want. Thanks.
krebstar
+3  A: 

Here is the C# port of the Mozilla Universal Charset Detector.

arbiter
Thanks arbiter, this looks easier to use than the CodeProject one. One question though: do I just include the resulting DLL in my project by way of DllImport? I am quite new to C# and have only been doing C++.
krebstar
No, you don't need to use DllImport. Just build this library and add the resulting assembly to your project as an external reference. Alternatively, you can simply add all the *.cs files from the library to your project.
arbiter
Thanks :) Will check this out :)
krebstar
+2  A: 

Check out Utf8Checker. It is a simple class that does exactly this in pure managed code: http://utf8checker.codeplex.com

Note: as already pointed out, "determining the encoding" only makes sense for byte streams. If you have a string, someone along the way already knew or guessed the encoding in order to decode the bytes into that string in the first place.
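
As a rough sketch of what such a check on a byte stream can look like (this is not Utf8Checker's actual code; the names `Utf8Probe` and `LooksLikeUtf8` are hypothetical), the framework's own `UTF8Encoding` can be constructed so that it throws on invalid byte sequences, and that exception can serve as the test:

```csharp
using System;
using System.Text;

static class Utf8Probe
{
    // Strict UTF-8: first argument = do not emit a BOM when encoding,
    // second argument = throw instead of substituting U+FFFD on bad bytes.
    static readonly UTF8Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // Hypothetical helper: true if the bytes form a valid UTF-8 sequence.
    public static bool LooksLikeUtf8(byte[] raw)
    {
        try
        {
            StrictUtf8.GetCharCount(raw); // validates without building a string
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    static void Main()
    {
        byte[] utf8Bytes = { 0x63, 0x61, 0x66, 0xC3, 0xA9 }; // "café" in UTF-8
        byte[] ansiBytes = { 0x63, 0x61, 0x66, 0xE9 };       // "café" in Latin-1

        Console.WriteLine(LooksLikeUtf8(utf8Bytes)); // True
        Console.WriteLine(LooksLikeUtf8(ansiBytes)); // False (E9 starts an incomplete sequence)
    }
}
```

This is only a heuristic. Pure ASCII bytes are valid in UTF-8 and in most system-default code pages alike, so a `true` result means "could be UTF-8", not "is definitely UTF-8".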

devdimi