When reading data from the input file, I noticed that the StreamReader was not reading the ¥ symbol. Mozilla Firefox showed the input file's encoding as Western (ISO-8859-1).

After experimenting with the encoding parameter, I found that it worked for the following values:

System.Text.Encoding.GetEncoding(1252) // Western (ISO-8859-1)

System.Text.Encoding.Default 

System.Text.Encoding.UTF7

I'm now planning to use the Default setting, but I'm not sure this is the right decision. The existing code did not specify any encoding, and I'm worried I might break something.

I know very little (or rather nothing) about encodings. How should I go about this? Is my decision to use System.Text.Encoding.Default safe? Should I be asking users to save the files in a particular format?

A:

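Code page 1252 isn't quite the same as ISO-Latin-1 (ISO-8859-1): the two differ in the byte range 0x80 to 0x9F, where 1252 has printable characters such as € and ISO-8859-1 has control codes. If you want ISO-Latin-1, use Encoding.GetEncoding(28591). However, I'd expect them to be the same for this code point (U+00A5, stored as byte 0xA5). UTF-7 is completely different (and almost never what you want to use).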
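A small C# sketch of that difference, decoding single bytes with both code pages. It assumes a .NET Core / .NET 5+ runtime, where code page 1252 only becomes available after registering `CodePagesEncodingProvider` (on .NET Framework, `GetEncoding(1252)` works without it, and the provider class may need the System.Text.Encoding.CodePages package). `CodePageComparison` and its methods are illustrative names, not part of any existing code.

```csharp
using System.Text;

// Sketch: decode the same byte with Windows-1252 and ISO-8859-1 to see
// where the two code pages agree and where they differ.
public static class CodePageComparison
{
    // Assumption: .NET Core / .NET 5+, where 1252 is not built in and must be
    // supplied by CodePagesEncodingProvider. Registered once, here.
    static CodePageComparison()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static Encoding Windows1252() => Encoding.GetEncoding(1252);

    // ISO-8859-1 maps every byte 0x00-0xFF straight to U+0000-U+00FF.
    public static Encoding Latin1() => Encoding.GetEncoding(28591);

    public static string Decode(Encoding encoding, byte b) =>
        encoding.GetString(new[] { b });
}
```

Both code pages map 0xA5 to ¥, which is why either one fixed the question's problem. A file containing a euro sign saved as Windows-1252 (byte 0x80) is where the choice would start to matter.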

Encoding.Default is not safe; it's a really bad idea in most situations. It's specific to the particular computer you're running on (on .NET Framework, it's that machine's ANSI code page). If you transfer a file from one computer to another, who knows what encoding the original computer was using?
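A minimal sketch of naming the encoding explicitly instead of relying on Encoding.Default, assuming (as Firefox suggested) that the input files really are ISO-8859-1. `ExplicitEncodingReader` is a hypothetical helper, not part of the original code.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Sketch: read the input with a fixed, named encoding so the result does not
// depend on the ANSI code page of whichever machine runs the program.
public static class ExplicitEncodingReader
{
    // Assumption: the input files are saved as ISO-8859-1 (code page 28591).
    private static readonly Encoding InputEncoding = Encoding.GetEncoding(28591);

    public static List<string> ReadLines(string path)
    {
        var lines = new List<string>();
        using (var reader = new StreamReader(path, InputEncoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

This StreamReader overload still checks for a byte order mark, so a file saved as UTF-8 with a BOM would be decoded as UTF-8 rather than ISO-8859-1.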

If you know that your file is in ISO-8859-1, then explicitly use that. What's producing these files? If they're just being saved by the user, what program are they being saved in? If UTF-8 is an option, that's a good one - partly because it can cope with the whole of Unicode.
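If the program producing the files can't save UTF-8, a one-off conversion is another route. A sketch, assuming the source files are ISO-8859-1; `Latin1ToUtf8.ConvertFile` is a hypothetical name, and whether to write a byte order mark is a choice to check against whatever reads the output.

```csharp
using System.IO;
using System.Text;

// Sketch: decode an ISO-8859-1 file and re-encode its text as UTF-8.
public static class Latin1ToUtf8
{
    public static void ConvertFile(string sourcePath, string targetPath)
    {
        // Decode with the encoding the file was actually written in...
        string text = File.ReadAllText(sourcePath, Encoding.GetEncoding(28591));

        // ...then write it back as UTF-8. UTF8Encoding(false) omits the BOM.
        File.WriteAllText(targetPath, text, new UTF8Encoding(false));
    }
}
```

In the output, ¥ (a single byte 0xA5 in ISO-8859-1) becomes the two-byte UTF-8 sequence 0xC2 0xA5.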

I have an article on Unicode and another on debugging Unicode issues which you may find useful.

Jon Skeet
Unfortunately, UTF-8 doesn't work for me. Reading the articles now... "This is a big topic." wasn't exactly the start I was hoping for ;-)
Preets
UTF-8 won't work when you're trying to read a file encoded in ISO-8859-1, no. But if you can persuade your users to save in UTF-8 instead, that would be a win.
Jon Skeet
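Where input may arrive as either UTF-8 or ISO-8859-1, one option is to try a strict UTF-8 decode first and fall back to ISO-8859-1. This is a swapped-in heuristic, not something suggested in the answers here, and it can guess wrong: ISO-8859-1 text whose bytes happen to form valid UTF-8 (for example "Ã©", bytes 0xC3 0xA9) would be read as "é". `EncodingGuess` is a hypothetical name.

```csharp
using System.Text;

// Sketch of a heuristic: strict UTF-8 first, ISO-8859-1 as the fallback.
public static class EncodingGuess
{
    // throwOnInvalidBytes: true makes the decoder throw instead of silently
    // substituting U+FFFD, which is how a lenient UTF-8 reader loses the ¥ sign.
    private static readonly Encoding StrictUtf8 = new UTF8Encoding(false, true);
    private static readonly Encoding Iso88591 = Encoding.GetEncoding(28591);

    public static string Decode(byte[] bytes)
    {
        try
        {
            return StrictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            // A lone byte such as 0xA5 is not valid UTF-8, so treat the data as ISO-8859-1.
            return Iso88591.GetString(bytes);
        }
    }
}
```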
A:

The existing code did not use any encoding

It may not have specified an encoding explicitly, in which case StreamReader would have defaulted to UTF-8 (Encoding.UTF8), switching only if the file started with a byte order mark.

The name Encoding.Default might give the impression that this is the default encoding used by classes such as StreamReader, but this is not the case. As Jon Skeet pointed out, Encoding.Default depends on the machine: on .NET Framework it is the encoding for the operating system's current ANSI code page (on .NET Core and .NET 5+, it always returns UTF-8).
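A C# sketch of that default in isolation. It assumes the original code built its StreamReader without an encoding argument (the question doesn't show that code) and uses an in-memory stream in place of the real input file; `DefaultReaderDemo` and its methods are hypothetical names.

```csharp
using System.IO;
using System.Text;

// Sketch: what a StreamReader does when no encoding is passed.
public static class DefaultReaderDemo
{
    // The encoding the reader reports before reading, with no encoding argument.
    public static Encoding StreamReaderDefault()
    {
        using (var reader = new StreamReader(new MemoryStream(new byte[] { 0x41 })))
        {
            return reader.CurrentEncoding;
        }
    }

    // Decodes raw bytes the way an encoding-less StreamReader would.
    public static string ReadWithoutEncoding(byte[] bytes)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes)))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The reader reports UTF-8 (code page 65001), not Encoding.Default. Fed the ISO-8859-1 byte 0xA5 for ¥, which is not valid UTF-8 on its own, it produces the replacement character U+FFFD, which matches the symptom in the question.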

Personally I think this makes the property name Encoding.Default somewhat misleading.

Joe
A:

Are you a software developer? Don't forget to read Joel Spolsky's "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)".

gimel
Nice! Peel onions for 6 months in a submarine!
Preets