What is the difference between the Unicode, UTF-8, UTF-7, UTF-16, UTF-32, ASCII, and ANSI encodings in ASP.NET?

How are these helpful for programmers?

+3  A: 

Some reading to get you started on character encodings: Joel on Software: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

By the way - ASP.NET has nothing to do with it. Encodings are universal.

Tomalak
+13  A: 

Going down your list:

  • "Unicode encoding" is more properly known as UTF-16: 2 bytes per code unit, which works out to 2 bytes per "code point" for everything in the Basic Multilingual Plane (BMP). This is the native format of strings in .NET. Values outside the BMP are encoded as surrogate pairs, taking 4 bytes. (These are relatively rarely used, which is just as well, as I suspect very few developers get them right. I very much doubt that I do.) "Unicode" is really the character set; it's unfortunate that the term is also used as a synonym for UTF-16 in .NET and various Windows applications.
  • UTF-8: Variable-length encoding; 1-4 bytes per code point covers every current character. ASCII values are encoded as ASCII.
  • UTF-7: Usually used for mail encoding. Chances are if you think you need it and you're not doing mail, you're wrong. (That's just my experience of people posting in newsgroups etc - outside mail, it's really not widely used at all.)
  • UTF-32: Fixed width encoding using 4 bytes per code point. This isn't very efficient, but makes life easier outside the BMP. I have a .NET Utf32String class as part of my MiscUtil library, should you ever want it. (It's not been very thoroughly tested, mind you.)
  • ASCII: Single-byte encoding using only the bottom 7 bits. (Unicode 0-127.) No accents etc.
  • ANSI: There's no one fixed ANSI encoding - there are lots of them. Usually when people say "ANSI" they mean "the default code page for my system" which is obtained via Encoding.Default, and is often Windows-1252.
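
To make the size differences in the list above concrete, here is a minimal sketch that encodes a few sample characters and reports how many bytes each encoding needs. It is written in Python rather than .NET purely for illustration, since the encodings behave the same everywhere. The names `CODECS` and `byte_lengths` are made up for this example, and Windows-1252 stands in for "ANSI" only as an assumption about the system's default code page.

```python
# Python codec names; "cp1252" is Windows-1252, assumed here as the "ANSI" code page.
CODECS = {
    "ASCII": "ascii",
    "ANSI (Windows-1252)": "cp1252",
    "UTF-7": "utf-7",
    "UTF-8": "utf-8",
    "UTF-16": "utf-16-be",  # explicit byte order, so no byte order mark is added
    "UTF-32": "utf-32-be",
}


def byte_lengths(text):
    """Map each encoding name to the encoded size in bytes.

    The value is None when the encoding cannot represent the text at all.
    """
    sizes = {}
    for name, codec in CODECS.items():
        try:
            sizes[name] = len(text.encode(codec))
        except UnicodeEncodeError:
            sizes[name] = None
    return sizes


# U+0041 (A), U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (outside the BMP)
for sample in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    print(f"U+{ord(sample):04X}", byte_lengths(sample))

# The surrogate pair that UTF-16 uses for U+1F600, as big-endian hex.
print("\U0001F600".encode("utf-16-be").hex())
```

Running this shows ASCII failing for anything beyond U+007F, UTF-8 growing from 1 to 4 bytes, UTF-16 needing 4 bytes (a surrogate pair) only for the character outside the BMP, and UTF-32 always using 4.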

There's more on my Unicode page and tips for debugging Unicode problems.

The other big resource is unicode.org, which contains more information than you'll ever be able to work your way through. Possibly the most useful bit is the code charts.

Jon Skeet
Very informative. Thanks.
BALAMURUGAN
Unicode != UTF-16. Unicode is just the character set, representable as UTF7/8/16/32
jalf
@jalf: But in the context of .NET or Windows in general, when someone talks about the Unicode encoding, they mean UTF-16. Hence the "more properly known as" bit.
Jon Skeet
@jalf: Edited answer to clarify that though.
Jon Skeet
A: 

The best site to refer to would be: http://msdn.microsoft.com/en-us/library/dd374081(VS.85).aspx