What is ANSI encoded format? Is it a system default format? In what way does it differ from ASCII?

A: 

ANSI (aka Windows-1252/WinLatin1) is a character encoding of the Latin alphabet, fairly similar to ISO-8859-1. You may want to take a look at it on Wikipedia.

moff
A: 

This will answer your question, I believe.

http://www.joelonsoftware.com/articles/Unicode.html

Kevin
A: 

When using single-byte characters, the ASCII format defines the first 128 characters (0-127). The extended characters from 128-255 are defined by various ANSI code pages to allow limited support for other languages. In order to make sense of an ANSI encoded string, you need to know which code page it uses.
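
To illustrate why the code page matters, here is a minimal Python sketch. The sample byte and the choice of code pages are my own assumptions for illustration; the codec names are the ones Python ships with.

```python
# Hypothetical sample: a single byte outside the ASCII range.
RAW = b"\xe9"

# Illustrative selection of Windows code pages (Python codec names).
CODE_PAGES = ["cp1252", "cp1251", "cp1253"]


def decode_with_each(raw: bytes, code_pages: list[str]) -> dict[str, str]:
    """Decode the same bytes under several code pages."""
    return {cp: raw.decode(cp) for cp in code_pages}


decoded = decode_with_each(RAW, CODE_PAGES)
for cp, text in decoded.items():
    # ascii() keeps the output printable whatever the console encoding is
    print(cp, ascii(text))
```

The same byte comes out as a Latin, a Cyrillic, or a Greek letter depending on which code page you assume, so the bytes alone are not enough to recover the text.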

Eric Petroelje
+2  A: 

ASCII just defines a 7-bit code page with 128 symbols. ANSI extends this to 8 bits, and there are several different code pages for the symbols 128 to 255.

The name ANSI is not correct because it is actually the ISO/IEC 8859 standard that defines these code pages. See ISO/IEC 8859 for reference. The parts are numbered ISO/IEC 8859-1 to ISO/IEC 8859-16; part 12 was abandoned, so there are 15 code pages.

Windows-1252 is in turn based on ISO/IEC 8859-1, with some modifications, mainly in the C1 control set in the range 128 to 159. Wikipedia states that Windows-1252 is also referred to as ISO-8859-1, with a second hyphen between ISO and 8859. (Unbelievable! Who does something like that?!?)
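
A quick way to see where the two differ is to decode every byte value with both codecs. This is only a sketch using Python's built-in `cp1252` and `latin-1` codecs; the function name is made up for the example.

```python
def differing_bytes() -> list[int]:
    """Return the byte values that cp1252 and latin-1 decode differently."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        latin1 = raw.decode("latin-1")
        # errors="replace" covers the few bytes Python's cp1252 leaves unassigned
        cp1252 = raw.decode("cp1252", errors="replace")
        if latin1 != cp1252:
            diffs.append(value)
    return diffs


diffs = differing_bytes()
print(hex(min(diffs)), hex(max(diffs)), len(diffs))

# The euro sign is one of the characters cp1252 puts in that range.
euro = b"\x80".decode("cp1252")
```

With Python's codecs, every difference falls in the 128 to 159 range, where ISO/IEC 8859-1 (as `latin-1`) has control codes and Windows-1252 has printable characters such as the euro sign.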

Daniel Brückner
+5  A: 

ANSI encoding is a slightly generic term used to refer to the standard code page on a system. On Windows it is more properly referred to as Windows-1252 (at least on Western/U.S. systems; it can represent certain other Windows code pages on other systems). This is essentially an extension of the ASCII character set in that it includes all the ASCII characters plus an additional 128 character codes. This difference is due to the fact that ANSI encoding is 8-bit rather than 7-bit as ASCII is (though perhaps confusingly ASCII is almost always encoded nowadays as bytes with the MSB set to 0). See the article for an explanation of why this encoding is usually referred to as ANSI. It seems to be a slight misnomer as a matter of fact, but the name has stuck and everyone uses it.

Noldorin
ANSI does not necessarily have to map to CP1252. It does, however, always refer to the legacy codepage set for the system. This may be CP1252 on western European or US systems but don't count on that.
Joey
@Johannes: Yeah, that's a good point. I'll edit the post to reflect that.
Noldorin
Downvoters: Care to give any reason why?
Noldorin
+1  A: 

Strictly speaking, there is no such thing as ANSI encoding. The term ANSI is used for several different encodings:

  1. ISO 8859-1
  2. Windows CP1252
  3. Current system encoding on a Windows machine (in Win32 API terminology).
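
For the third meaning, you can ask the runtime which encoding it would pick for the current machine. A small Python sketch, assuming nothing beyond the standard library (on Windows, Python also exposes the ANSI code page through the Windows-only `mbcs` codec, which I don't use here so the snippet runs anywhere):

```python
import codecs
import locale

# Encoding derived from the current locale settings; on Windows this is
# normally the "ANSI" code page, elsewhere it is often UTF-8.
system_encoding = locale.getpreferredencoding(False)

# Resolve whatever name the OS reported to Python's canonical codec name.
codec_name = codecs.lookup(system_encoding).name

print(system_encoding, codec_name)
```

The result depends entirely on the machine it runs on, which is exactly the problem with treating "ANSI" as the name of one encoding.
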
Nemanja Trifunovic
+2  A: 

Basically "ANSI" refers to the legacy codepage on Windows. See also an article by Raymond Chen on this topic. The first 128 characters (0-127) are identical to ASCII in most code pages; the upper characters vary, though.

However, ANSI does not automatically mean CP1252 or Latin 1.

All confusion notwithstanding, you should simply avoid such issues nowadays and use Unicode.

Joey
A: 

I remember when "ANSI" text referred to the pseudo VT-100 escape codes usable in DOS through the ANSI.SYS driver to alter the flow of streaming text.... Probably not what you are referring to but if it is see http://en.wikipedia.org/wiki/ANSI_escape_code

jmucchiello
+4  A: 

Technically, ANSI should be the same as US-ASCII. It refers to the ANSI X3.4 standard, which is simply the ANSI organisation's ratified version of ASCII. Use of the top-bit-set characters is not defined in ASCII/ANSI as it is a 7-bit character set.
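
A strict ASCII decoder shows this: top-bit-set bytes are simply rejected. A minimal Python sketch (the helper names and sample bytes are made up for the example):

```python
from typing import Optional


def is_strict_ascii(raw: bytes) -> bool:
    """True if every byte has the top bit clear (values 0-127)."""
    return all(b < 0x80 for b in raw)


def try_ascii(raw: bytes) -> Optional[str]:
    """Decode as ASCII, or return None if any byte is outside 0-127."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError:
        return None


plain = b"plain text"
accented = b"caf\xe9"  # last byte has the top bit set

print(is_strict_ascii(plain), is_strict_ascii(accented))
print(try_ascii(plain), try_ascii(accented))
```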

However, years of misuse of the term by the DOS and subsequently Windows community have left its practical meaning as “the system codepage of whatever machine is being used”. The system codepage is also sometimes known as ‘mbcs’, since on East Asian systems that can be a multiple-byte-per-character encoding. Some code pages can even use top-bit-clear bytes as trailing bytes in a multibyte sequence, so it's not even strictly compatible with plain ASCII... but even then, it's still called “ANSI”.
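
As a sketch of the trailing-byte problem, take code page 932 (the Japanese Windows code page, Python codec `cp932`). The character and helper name below are my own choices for illustration:

```python
# KATAKANA LETTER SO; in cp932 its second byte is 0x5C, the same
# value as the ASCII backslash.
text = "\u30bd"
encoded = text.encode("cp932")


def naive_has_backslash(raw: bytes) -> bool:
    """Byte-level search that wrongly assumes ASCII compatibility."""
    return b"\\" in raw


# Searching the raw bytes finds a "backslash" that is not really there.
false_positive = naive_has_backslash(encoded)

# Decoding with the right code page first gives the correct answer.
correct = "\\" in encoded.decode("cp932")

print(encoded.hex(), false_positive, correct)
```

Any code that scans such “ANSI” bytes for ASCII characters like path separators can be fooled this way unless it decodes with the right code page first.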

On US and Western European default settings, “ANSI” maps to Windows code page 1252. This is not the same as ISO-8859-1 (although it is quite similar). On other machines it could be anything else at all. This makes “ANSI” utterly useless as an external encoding identifier.

bobince