ansaurus

Question

Answer 1

A:

A Windows code page is similar to a code set such as ISO 8859-1. It maps certain numbers (the byte values stored on disk) to certain characters (abstract characters, not their appearance on the screen). It does not correspond directly to a font, though a font may support a given code set or code page. For example, both Courier New and Times Roman can be used to display CP1252 text; the text looks different on the screen in each font, even though the data on disk is the same.

The first 256 code points of Unicode are the same as the code points of ISO 8859-1. In ISO 8859-1, code point 252 (0xFC) is LATIN SMALL LETTER U WITH DIAERESIS (colloquially, u-with-umlaut, or 'ü').
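To make the "numbers to characters" mapping concrete, here is a small C# sketch. It assumes .NET 5 or later, where the legacy Windows code pages must first be registered through `CodePagesEncodingProvider`. It decodes the same two bytes, 0xFC and 0x80, with windows-1252 and with ISO 8859-1: both map 0xFC to U+00FC ('ü'), but windows-1252 maps 0x80 to the euro sign (U+20AC) while ISO 8859-1 maps it to the control character U+0080, which is why the two are similar rather than identical.

```csharp
using System;
using System.Text;

// Make legacy Windows code pages (e.g. 1252) available on .NET Core / .NET 5+.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

Encoding cp1252 = Encoding.GetEncoding(1252);
Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

// The raw bytes, as they might sit in a file on disk.
byte[] data = { 0xFC, 0x80 };

// Each code page maps the same bytes to (possibly different) characters.
string as1252 = cp1252.GetString(data);
string asLatin1 = latin1.GetString(data);

Console.WriteLine($"windows-1252: U+{(int)as1252[0]:X4} U+{(int)as1252[1]:X4}");
Console.WriteLine($"iso-8859-1:   U+{(int)asLatin1[0]:X4} U+{(int)asLatin1[1]:X4}");
```

This prints `windows-1252: U+00FC U+20AC` followed by `iso-8859-1:   U+00FC U+0080`. Code points are printed in hex rather than as characters so the result does not depend on the console's own encoding.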

There are code set conversion functions; the ICU library (International Components for Unicode) supports some. There are Windows-specific code set converters too, I have no doubt, though I don't know what their names are. It will depend, in part, on which language(s) you are using.

Jonathan Leffler 2010-04-06 01:50:44

Answer 2

A:

A Windows code page is a means of translating an 8-bit value to a character (the East Asian code pages also use two-byte sequences). Most Windows computers in the US use Windows-1252.

Newer Windows programs typically store text files as UTF-8 and internally use wide strings, which are UTF-16. This eliminates code page issues, so a text file written in Hungary (where the legacy code page is Windows-1250) will look the same when opened in the US.
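The Hungary-to-US problem can be reproduced in a few lines of C#. This is a sketch assuming .NET 5 or later (code pages registered through `CodePagesEncodingProvider`); the city name "Győr" is just an illustrative string. Saved with windows-1250, the letter 'ő' becomes byte 0xF5, which windows-1252 reads back as 'õ'. The same text saved and read as UTF-8 survives unchanged because no code page is involved.

```csharp
using System;
using System.Text;

// Make legacy Windows code pages (1250, 1252, ...) available on .NET 5+.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

const string cityName = "Gy\u0151r";  // "Győr"; 'ő' is U+0151

// Saved by a program using the Central European ANSI code page.
byte[] legacyBytes = Encoding.GetEncoding(1250).GetBytes(cityName);

// Opened by a program that assumes the Western European code page.
string misread = Encoding.GetEncoding(1252).GetString(legacyBytes);

// Saved and opened as UTF-8: the bytes carry the characters unambiguously.
byte[] utf8Bytes = Encoding.UTF8.GetBytes(cityName);
string roundTrip = Encoding.UTF8.GetString(utf8Bytes);

Console.WriteLine($"windows-1250 bytes: {BitConverter.ToString(legacyBytes)}");
Console.WriteLine($"read as windows-1252: {misread}");
Console.WriteLine($"UTF-8 round trip intact: {roundTrip == cityName}");
```

The first line of output is `windows-1250 bytes: 47-79-F5-72`; the misread string is "Gyõr", and the UTF-8 comparison is `True`.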

Stephen Nutt 2010-04-06 01:51:06

Answer 3

+1 A:

Windows code pages are a relic of pre-Unicode days, when languages whose characters fell outside ASCII still had to be represented with one byte per character (or one or two bytes, in the case of East Asian languages). This is where the concept of a character set comes into play. English, for instance, uses "windows-1252". The various code pages can be installed through the Regional and Language Options control panel. A list of code pages can be found here: http://msdn.microsoft.com/en-us/goglobal/bb964654.aspx
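The "one or two bytes" point is easy to see with a double-byte code page. This C# sketch assumes .NET 5 or later and uses `System.Text.Encoding` with code page 932 (Shift-JIS, Japanese) as an example: an ASCII letter takes one byte, while the kanji 日 (U+65E5) takes two.

```csharp
using System;
using System.Text;

// Make legacy Windows code pages available on .NET Core / .NET 5+.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Code page 932: Japanese Shift-JIS, a double-byte character set.
Encoding shiftJis = Encoding.GetEncoding(932);

int asciiLength = shiftJis.GetByteCount("A");        // ASCII letter
int kanjiLength = shiftJis.GetByteCount("\u65E5");   // kanji for "day"/"sun"
byte[] bytes = shiftJis.GetBytes("A\u65E5");

Console.WriteLine($"'A' takes {asciiLength} byte(s); the kanji takes {kanjiLength}");
Console.WriteLine(BitConverter.ToString(bytes));
```

The second line prints `41-93-FA`: one byte for 'A' followed by the two-byte sequence for the kanji, so a program reading Shift-JIS cannot assume one byte per character.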

Within .NET, code pages are accessed through the System.Text.Encoding class. Its static Encoding.Convert method converts bytes from one encoding to another. For instance, to convert text stored as windows-1252 bytes to UTF-8 (admittedly usually a fairly pointless exercise), you could use this code:

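using System;
using System.Text;

public static class CodePageDemo {
    // A .NET string is always UTF-16 internally, so "text in windows-1252"
    // really means a byte array encoded with that code page. This converts
    // such bytes to UTF-8 bytes.
    public static byte[] GetUtf8BytesFromCodePage(byte[] codePageBytes, string codePage) {
        Encoding source = Encoding.GetEncoding(codePage);
        return Encoding.Convert(source, Encoding.UTF8, codePageBytes);
    }

    public static void Main() {
        // Required on .NET Core 3.0+ / .NET 5+ so that GetEncoding can find
        // legacy code pages; remove on .NET Framework, where they are built in.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] windowsBytes = Encoding.GetEncoding("windows-1252").GetBytes("Hello World");
        byte[] utf8Bytes = GetUtf8BytesFromCodePage(windowsBytes, "windows-1252");
        Console.Out.WriteLine(Encoding.UTF8.GetString(utf8Bytes));
    }
}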

John Christensen 2010-04-06 01:56:10

Are there any Windows system routines or library functions callable from C++ to work with code pages?

Mike D 2010-04-07 12:49:59

I'm not entirely sure, but a quick look at the MSDN site suggests this link: http://msdn.microsoft.com/en-us/library/dd374085%28VS.85%29.aspx

John Christensen 2010-04-07 18:57:24

Answer 4

+1 A:

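Here is a must-read explanation of Unicode and character sets (including code pages) from Joel Spolsky: "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)".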

PabloG 2010-04-06 02:49:16

+1 for the excellent Spolsky link. That really is the minimal information every programmer should know, presented in an amusing manner. And the simplifications don't really amount to lies, as often happens with simplifications.

Adrian McCarthy 2010-04-16 20:25:25


What are Windows code pages?
