views:

33

answers:

0

Hello! I'm in the process of researching code pages and have come across many conflicting uses of terminology, even amongst different Wikipedia entries. I just can't find a source of information that spells out the entire character handling process from start to finish. Could someone well versed in this field suggest ways in which the following information is inaccurate or incorrect:

The process of character representation as far as I understand:

  • We start with sets of symbols (not sure of the correct terminology here, possibly 'scripts') that are not associated with any specific platform. 'The Cyrillic alphabet' is understood to refer to the same entity in the context of Windows as in Linux, for example.

  • Members of these sets are selected, generally in bunches, by vendors to form a platform specific character set. The platform might assign these various codes such as GDI values on Windows (eg. 0 for ANSI_CHARSET and the other codes mentioned here: http://asa.diac24.net/wiki/index.php?title=ASS:fe&printable=yes). I cannot find much information on these sets such as whether they are in fact coded character sets or if they are simply unordered and abstract.

  • From these sets, individual code pages are developed that appear to have a one to one mapping with GDI values. Since these GDI values appear to represent sets that are platform dependent, does this mean Windows code pages are essentially a coded version of each individual set?

I've been having trouble reconciling this idea with a link shown to me earlier (which I've lost) that showed a one to many mapping between these GDI charsets and code pages across different platforms. Is this accurate, do these GDI values point to sets from which different code pages across different platforms can be developed?

  • Each code page maps a member of an abstract character set onto an integer to represent its position in the set. In the case of the 'simpler' code pages mentioned on the above webpage, these can be referred to using the more precise 'character map' term. Is this term worth considering or is the distinction too subtle and unimportant?

  • A font resolves a code point to a glyph if it contains one for that code point, otherwise it reports a failure. I've also read that a font may return its own blank glyph for those code points which it doesn't support. Can an application distinguish between this blank glyph and a successful resolution, ie. does the font return an error code of sorts with this blank glyph?

I believe that's the extent of my confusion. Any clarification in this regard would be invaluable. Thanks in advance.