I think this is what they were describing:
The PDP-10 mentioned in the second paragraph had 36-bit words, and its addresses could only refer to whole words, not to anything inside them. The text below describes one way this problem could have been solved while staying within the restrictions of the C++ language spec (quoted in the first paragraph).
Let's assume that you want to use 9-bit bytes (for some reason), so that exactly four of them fit in each 36-bit word. According to the spec, a char* must be able to address individual bytes. The PDP-10 can't do this, because it can't address anything smaller than a 36-bit word.
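To make the limitation concrete, here is a minimal C++ sketch. It assumes a 36-bit word is modelled in the low bits of a std::uint64_t and that bits are numbered from the least-significant end (an arbitrary choice for this illustration, not necessarily how the PDP-10 numbered them). With no way to form an address inside the word, reading or writing one of its four 9-bit bytes comes down to shifting and masking the whole word. The names get_byte, set_byte and the k* constants are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a 36-bit word lives in the low 36 bits of a uint64_t.
// Bytes are 9 bits wide, so each word holds exactly 4 of them.
constexpr unsigned kWordBits = 36;
constexpr unsigned kByteBits = 9;
constexpr unsigned kBytesPerWord = kWordBits / kByteBits;                 // 4
constexpr std::uint64_t kByteMask = (std::uint64_t{1} << kByteBits) - 1;  // 0x1FF

// Extract byte `index` (0 = the byte starting at bit 0) from a word.
// Without sub-word addresses, shift-and-mask is the only way to reach it.
std::uint64_t get_byte(std::uint64_t word, unsigned index) {
    assert(index < kBytesPerWord);
    return (word >> (index * kByteBits)) & kByteMask;
}

// Return a copy of `word` with byte `index` replaced by the low 9 bits of `value`.
std::uint64_t set_byte(std::uint64_t word, unsigned index, std::uint64_t value) {
    assert(index < kBytesPerWord);
    const unsigned shift = index * kByteBits;
    word &= ~(kByteMask << shift);         // clear the old 9 bits
    word |= (value & kByteMask) << shift;  // insert the new 9 bits
    return word;
}
```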
One way around the PDP-10's limitations would be to simulate a char* using two words of memory. The first word would be a pointer to the 36-bit word containing the char (this is as precise as the PDP-10's pointers normally allow). The second word would hold an offset (in bits) within that word. Now the char* can access any byte in the system and meets the C++ spec's requirements.
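A rough model of that two-word char* in C++ might look like the following. FatCharPtr, load_char and store_char are hypothetical names; memory is simulated as a std::vector of 36-bit words held in std::uint64_t, and the bit offset counts from the least-significant end of the word. This is an illustration of the idea, not code from any real PDP-10 compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the two-word char*. Each 36-bit word of simulated
// memory is stored in the low bits of a uint64_t.
constexpr unsigned kWordBits = 36;
constexpr unsigned kByteBits = 9;
constexpr std::uint64_t kByteMask = (std::uint64_t{1} << kByteBits) - 1;

struct FatCharPtr {
    std::uint64_t word;        // word 1: address of the containing 36-bit word
    std::uint64_t bit_offset;  // word 2: offset in bits within that word
};

// Dereference: fetch the whole word, then shift and mask out the 9-bit char.
std::uint64_t load_char(const std::vector<std::uint64_t>& memory, FatCharPtr p) {
    assert(p.word < memory.size());
    assert(p.bit_offset % kByteBits == 0 && p.bit_offset < kWordBits);
    return (memory[p.word] >> p.bit_offset) & kByteMask;
}

// Store through the pointer: a read-modify-write of the containing word.
void store_char(std::vector<std::uint64_t>& memory, FatCharPtr p, std::uint64_t value) {
    assert(p.word < memory.size());
    assert(p.bit_offset % kByteBits == 0 && p.bit_offset < kWordBits);
    std::uint64_t& w = memory[p.word];
    w &= ~(kByteMask << p.bit_offset);         // clear the old char
    w |= (value & kByteMask) << p.bit_offset;  // write the new one
}
```

Note that every store touches the whole containing word, since in this model only whole words can be written.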
ASCII-art visual aid:
| Byte 1 | Byte 2 | Byte 3 | Byte 4 | Byte 5 | Byte 6 | Byte 7 | Byte 8 |
-------------------------------------------------------------------------
|              Word 1               |              Word 2               |
|             (Address)             |             (Offset)              |
-------------------------------------------------------------------------
Say you had a char* with word1 = 0x0100 and word2 = 0x12. This would point to bit offset 18 (0x12), which is the start of the third 9-bit byte, within word 256 (0x0100) of memory.
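The arithmetic in that example can be checked at compile time. This sketch assumes the layout described in the text (9-bit bytes, four per 36-bit word, offsets counted from 0); decode and Decoded are hypothetical names.

```cpp
#include <cstdint>

// Assumed layout: 9-bit bytes, four per 36-bit word, offsets counted from 0.
constexpr unsigned kByteBits = 9;
constexpr unsigned kBytesPerWord = 36 / kByteBits;  // 4

struct Decoded {
    std::uint64_t word;         // which word of memory
    unsigned byte_in_word;      // 0-based byte index within that word
    std::uint64_t linear_byte;  // 0-based byte index counting from word 0
};

// Turn the two pointer words into a word number and byte positions.
constexpr Decoded decode(std::uint64_t word1, unsigned word2) {
    return Decoded{word1, word2 / kByteBits,
                   word1 * kBytesPerWord + word2 / kByteBits};
}

// word1 = 0x0100 (256), word2 = 0x12 (18): 18 / 9 = 2, so the pointer
// selects byte index 2 (the third byte) of word 256, i.e. linear byte 256*4 + 2.
static_assert(decode(0x0100, 0x12).byte_in_word == 2, "third byte of the word");
static_assert(decode(0x0100, 0x12).linear_byte == 1026, "256 * 4 + 2");
```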
If this technique were really used to build a conforming C++ implementation on the PDP-10, the compiler would have to do extra work juggling the bits required by this rather funky internal format (for example, shifting and masking on every access to a char).
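Part of that extra work is pointer arithmetic: adding n to such a pointer can push the bit offset past the end of the word, so the overflow has to carry into the word address. Below is a minimal sketch of what p + n and pointer subtraction might compile down to, assuming the same 9-bit bytes and 36-bit words. FatCharPtr, advance_ptr and ptr_distance are hypothetical names.

```cpp
#include <cstdint>

// Assumed layout: 9-bit chars, four per 36-bit word.
constexpr std::uint64_t kByteBits = 9;
constexpr std::uint64_t kWordBits = 36;

struct FatCharPtr {
    std::uint64_t word;        // address of the containing 36-bit word
    std::uint64_t bit_offset;  // 0, 9, 18 or 27
};

// p + n for non-negative n: add n chars' worth of bits to the offset, then
// carry whole words into the address and keep the remainder as the offset.
constexpr FatCharPtr advance_ptr(FatCharPtr p, std::uint64_t n) {
    return FatCharPtr{p.word + (p.bit_offset + n * kByteBits) / kWordBits,
                      (p.bit_offset + n * kByteBits) % kWordBits};
}

// a - b in chars, assuming a does not come before b: convert both pointers
// to absolute bit positions, subtract, and divide by the char width.
constexpr std::uint64_t ptr_distance(FatCharPtr a, FatCharPtr b) {
    return ((a.word * kWordBits + a.bit_offset) -
            (b.word * kWordBits + b.bit_offset)) / kByteBits;
}
```

A real compiler would also need a signed variant for negative offsets; this sketch leaves that out.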
The whole point of that article is to illustrate that a char isn't always 8 bits. It is at least 8 bits, but there is no defined maximum. The internal representation of data types depends on the platform architecture and may be different from what you expect.
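On any given implementation you can check the actual width with CHAR_BIT from <climits>. The standard guarantees CHAR_BIT >= 8 and defines sizeof(char) as 1, so on a hypothetical 9-bit-byte implementation CHAR_BIT would simply be 9. A small compile-time sketch (bits_per_byte is a hypothetical helper):

```cpp
#include <climits>
#include <limits>

// Guaranteed by the standard: a byte has at least 8 bits, but no upper bound is set.
static_assert(CHAR_BIT >= 8, "required by the C and C++ standards");

// sizeof(char) is 1 by definition, however many bits that byte has.
static_assert(sizeof(char) == 1, "true on every conforming implementation");

// unsigned char has no padding bits, so its value bits equal CHAR_BIT.
static_assert(std::numeric_limits<unsigned char>::digits == CHAR_BIT, "");

// Hypothetical helper: the byte width on the current implementation.
constexpr int bits_per_byte() { return CHAR_BIT; }
```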