I would like to know architectures which violate the assumptions I've listed below. Also I would like to know if any of the assumptions are false for all architectures (i.e. if any of them are just completely wrong).

  1. sizeof(int *) == sizeof(char *) == sizeof(void *) == sizeof(func_ptr *)

  2. The in-memory representation of all pointers for a given architecture is the same regardless of the data type pointed to.

  3. The in-memory representation of a pointer is the same as an integer of the same bit length as the architecture.

  4. Multiplication and division of pointer data types are only forbidden by the compiler. NOTE: Yes I know this is nonsensical. What I mean is - is there hardware support to forbid this incorrect usage?

  5. All pointer values can be cast to a single integer. In other words, what architectures still make use of segments and offsets?

  6. Incrementing a pointer is equivalent to adding sizeof(the pointed data type) to the memory address stored by the pointer. If p is an int32* then p+1 is equal to the memory address 4 bytes after p.

I'm most used to pointers being used in a contiguous, virtual memory space. For that usage, I can generally get by thinking of them as addresses on a number line. See (http://stackoverflow.com/questions/1350471/pointer-comparison/1350488#1350488).

+8  A: 

I don't have specific real-world examples in mind, but the "authority" is the C standard. If something is not required by the standard, you can build a conforming implementation that intentionally fails to comply with any other assumptions. Some of these assumptions are true most of the time just because it's convenient to implement a pointer as an integer representing a memory address that can be directly fetched by the processor, but this is just a consequence of "convenience" and can't be held as a universal truth.

  1. Not required by the standard (see this question). For instance, sizeof(int*) can be unequal to sizeof(double*). void* is guaranteed to be able to store any object pointer value.
  2. Not required by the standard. By definition, size is a part of representation. If the size can be different, the representation can be different too.
  3. Not necessarily. In fact, "the bit length of an architecture" is a vague statement. What is a 64-bit processor, really? Is it the address bus? Size of registers? Data bus? What?
  4. It doesn't make sense to "multiply" or "divide" a pointer. It's forbidden by the compiler but you can of course multiply or divide the underlying representation (which doesn't really make sense to me) and that results in undefined behavior.
  5. Maybe I don't understand your point but everything in a digital computer is just some kind of binary number.
  6. Yes, kind of. It's guaranteed to point to a location that is one object of the pointed-to type (sizeof(*p) bytes) farther. It's not necessarily equivalent to arithmetic addition of a number (i.e. "farther" is a logical concept here; the actual representation is architecture specific).
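
To see what a particular implementation does for point 1, here is a minimal sketch in C. The names `func_ptr`, `pointer_sizes`, `measure_pointer_sizes` and `void_and_char_pointers_match` are made up for illustration, and `func_ptr` is taken here to be the function-pointer type itself rather than a pointer to one. Only the `void *` / `char *` equality is promised by the standard; the other sizes may or may not agree on a given platform.

```c
#include <stddef.h>

/* Hypothetical function-pointer type, standing in for func_ptr in the question. */
typedef void (*func_ptr)(void);

/* Sizes of several pointer types on the current implementation. */
struct pointer_sizes {
    size_t to_int;
    size_t to_char;
    size_t to_void;
    size_t to_double;
    size_t to_function;
};

/* Collect the sizes; nothing here assumes they are equal. */
struct pointer_sizes measure_pointer_sizes(void)
{
    struct pointer_sizes s;
    s.to_int = sizeof(int *);
    s.to_char = sizeof(char *);
    s.to_void = sizeof(void *);
    s.to_double = sizeof(double *);
    s.to_function = sizeof(func_ptr);
    return s;
}

/* void* and char* share a representation, so their sizes must match. */
int void_and_char_pointers_match(void)
{
    return sizeof(void *) == sizeof(char *);
}
```

Printing the fields of `measure_pointer_sizes()` on a mainstream desktop target will most likely show the same number five times, which says something about that target and nothing about the language.
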
Mehrdad Afshari
Do you have any links supporting #1 and #2? I'm genuinely interested in seeing why this would be the case. #3 shouldn't be vague - whatever size you need to address memory for a given architecture. #4 I realize that it doesn't make sense to multiply pointers. I'll make this clear in the question. #5 I'm referring to a single number that isn't composed of more than one 'thing'.
Will Bickford
Will: #1, and consequently #2, is covered by the other SO question I linked to. There's a quote from the standard in the answer. Re #3: what you're describing is the size of the address bus, but not all processors have a single address bus/memory. A (hypothetical) processor might store integer data in one RAM and floating-point data in another. #5: Well, you can append those "two" sections and put them in "one thing", which will still be a number. I'm not exactly sure what you're saying here, so I prefer to stay on the safe side and not comment on #5.
Mehrdad Afshari
The only architectures I know about for #3 would be ones with separate instruction and data memory. I've never seen or heard of one with separate integer and floating point memory.
Will Bickford
Will: That was hypothetical, of course. The whole point was a processor can be directly connected to more than one memory (and the width of those memory buses can vary.)
Mehrdad Afshari
+2  A: 

The in-memory representation of a pointer is the same as an integer of the same bit length as the architecture.

I think this assumption is false because on the 80186, for example, a 32-bit pointer is held in two registers (an offset register and a segment register), and which half-word goes in which register matters during access.
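
A rough sketch of why the two halves can't be treated as one plain integer, assuming 8086/80186 real-mode addressing (physical address = segment * 16 + offset, truncated to 20 bits). The struct and function names are invented for this example.

```c
#include <stdint.h>

/* A real-mode far pointer as two 16-bit halves (illustrative names). */
struct far_ptr {
    uint16_t segment;
    uint16_t offset;
};

/* Real mode: physical address = segment * 16 + offset, kept to 20 bits. */
uint32_t far_to_linear(struct far_ptr p)
{
    return (((uint32_t)p.segment << 4) + p.offset) & 0xFFFFFu;
}

/* The same two halves read as one 32-bit integer, segment in the high half. */
uint32_t far_as_raw_u32(struct far_ptr p)
{
    return ((uint32_t)p.segment << 16) | p.offset;
}
```

With these definitions, 1234:0005 and 1000:2345 both map to physical address 0x12345, yet their raw 32-bit patterns (0x12340005 and 0x10002345) differ, so comparing or doing arithmetic on the raw integers does not behave like comparing or offsetting addresses.
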

Multiplication and division of pointer data types are only forbidden by the compiler.

You can't multiply or divide types. ;P

I'm unsure why you would want to multiply or divide a pointer.

All pointer values can be cast to a single integer. In other words, what architectures still make use of segments and offsets?

The C99 standard allows pointers to be stored in intptr_t, which is an integer type. So, yes.

Incrementing a pointer is equivalent to adding sizeof(the pointed data type) to the memory address stored by the pointer. If p is an int32* then p+1 is equal to the memory address 4 bytes after p.

x + y where x is a T * and y is an integer is equivalent to (T *)((intptr_t)x + y * sizeof(T)) as far as I know. Alignment may be an issue, but padding may be provided in the sizeof. I'm not really sure.
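
The formula can be checked on a given platform with a small sketch like the one below. The function names are invented, `int32_t` stands in for T, and the integer route depends on the implementation-defined pointer/integer mapping (and on `intptr_t` existing at all), so agreement on a flat-memory machine is an observation about that machine, not a guarantee.

```c
#include <stdint.h>

/* The formula above: advance by y elements using integer arithmetic
   on the converted pointer. The result is implementation-defined. */
int32_t *advance_via_integer(int32_t *x, int y)
{
    return (int32_t *)((intptr_t)x + (intptr_t)y * (intptr_t)sizeof(int32_t));
}

/* Ordinary pointer arithmetic, which the language does define
   (within an array or one past its end). */
int32_t *advance_via_pointer(int32_t *x, int y)
{
    return x + y;
}
```

On typical flat-address-space targets both functions return the same pointer for an in-bounds `y`; on a segmented or word-addressed target the integer version could land somewhere else.
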

strager
Is 80186 still used anywhere? The multiplication question is mostly theoretical: I have no use case for it.
Will Bickford
strager: Re last point. You cannot perform pointer arithmetic on `void*` pointers.
Mehrdad Afshari
@Mehrdad, Thanks, I've corrected that.
strager
+1  A: 

I don't know about the others, but for DOS, the assumption in #3 is untrue. DOS is 16 bit and uses various tricks to map many more than 16 bits worth of memory.

Imagist
+5  A: 

For 6.: a pointer is not necessarily a memory address. See e.g. "The Great Pointer Conspiracy" by Stack Overflow user jalf:

Yes, I used the word “address” in the comment above. It is important to realize what I mean by this. I do not mean “the memory address at which the data is physically stored”, but simply an abstract “whatever we need in order to locate the value”. The address of i might be anything, but once we have it, we can always find and modify i.

And:

A pointer is not a memory address! I mentioned this above, but let’s say it again. Pointers are typically implemented by the compiler simply as memory addresses, yes, but they don’t have to be.

Peter Mortensen
+1 good informative link.
Will Bickford
That answer answers a few other questions too it seems. Kinda.
strager
oo, I'm famous :D
jalf
+6  A: 

Some further information about pointers from the C99 standard:

  • 6.2.5 §27 guarantees that void* and char* have identical representations, i.e. they can be used interchangeably without conversion, i.e. the same address is denoted by the same bit pattern (which doesn't have to be true for other pointer types)
  • 6.3.2.3 §1 states that any pointer to an incomplete or object type can be cast to (and from) void* and back again and still be valid; this doesn't include function pointers!
  • 6.3.2.3 §6 states that void* can be cast to (and from) integers and 7.18.1.4 §1 provides appropriate types intptr_t and uintptr_t; the problem: these types are optional - the standard explicitly mentions that there need not be an integer type large enough to actually hold the value of the pointer!
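
A minimal sketch of the last point, with an invented function name. It relies on `UINTPTR_MAX` being defined by `<stdint.h>` only when `uintptr_t` is provided, and it reports failure rather than guessing when the type is missing.

```c
#include <stdint.h>

#ifdef UINTPTR_MAX
/* uintptr_t exists: a void* converted to it and back must compare
   equal to the original pointer. */
int survives_integer_round_trip(void *p)
{
    uintptr_t bits = (uintptr_t)p;
    void *back = (void *)bits;
    return back == p;
}
#else
/* No uintptr_t on this implementation: report that the round trip
   is not available instead of picking some other integer type. */
int survives_integer_round_trip(void *p)
{
    (void)p;
    return 0;
}
#endif
```

Note that only the round trip is covered; the value of `bits` in between is whatever the implementation chooses.
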
Christoph
Thanks for the C99 standard references.
Will Bickford
+2  A: 

In general, the answer to all of the questions is "yes", and it's because only those machines that implement popular languages directly saw the light of day and persisted into the current century. Although the language standards reserve the right to vary these "invariants", or assertions, it hasn't ever happened in real products, with the possible exception of items 3 and 4 which require some restatement to be universally true.

It's certainly possible to build segmented MMU designs, which correspond roughly with the capability-based architectures that were popular academically in past years, but no such system has typically seen common use with such features enabled. Such a system might have conflicted with the assertions as it would probably have had large pointers.

In addition to segmented/capability MMUs, which often have large pointers, more extreme designs have tried to encode data types in pointers. Few of these were ever built. (This question brings up all of the alternatives to the basic word-oriented, pointer-is-a-word architectures.)

Specifically:

  1. The in-memory representation of all pointers for a given architecture is the same regardless of the data type pointed to. True except for extremely wacky past designs that tried to implement protection not in strongly-typed languages but in hardware.
  2. The in-memory representation of a pointer is the same as an integer of the same bit length as the architecture. Maybe, certainly some sort of integral type is the same, see LP64 vs LLP64.
  3. Multiplication and division of pointer data types are only forbidden by the compiler. Right.
  4. All pointer values can be cast to a single integer. In other words, what architectures still make use of segments and offsets? Nothing uses segments and offsets today, but a C int is often not big enough; you may need a long or long long to hold a pointer.
  5. Incrementing a pointer is equivalent to adding sizeof(the pointed data type) to the memory address stored by the pointer. If p is an int32* then p+1 is equal to the memory address 4 bytes after p. Yes.
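
The LP64 vs LLP64 remark can be probed with a sketch like this one (the function name is invented). It compares byte sizes only, which is a simplification: it ignores padding bits and says nothing about whether the conversion itself is meaningful.

```c
#include <stddef.h>

/* Name of the narrowest of int, long, long long that has at least as
   many bytes as void*, or "none" if even long long is too small. */
const char *narrowest_integer_for_pointer(void)
{
    if (sizeof(int) >= sizeof(void *)) {
        return "int";
    }
    if (sizeof(long) >= sizeof(void *)) {
        return "long";
    }
    if (sizeof(long long) >= sizeof(void *)) {
        return "long long";
    }
    return "none";
}
```

Under LP64 (most 64-bit Unix-like systems) int is 32 bits while long and pointers are 64, so this would answer "long"; under LLP64 (64-bit Windows) int and long are both 32 bits, so it would answer "long long".
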

It is interesting to note that every Intel Architecture CPU, i.e., every single PeeCee, contains an elaborate segmentation unit of epic, legendary, complexity. However, it is effectively disabled. Whenever a PC OS boots up, it sets the segment bases to 0 and the segment lengths to ~0, nulling out the segments and giving a flat memory model.

DigitalRoss
+7  A: 

I can't give you concrete examples of all of these, but I'll do my best.

sizeof(int *) == sizeof(char *) == sizeof(void *) == sizeof(func_ptr *)

I don't know of any systems where I know this to be false, but consider:

Mobile devices often have some amount of read-only memory in which program code and such is stored. Read-only values (const variables) may conceivably be stored in read-only memory. And since the ROM address space may be smaller than the normal RAM address space, the pointer size may be different as well. Likewise, pointers to functions may have a different size, as they may point to this read-only memory into which the program is loaded, and which can otherwise not be modified (so your data can't be stored in it).

So I don't know of any platforms on which I've observed that the above doesn't hold, but I can imagine systems where it might be the case.

The in-memory representation of all pointers for a given architecture is the same regardless of the data type pointed to.

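Think of C++ member pointers vs regular pointers. They don't have the same representation (or size). In common implementations, a pointer to a data member is just an offset into the object, while a pointer to a member function pairs a function address (or vtable offset) with an adjustment to the this pointer.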

And as above, it is conceivable that some CPUs would load constant data into a separate area of memory, which used a separate pointer format.

The in-memory representation of a pointer is the same as an integer of the same bit length as the architecture.

Depends on how that bit length is defined. :) An int on many 64-bit platforms is still 32 bits. But a pointer is 64 bits. As already said, CPUs with a segmented memory model will have pointers consisting of a pair of numbers. Likewise, member pointers consist of a pair of numbers.

Multiplication and division of pointer data types are only forbidden by the compiler.

Ultimately, pointer data types only exist in the compiler. What the CPU works with is not pointers, but integers and memory addresses. So there is nowhere else where these operations on pointer types could be forbidden. You might as well ask for the CPU to forbid concatenation of C++ string objects. It can't do that because the C++ string type only exists in the C++ language, not in the generated machine code.

However, to answer what you mean, look up the Motorola 68000 CPUs. I believe they have separate registers for integers and memory addresses. Which means that they can easily forbid such nonsensical operations.

All pointer values can be cast to a single integer.

You're mostly safe there. The C and C++ standards allow a pointer to be converted to an integer type through an implementation-defined mapping, no matter the memory space layout, CPU architecture and anything else. Where the implementation provides intptr_t or uintptr_t (C99 makes them optional), you can convert a void* to that integer type, and then convert that integer back to get a pointer that compares equal to the original. But the C/C++ languages say nothing about what the intermediate integer value should be. That is up to the individual compiler, and the hardware it targets.

Incrementing a pointer is equivalent to adding sizeof(the pointed data type) to the memory address stored by the pointer.

Again, this is guaranteed. If you consider that conceptually, a pointer does not point to an address, it points to an object, then this makes perfect sense. Adding one to the pointer will then obviously make it point to the next object. If an object is 20 bytes long, then incrementing the pointer will move it 20 bytes, so that it moves to the next object.

If a pointer was merely a memory address in a linear address space, if it was basically an integer, then incrementing it would add 1 to the address -- that is, it would move to the next byte.
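
A small sketch of the "next object, not next byte" point, with invented names. The struct is meant to be 20 bytes (an array of char has no internal padding, though an implementation could in principle pad the struct), and the distance is measured through `char *`, which is the portable way to count bytes between two pointers into the same array.

```c
#include <stddef.h>

/* An object that is 20 bytes long on typical implementations. */
struct record {
    char bytes[20];
};

/* Bytes covered by p + 1, measured through char*.
   This always equals sizeof(struct record). */
ptrdiff_t stride_in_bytes(struct record *p)
{
    return (char *)(p + 1) - (char *)p;
}

/* Objects covered by p + 1, measured in units of struct record. */
ptrdiff_t stride_in_objects(struct record *p)
{
    return (p + 1) - p;
}
```

Given `struct record r[2];`, `stride_in_objects(r)` is 1 while `stride_in_bytes(r)` is `sizeof(struct record)`, which is the difference between pointing at objects and pointing at bytes.
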

Finally, as I mentioned in a comment to your question, keep in mind that C++ is just a language. It doesn't care which architecture it is compiled to. Many of these limitations may seem obscure on modern CPUs. But what if you're targeting yesteryear's CPUs? What if you're targeting the next decade's CPUs? You don't even know how they'll work, so you can't assume much about them. What if you're targeting a virtual machine? Compilers already exist which generate bytecode for Flash, ready to run from a website. What if you want to compile your C++ to Python source code?

Staying within the rules specified in the standard guarantees that your code will work in all these cases.

jalf
+1  A: 

There were lots of "word addressed" architectures in the 50's, 60's and 70's. But I cannot recall any mainstream examples that had a C compiler. I recall the ICL / Three Rivers PERQ machines in the 80's that were word addressed and had a writable control store (microcode). One of its instantiations had a C compiler and a flavor of UNIX called PNX, but the C compiler required special microcode.

The basic problem is that char* types on word addressed machines are awkward, however you implement them.

Interestingly, before C there was a language called BCPL in which the basic pointer type was a word address; i.e. incrementing a pointer gave you the address of the next word, and ptr!1 gave you the word at ptr + 1. There was a different operator for addressing a byte: ptr%42 if I recall.
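
To give a feel for why char* is awkward on such machines, here is a sketch of one possible encoding: a word address plus a byte index inside the word. Everything in it is an assumption made for illustration (a 4-byte word, byte 0 as the least significant byte, the names `byte_ptr`, `byte_ptr_add` and `byte_ptr_load`); it does not describe the PERQ or any particular compiler.

```c
#include <stddef.h>
#include <stdint.h>

#define BYTES_PER_WORD 4u /* assumed word size for this sketch */

/* A "char*" on a hypothetical word-addressed machine. */
struct byte_ptr {
    size_t word;   /* address of the word, in words */
    unsigned byte; /* index within the word, 0 .. BYTES_PER_WORD - 1 */
};

/* Advance by n bytes, carrying overflow of the byte index into the
   word address. Not a plain integer addition on either field. */
struct byte_ptr byte_ptr_add(struct byte_ptr p, size_t n)
{
    size_t total = p.byte + n;
    p.word += total / BYTES_PER_WORD;
    p.byte = (unsigned)(total % BYTES_PER_WORD);
    return p;
}

/* Load one byte: fetch the whole word, then shift and truncate.
   Byte 0 is the least significant byte (an arbitrary choice). */
uint8_t byte_ptr_load(const uint32_t *memory, struct byte_ptr p)
{
    return (uint8_t)(memory[p.word] >> (8u * p.byte));
}
```

A pointer like this is wider than a word pointer and needs extra instructions for every increment and dereference, which is one way assumptions 1, 2 and 6 from the question can all fail at once.
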

Stephen C
+2  A: 

sizeof(char*) != sizeof(void(*)(void))? - Not on x86 in 36-bit addressing mode (supported on pretty much every Intel CPU since Pentium 1)

"The in-memory representation of a pointer is the same as an integer of the same bit length" - there's no in-memory representation on any modern architecture; tagged memory has never caught on and was already obsolete before C was standardized. Memory in fact doesn't even hold integers, just bits and arguably words (not bytes; most physical memory doesn't allow you to read just 8 bits.)

"Multiplication of pointers is impossible" - 68000 family; address registers (the ones holding pointers) didn't support that IIRC.

"All pointers can be cast to integers" - Not on PICs.

"Incrementing a T* is equivalent to adding sizeof(T) to the memory address" - true by definition. Also equivalent to &pointer[1].

MSalters
A: 
Stephen Kellett
`int32*` means "pointer to int32". OP said nothing about the internal representation of the address here. Also, the whole concept of signed/unsigned is meaningless for addresses, since they are not to be used as numbers. The difference between addresses *is* a number, and will be correct whether you regard addresses as signed or unsigned. Actually, thinking about addresses as "signed" or "unsigned" is just an arbitrary convention. We're talking about two's complement arithmetic architectures, of course.
slacker
Sorry, I'm not thinking clearly. Slacker, yes, you are correct. I conflated the pointer with the type rather than thinking about the pointer arithmetic. I'm so tired at the moment, didn't realise I was that tired and I've only been awake an hour. :-(
Stephen Kellett
A: 

DeathStation 9000

slacker