I understand the purpose of the NULL constant in C/C++, and I understand that it needs to be represented some way internally.

My question is: Is there some fundamental reason why the 0-address would be an invalid memory-location for an object in C/C++? Or are we in theory "wasting" one byte of memory due to this reservation?

+6  A: 

There is no requirement that a null pointer be equal to the 0-address; it's just that most compilers implement it this way. It is perfectly possible to implement a null pointer by storing some other value, and in fact some systems do this. The C99 specification §6.3.2.3 (Pointers) specifies only that an integer constant expression with the value 0 is a null pointer constant; it does not say that a null pointer converted to an integer has the value 0.

An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant.

Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

On some embedded systems the zero memory address is used for something addressable.
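
As a rough illustration of that distinction (a sketch, not part of the quoted standard text; the helper names are invented for this example), the following C checks whether the bytes of a null pointer happen to be all zero on the current implementation. Only the comparison against the constant 0 is guaranteed by the language; the byte pattern is whatever the implementation chooses.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the bytes of this null char * are all
   zero on the current implementation, 0 otherwise. */
int null_is_all_bits_zero(void)
{
    char *from_constant = 0;  /* constant 0: guaranteed to yield a null pointer */
    unsigned char zeros[sizeof from_constant] = {0};

    /* Compare the object representation of the pointer with zero bytes. */
    return memcmp(&from_constant, zeros, sizeof from_constant) == 0;
}

/* Hypothetical helper: returns 1 if p is a null pointer. The comparison is
   done on pointer values, not on raw bits. */
int is_null(const void *p)
{
    return p == 0;
}
```

On mainstream desktop platforms `null_is_all_bits_zero()` returns 1, but the standard does not promise it, whereas `is_null` behaves the same everywhere.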

Mark Byers
+1 for consideration of embedded systems.
Thomas Matthews
+1 for technical accuracy
Johannes Schaub - litb
+14  A: 

The null pointer does not actually have to be 0. The C spec guarantees that when a constant 0 value is given in the context of a pointer, the compiler treats it as null. However, if you do

char *foo = (void *)1;
--foo;
// do something with foo

You will access the 0-address, which is not necessarily the null pointer. In most cases the two happen to coincide, but they don't have to, so we don't really have to waste that byte. In the larger picture, though, if the null pointer isn't 0 it has to be something, so a byte is being wasted somewhere.
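
A variant of the snippet that avoids arithmetic on an invalid pointer might look like the sketch below. It is not the original author's code: `pointer_from_runtime_zero` and `runtime_zero_is_null` are names invented here, and `uintptr_t` is an optional type in C99. The decrement is done on an integer and the conversion happens afterwards; the result of that conversion is implementation-defined, so it need not compare equal to a null pointer, although on most platforms it does.

```c
#include <stdint.h>

/* Hypothetical helper: builds a pointer from an integer that only becomes
   zero at run time. The value is not an integer constant expression, so the
   null pointer constant rule does not apply; the integer-to-pointer
   conversion is implementation-defined. */
void *pointer_from_runtime_zero(void)
{
    volatile uintptr_t address = 1;  /* volatile: keep this a run-time value */
    address = address - 1;           /* now 0, computed at run time */
    return (void *)address;
}

/* Returns 1 if this implementation happens to map a run-time zero to a
   null pointer, 0 otherwise. */
int runtime_zero_is_null(void)
{
    return pointer_from_runtime_zero() == 0;
}
```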

Edit: Edited out the use of NULL due to the confusion in the comments. Also, the main message here is "null pointer != 0, and here's some C/pseudo code that shows the point I'm trying to make." Please don't actually try to compile this or worry about whether the types are proper; the meaning is clear.

bobDevil
`void *foo = 1` is not valid in either C or C++. It must be `void *foo = (void *) 1`.
AndreyT
Theoretically, there doesn't *have* to be a null pointer set aside anywhere. But then programmers would have to have `isValid` flags associated with all their pointers instead. So the options are to have a byte wasted somewhere, or lots of bytes wasted everywhere.
Dennis Zickefoose
Your code snippet invokes undefined behavior, but I won't downvote because the meaning is clear.
jalf
Alternately, `int z = 0; void *foo = (void *)z;` won't necessarily produce a null pointer (at least in C++; I'm not as up on the C standards).
David Thornley
@Dennis Zickefoose: Apparently there are a couple architectures which use a high bit in the pointer as a flag marking it as null or invalid. There are many ways in which ISAs approach it.
greyfade
Nitpick: `NULL` (in all caps) means the preprocessor macro, which is, by definition, 0 (or possibly `(void*) 0` for some C implementations). You mean *the null pointer*. http://c-faq.com/null/varieties.html
jamesdlin
The size of `char` is guaranteed to be 1 *C-byte*, but not guaranteed to be 1 *machine byte*. This means that `(void *) 1` is not necessarily a valid value for a `char *` pointer and that decrement will not necessarily produce a zero address. The implementation might be built around two-machine-bytes `char`s, meaning that the above might easily produce something like `0xFFFF...` instead of zero address.
AndreyT
A: 

The memory at that address is reserved for use by the operating system. 0 - 64k is reserved. 0 is used as a special value to indicate to developers "not a valid address".

yodaj007
Not necessarily. In embedded systems (which may or may not have an OS), all addresses within the pointer's range are possible. Valid addresses are defined by the implementation. The address 0 may be valid, may be in RAM, and may need to have code copied there.
Thomas Matthews
Ah, yes, true. I was thinking strictly from a Windows POV.
yodaj007
+7  A: 

This has nothing to do with wasting memory and more with memory organization.

When you work with the memory space, you have to assume that anything not directly "belonging to you" is shared by the entire system or illegal for you to access. An address "belongs to you" if you have taken the address of something on the stack that is still on the stack, or if you have received it from a dynamic memory allocator and have not yet recycled it. Some OS calls will also provide you with legal areas.

In the good old days of real mode (e.g., DOS), the beginning of the machine's address space was not meant to be written by user programs at all. Some of it even mapped to things like I/O. For instance, writing to the address space at 0xB800 (fairly low) would actually let you capture the screen! Nothing was ever placed at address 0, and many memory controllers would not let you access it, so it was a great choice for NULL. In fact, the memory controller on some PCs would have gone bonkers if you tried writing there.

Today the operating system protects you with a virtual address space. Nevertheless, no process is allowed to access addresses not allocated to it. Most of the addresses are not even mapped to an actual memory page, so accessing them will trigger a general protection fault or the equivalent in your operating system. This is why 0 is not wasted - even though all the processes on your machine "have an address 0", if they try to access it, it is not mapped anywhere.
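
One way to observe this on a POSIX system is sketched below. It is an illustration under stated assumptions, not a portable program: `signal_from_null_read` is a name invented here, `fork` and `waitpid` are POSIX calls, and reading through a null pointer is undefined behavior in C, so the code only reports what one particular OS does about it. The read happens in a child process so that the parent survives.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of the signal that killed a child
   process which read through a null pointer, 0 if the child exited normally,
   or -1 if fork or waitpid failed. */
int signal_from_null_read(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* volatile keeps the compiler from reasoning about the null value */
        volatile char *volatile p = NULL;
        char c = *p;            /* expected to fault when page 0 is unmapped */
        _exit(c == 0 ? 0 : 1);  /* reached only if the read succeeded */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On Linux the result can be compared with `SIGSEGV` from `<signal.h>`; other systems may report a different signal or none at all.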

Uri
This is only peripherally related to the issue, but wasn't page 0 reserved for the interrupt table? If so, then 0 would be a valid address.
Steven Sudit
@Steven On the x86, only in real mode. In protected mode, the physical location of the interrupt table is specified in the IDTR register. A reasonable kernel would never map a page of an unprivileged process's virtual memory onto the interrupt table.
user168715
@user168715: Which is one of many good reasons why Windows 9x/Me kernels can't be called reasonable...
slacker
@user: That's what I thought. For some time now, I've been able to count on the OS to force an exception whenever a NULL (or otherwise invalid) pointer was dereferenced. I guess I'd forgotten the bad old days when a wild pointer could corrupt the entire machine's state. I don't miss those days!
Steven Sudit
@slacker: In their defense, MS wasn't trying to be reasonable. Rather it wanted to ensure backward compatibility at any cost, even sanity.
Steven Sudit
@Steven Sudit: Sanity was always a very scarce resource in the Microsoft world (this has only recently, and slowly, started to change). And none of this changes the fact that the Windows 9x design is one huge WTF to anyone versed in OS engineering. Here, have a "protected" OS which goes to great lengths to forbid access to other user-mode processes' memory, while happily allowing everyone write access to kernel memory. **WTF?**
slacker
@slacker: Again, I fully agree that it's unreasonable. But it fulfilled their goal, which was to continue to be able to run those wacky DOS games. Good thing modern CPUs are capable of virtualization.
Steven Sudit
The color text mode video memory is actually mapped to the real mode *segment* `0xB800`, which is actually quite a *high* address in real mode (it is above the famous "640k limit").
caf
@Steven Sudit: DOS games ran in a partially virtualized environment, without any sane way to access the kernel. This is about the **Win32** processes having kernel memory mapped in their address spaces with write permission. WTH does this have to do with compatibility?
slacker
@slacker: You keep bringing up sanity, but I don't think that ever entered the picture; market necessity is all that ever mattered. DOS games and even utilities ran in a variety of quirky environments and often depended upon directly hooking into the interrupt lookup and replacing entries. This pretty much forces page 0 to be exposed, unless you can fully virtualize it (as we do now).
Steven Sudit
+5  A: 

The zero address and the NULL pointer are not (necessarily) the same thing. Only a literal zero is a null pointer. In other words:

char* p = 0; // p is a null pointer

char* q = (char*)1; // a cast is required to convert an integer to a pointer
q--; // q is NOT necessarily a null pointer

Systems are free to represent the null pointer internally in any way they choose, and this representation may or may not "waste" a byte of memory by making the actual 0 address illegal. However, a compiler is required to convert a literal zero pointer into whatever the system's internal representation of NULL is. A pointer that comes to point to the zero address by some way other than being assigned a literal zero is not necessarily null.

Now, most systems do use 0 for NULL, but they don't have to.
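
A minimal sketch of what that compiler requirement buys you (the function names are invented for this example): the three usual ways of testing for the null pointer all compare against the null pointer value, whatever its bit pattern, so they agree on every conforming implementation.

```c
#include <stddef.h>

/* Each function tests for the null pointer. The compiler translates the
   constant 0 (or NULL) into whatever representation the platform uses for a
   null pointer, so the three tests are equivalent. */
int null_by_zero(const char *p)  { return p == 0; }
int null_by_macro(const char *p) { return p == NULL; }
int null_by_not(const char *p)   { return !p; }
```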

Tyler McHenry
+3  A: 

On many processors address zero is the reset vector, wherein lies the boot ROM (BIOS on a PC), so you are unlikely to be storing anything at that physical address. On a processor with an MMU and a supporting OS, the physical and logical addresses need not be the same, and the address zero may not be a valid logical address in the executing process context.

Clifford
+4  A: 

It is not necessarily an illegal memory location. I have stored data by dereferencing a pointer to zero... it happens the datum was an interrupt vector being stored at the vector located at address zero.

By convention it is not normally used by application code, since historically many systems had important system information starting at zero. It could be the boot ROM, a vector table, or even unused address space.

Amardeep
A: 

But since modern operating systems can map the physical memory to logical memory addresses (or better: modern CPUs starting with the 386), not even a single byte is wasted.

Daniel
If I had three GB of RAM and a 32-bit OS, it would waste a whole 4096-byte page.
kmm
@kmm No. It's just that the bottom page of the virtual address space of each process is not mapped. The actual physical page 0 can still be used (assuming it is RAM) and can be mapped anywhere into any of the processes' address space.
JeremyP
The virtual address space of a 32 bit OS is only 3 GB for each process. If I have 3 GB RAM, and there needs to be at least one NULL pointer, somewhere, you're wasting a page of memory.
kmm
+1  A: 

NULL is typically the zero address, but it is the zero address in your application's virtual address space. The virtual addresses that you use in most modern operating systems have exactly nothing to do with actual physical addresses; the OS maps from the virtual address space to the physical addresses for you. So, no, having the virtual address 0 represent NULL does not waste any memory.

Read up on virtual memory for a more involved discussion if you're curious.

Donnie
This only works on those OSes that support a virtual address range. On embedded systems, the address 0 is valid (especially if there is something located there, like RAM or a UART).
Thomas Matthews
A: 

As people have already pointed out, the bit representation of the NULL pointer does not have to be the same as the bit representation of a 0 value. It is, though, in nearly all cases (the old dinosaur computers that had special addresses can be neglected), because a NULL pointer can also be used as a boolean, and using an integer (of sufficient size) to hold the pointer value makes it easier to represent in the common ISAs of modern CPUs. The code to handle it is then much more straightforward, and thus less error-prone.

tristopia
A: 

You are correct in noting that the address space at 0 is not usable storage for your program. For a number of reasons, a variety of systems do not consider this a valid address space for your program anyway.

Allowing any valid address to be used would require a null value flag for all pointers. This would exceed the overhead of the lost memory at address 0. It would also require additional code to check and see if the address were null or not, wasting memory and processor cycles.
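
To make the overhead concrete, here is a small sketch of what such a flagged pointer could look like (the `flagged_ptr` type and `flag_overhead` function are hypothetical, invented for this illustration):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical alternative to a reserved null value: every pointer carries
   its own validity flag. */
struct flagged_ptr {
    void *address;   /* may hold any address, including 0 */
    bool  is_valid;  /* the extra state a reserved null value makes unnecessary */
};

/* Extra bytes paid per pointer, including any alignment padding. */
size_t flag_overhead(void)
{
    return sizeof(struct flagged_ptr) - sizeof(void *);
}
```

The flag costs at least one byte per pointer, and alignment padding typically makes it more; on common 64-bit platforms the struct is twice the size of a bare pointer.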

Ideally, the address that the NULL pointer uses (usually 0) should return an error on access. VAX/VMS never mapped a page to address 0, so following the NULL pointer would result in a failure.

BillThor
+1  A: 

I don't see the answers directly addressing what I think you were asking, so here goes:

Yes, at least one address value is "wasted" (made unavailable for use) because of the constant used for null. Whether it maps to 0 in the linear map of process memory is not relevant.

And the reason that address won't be used for data storage is that you need the special status of the null pointer, to be able to distinguish it from any other real pointer. It is just like the case of ASCIIZ strings (C strings, NUL-terminated), where the NUL character is designated as the end of the character string and cannot be used inside strings. Can you still use it inside? Yes, but that will mislead library functions about where the string ends.
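
The string analogy can be sketched in a few lines of C (the array and function names are made up for this example): a NUL placed inside the data is stored like any other byte, but the string library stops counting at it.

```c
#include <string.h>

/* Eight bytes of storage: 'a' 'b' 'c' '\0' 'd' 'e' 'f' '\0'. */
static const char embedded[] = "abc\0def";

/* Bytes actually stored in the array, both NULs included. */
size_t stored_size(void)    { return sizeof embedded; }

/* Length as the string library sees it: counting stops at the first NUL. */
size_t visible_length(void) { return strlen(embedded); }
```

Here `stored_size()` is 8 while `visible_length()` is 3, so the trailing "def" is invisible to functions such as `strlen`.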

I can think of at least one implementation of LISP I was learning, in which NIL (Lisp's null) was not 0, nor was it an invalid address, but a real object. The reason was very clever: the standard required that CAR(NIL)=NIL and CDR(NIL)=NIL (note: CAR(l) returns a pointer to the head/first element of a list, whereas CDR(l) returns a pointer to the tail/rest of the list). So instead of adding if-checks in CAR and CDR for whether the pointer is NIL, which would slow every call, they just allocated a CONS (think list) and assigned its head and tail to point to itself. There! This way CAR and CDR will work, and that address in memory won't be reused (because it is taken by the object devised as NIL).
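
The self-referential NIL trick can be sketched in C roughly as follows. This is a simplified illustration, not the code of any real Lisp: the `cons` struct, the `nil_cell` object, and the `car`/`cdr` functions are all invented here, and a real implementation would store tagged objects rather than only cell pointers.

```c
/* A minimal cons cell; the names are illustrative only. */
struct cons {
    struct cons *car;  /* head of the list */
    struct cons *cdr;  /* rest of the list */
};

/* NIL is an ordinary object whose head and tail both point back to itself. */
static struct cons nil_cell = { &nil_cell, &nil_cell };
#define NIL (&nil_cell)

/* No null checks needed: following either field of NIL yields NIL again. */
struct cons *car(struct cons *cell) { return cell->car; }
struct cons *cdr(struct cons *cell) { return cell->cdr; }
```

Because `nil_cell` occupies real storage, its address can never be handed out for anything else, which is the same "one value set aside" cost as a reserved null address.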

P.S. I just remembered that many, many years ago I read about a bug in Lattice-C that was related to NULL. It must have been in the dark MS-DOS segmentation times, when you worked with separate code and data segments. As I remember it, the first function from a linked library could end up at address 0, so a pointer to it would be considered invalid because it compared equal to NULL.

Nas Banov
Thanks for the interesting answer. However, `char *foo = (void *)1;--foo; // do something with foo` seems to indicate that the address is not actually wasted after all?
aioobe
@aioobe - it is more a case of "you shouldn't" than "you can't". It depends on the compiler/environment: in some runtimes, dereferencing NULL will be detected and prevented. Not to mention that use of memory not provided by malloc/OS is heresy! :-)
Nas Banov