I have a function that I would like to be able to return special values for failure and uninitialized (it returns a pointer on success).

Currently it returns NULL for failure, and -1 for uninitialized, and this seems to work...but I could be cheating the system. iirc, addresses are always positive, are they not? (although since the compiler is allowing me to set an address to -1, this seems strange).

[update] Another idea I had (in the event that -1 was risky) is to malloc a char @ the global scope, and use that address as a sentinel.

+15  A: 

The valid values for a pointer are entirely implementation-dependent, so, yes, a pointer address could be negative.

More importantly, however, consider (as an example of a possible implementation choice) the case where you are on a 32-bit platform with a 32-bit pointer size. Any value that can be represented by that 32-bit value might be a valid pointer. Other than the null pointer, any pointer value might be a valid pointer to an object.

For your specific use case, you should consider returning a status code and perhaps taking the pointer as a parameter to the function.

James McNellis
Careful though, if your pointer is too negative it might end up addressing the machine next to your current one.
Noon Silk
+3  A: 

Pointers can be negative like an unsigned integer can be negative. That is, sure, in a two's-complement interpretation, you could interpret the numerical value to be negative because the most-significant-bit is on.

jamesdlin
+34  A: 

No, addresses aren't always positive - on x86_64, pointers are sign-extended and the address space is clustered symmetrically around 0 (though it is usual for the "negative" addresses to be kernel addresses).

However, the point is mostly moot, since C only defines the meaning of < and > comparisons between pointers into the same object (or array), or one past the end of an array. Pointers to completely different objects cannot be meaningfully compared other than for exact equality, at least in standard C: if (p < NULL) has no well-defined semantics.
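A minimal sketch of which comparisons standard C gives meaning to. The function names are made up for illustration and are not from any library:

```c
#include <stdbool.h>
#include <stddef.h>

/* Equality between any two object pointers is always well defined. */
bool same_address(const void *a, const void *b)
{
    return a == b;
}

/* Relational comparison is well defined only when both pointers point
   into the same array (or one past its end), as they do here. */
bool comes_before(const int *array, size_t i, size_t j)
{
    return &array[i] < &array[j];
}
```

Calling `comes_before` with indices into one array is fine; writing `p < q` for pointers into two unrelated objects is what the standard leaves undefined.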

You should create a dummy object with static storage duration and use its address as your uninitialised value:

extern char uninit_sentinel;
#define UNINITIALISED ((void *)&uninit_sentinel)

It's guaranteed to have a single, unique address across your program.
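As a rough sketch of how the sentinel might be used with a linked list: `struct node` and `find_node` below are hypothetical names invented for this example, and in a real project the definition of `uninit_sentinel` would live in exactly one .c file with the `extern` declaration in a header.

```c
#include <stddef.h>

/* Definition of the sentinel object; only its address matters. */
char uninit_sentinel;
#define UNINITIALISED ((void *)&uninit_sentinel)

/* Hypothetical list node, for illustration only. */
struct node { int value; struct node *next; };

/* Returns UNINITIALISED if the list was never set up, NULL if the value
   is not found (failure), or a pointer to the matching node. */
void *find_node(struct node *list, int value)
{
    if (list == UNINITIALISED)
        return UNINITIALISED;
    for (struct node *n = list; n != NULL; n = n->next)
        if (n->value == value)
            return n;
    return NULL;
}
```

Callers only ever test the result with `==` against `NULL` and `UNINITIALISED`, so no ordering between unrelated pointers is needed.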

caf
awesome that's exactly what I needed
Jared Forsyth
@caf: very nice thinking.
Paul Nathan
+1 Just to mention a modification of this idea. The dedicated sentinel has the disadvantage that you have to define it in one of your object files. If you just want a macro, you could use the address of a known system variable that you judge can never be a valid result of your function. There are not too many such variables defined, but on a POSIX system, e.g., `environ` would do the trick.
Jens Gustedt
@caf: could you point to a resource to verify that pointers in amd64 are sign-extended? Never have I read anything that implies or states this. Perhaps you are referring to the requirement that canonical addresses must have bits 48 through 63 of any virtual address to be copies of bit 47? If this is what you mean, it does not imply "negative" pointers. Neither does RIP-relative addressing.
Michael Foukarakis
@mfukar: See section 3.5.1 of the x86-64 ABI ( http://www.x86-64.org/documentation/abi.pdf ), under "Kernel code model": *"The kernel of an operating system is usually rather small but runs in the negative half of the address space."*
caf
@caf: Yes, that is exactly what I was referring to. No pointer bit is interpreted as a sign in amd64. The fact that its values may start in 0xff... (in the current 48-bit addressing mode only) does not imply a sign. Please mind the wording: "sign extended reference changes" - it's only referring to immediate values, due to RIP-relative addressing. I think the answer should be edited to reflect this.
Michael Foukarakis
@mfukar: The ABI expressly refers to it as the **negative** half of the address space. I believe that is completely clear on the point, and refers to absolute addresses.
caf
Please see http://en.wikipedia.org/wiki/X86-64#Canonical_form_addresses for a visualization. Again; there is no sign bit on amd64 pointers.
Michael Foukarakis
@mfukar: Personally I find the architecture ABI to be a more authoritative document than Wikipedia. In the end, as I'm sure you know, it is ultimately a matter of interpretation or how you conceptualise the address space anyway.
caf
I was only pointing at the visualization, which is very accurate. If you wish that pointers had a sign, by all means go for it, but don't quote the amd64 ABI (the wrong parts of it, no less). Good day.
Michael Foukarakis
@mfukar: Perhaps you could enlighten us as to what the "negative half of the address space" refers to then, if not the obvious. (And at least I haven't been retconning my comments!)
caf
A: 

@James is correct, of course, but I'd like to add that pointers don't always represent absolute memory addresses, which theoretically would always be positive. Pointers also represent relative addresses to some point in memory, often a stack or frame pointer, and those can be both positive and negative.

So your best bet is to have your function accept a pointer to a pointer as a parameter and fill that pointer with a valid pointer value on success while returning a result code from the actual function.

Randolpho
Sure? The relative offset is usually an int in my experience.
Steve314
A: 

James's answer is probably correct, but of course it describes an implementation choice, not a choice that you can make.

Personally, I think addresses are "intuitively" unsigned. Finding a pointer that compares as less-than a null pointer would seem wrong. But ~0 and -1, for the same integer type, give the same value. If it's intuitively unsigned, ~0 may make a more intuitive special-case value - I use it for error-case unsigned ints quite a lot. It's not really different (zero is an int by default, so ~0 is -1 until you cast it) but it looks different.
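A small sketch of the ~0 versus -1 point, assuming a two's-complement machine (which covers essentially every current platform) and that the optional `uintptr_t` type is available. The function names are invented for the example:

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* As plain ints, ~0 and -1 are the same value on two's complement. */
bool int_forms_match(void)
{
    return ~0 == -1;
}

/* Converting -1 to an unsigned type wraps to that type's maximum,
   which is the same all-ones pattern that ~0u produces. */
bool unsigned_forms_match(void)
{
    return (unsigned)-1 == UINT_MAX && ~0u == UINT_MAX;
}

/* For a pointer-sized value, cast before inverting so that every bit
   of the wider type is set, not just the bits of an int. */
uintptr_t all_ones_address(void)
{
    return ~(uintptr_t)0;
}
```

Whether the all-ones address can ever be a real object address is still up to the implementation, so this only shows that the two spellings agree.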

Pointers on 32-bit systems can use all 32 bits BTW, though -1 or ~0 is an extremely unlikely pointer to occur for a genuine allocation in practice. There are also platform-specific rules - for example on 32-bit Windows, a process can only have a 2GB address space, and there's a lot of code around that encodes some kind of flag into the top bit of a pointer (e.g. for balancing flags in balanced binary trees).

Steve314
+13  A: 

It's generally a bad design to try to multiplex special values onto a return value... you're trying to do too much with a single value. It would be cleaner to return your "success pointer" via an argument, rather than the return value. That leaves lots of non-conflicting space in the return value for all of the conditions you want to describe:

int SomeFunction(SomeType **p)
{
    *p = NULL;
    if (/* check for uninitialized ... */)
        return UNINITIALIZED;
    if (/* check for failure ... */)
        return FAILURE;

    *p = yourValue;
    return SUCCESS;
}

You should also do typical argument checking (ensure that 'p' isn't NULL), but it makes the interface much cleaner.
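For a compilable version of the same pattern, here is a minimal sketch. The status names, `struct list`, and `lookup_node` are placeholders invented for illustration, including the assumption that the list carries an `initialized` flag:

```c
#include <stddef.h>

/* Hypothetical status codes; the names are placeholders. */
enum status { SUCCESS = 0, FAILURE, UNINITIALIZED, BAD_ARGUMENT };

/* Hypothetical list types, for illustration only. */
struct node { int value; struct node *next; };
struct list { int initialized; struct node *head; };

/* Looks up value in list. On SUCCESS, *out points at the matching node;
   in every other case where out is usable, *out is set to NULL. */
enum status lookup_node(const struct list *list, int value, struct node **out)
{
    if (out == NULL)
        return BAD_ARGUMENT;        /* the argument check mentioned above */
    *out = NULL;
    if (list == NULL || !list->initialized)
        return UNINITIALIZED;
    for (struct node *n = list->head; n != NULL; n = n->next) {
        if (n->value == value) {
            *out = n;
            return SUCCESS;
        }
    }
    return FAILURE;
}
```

The caller switches on the returned status and touches `*out` only after seeing `SUCCESS`, so no pointer value ever has to double as an error code.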

JaredReisinger
This is absolutely the right way to design this function. Anything else will be a maintenance disaster and a bug magnet for anyone else using the code, and should be strongly dis-recommended.
Ken Bloom
Possibly. The guy who "invented" null pointers said it was a mistake, IIRC. Another special-case value may be a problem. Even so, sometimes using two separate values where one will do leads to overcomplex code. A common approach for *simplifying* some common algorithms is to assign special-case past-the-end objects, for instance, rather than use nulls - it avoids special-case null checks. Having a "valid" flag still needs those at-the-end checks, just in a different form. A valid pointer to a special object *is* a special-case pointer, and often saves a lot of complexity.
Steve314
"The guy" is C.A.R. Hoare. On the other hand, he more than made up for the "billion dollar mistake" with the invention of Quicksort :-)
James McNellis
@James - all those guys, they're just guys, you know? I probably *should* remember Hoare, but the *who* is just history. The ideas are more important. Also, I find it helps to be vague - hard for people to contradict me when they don't know who I'm quoting ;-)
Steve314
@Steve314: Yes, in a very specific context (where you can control SomeType in my example) having some common "special case" objects/pointers can work and be a little more streamlined... but in the general case separating the status and the returned object is more maintainable.
JaredReisinger
In most cases, I would agree with you. In mine, I'm not specifically *returning uninitialized*. I'm working with a linked list, which is passed in as an argument, but which may or may not be initialized. Previously I had it set to NULL initially, but this conflicted with my returning "NULL" for *failure*. Thanks for your suggestions.
Jared Forsyth
+1  A: 

What's the difference between failure and uninitialized? If uninitialized is not another kind of failure, then you probably want to redesign the interface to separate these two conditions.

Probably the best way to do this is to return the result through a parameter, so the return value only indicates an error. For example where you would write:

void* func();

void* result=func();
if (result==0)
  /* handle error */
else if (result==(void*)-1)  /* explicit cast: -1 does not convert to a pointer implicitly */
  /* uninitialized */
else
  /* initialized */

Change this to

// sets *a to the returned object
// *a will be null if the object has not been initialized
// returns true on success, false otherwise
int func(void** a);

void* result;
if (!func(&result)){
  /* handle error */
  return;
}

/*do real stuff now*/
if (!result){
  /* initialize */
}
/* continue using the result now that it's been initialized */
Ken Bloom
I'm not specifically returning uninitialized. I'm working with a linked list, which is passed in as an argument, but which may or may not be initialized. Previously I had it set to NULL initially, but this conflicted with my returning "NULL" for failure. Thanks for your suggestions.
Jared Forsyth
A: 

Actually (at least on x86), the NULL-pointer exception is generated not only by dereferencing the NULL pointer itself, but by a larger range of addresses (e.g., the first 64 KB). This helps catch errors such as

int* x = NULL;
x[10] = 1;

So, there are more addresses that are guaranteed to generate the NULL pointer exception when dereferenced. Now consider this code (made compilable for AndreyT):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Small integers smuggled through the pointer return value. This relies on
   the OS never mapping the lowest addresses and on NULL converting to 0;
   neither is guaranteed by the C standard. */
#define ERR_NOT_ENOUGH_MEM 0
#define ERR_NEGATIVE       1
#define ERR_NOT_DIGIT      2

char* fn(int i){
    if (i < 0)
        return (char*)(intptr_t)ERR_NEGATIVE;
    if (i >= 10)
        return (char*)(intptr_t)ERR_NOT_DIGIT;
    /* room for the text, one digit and the terminating NUL */
    char* rez = malloc(strlen("Hello World ") + 2);
    if (rez)
        sprintf(rez, "Hello World %d", i);
    return rez;
}

int main(void){
    char* rez = fn(3);
    /* intptr_t is wide enough to hold a pointer; int may truncate it */
    switch((intptr_t)rez){
        case ERR_NOT_ENOUGH_MEM:    printf("Not enough memory!\n"); break;
        case ERR_NEGATIVE:          printf("The parameter was negative\n"); break;
        case ERR_NOT_DIGIT:         printf("The parameter is not a digit\n"); break;
        default:                    printf("we received %s\n", rez); free(rez);
    }
    return 0;
}

This could be useful in some cases. It won't work on some Harvard architectures, but will work on von Neumann ones.

ruslik
I'm not sure that's true "on x86" so much as on modern operating systems. The chip provides the ability to map a process address space to a physical address space etc, but it's the OS that usually decides which parts of the process address space are valid.
Steve314
This will not even compile. Some C compilers with rather loose error checking will let you assign an integer value to a pointer (even though it is illegal in C), but none I know of will let you use a pointer as a controlling value for `switch` statement.
AndreyT
A: 

NULL is the only valid error return in this case; this is true any time an unsigned value such as a pointer is returned. It may be true that in some cases pointers will not be large enough to use the sign bit as a data bit; however, since pointers are controlled by the OS and not the program, I would not rely on this behavior.

Remember that a pointer (on a 32-bit platform) is basically a 32-bit value; whether it is possibly negative or always positive is just a matter of interpretation, i.e. whether the 32nd bit is interpreted as the sign bit or as a data bit. So if you interpreted 0xFFFFFFFF as a signed number it would be -1; if you interpreted it as an unsigned number it would be 4294967295. Technically, it is unlikely that a pointer would ever be this large, but this case should be considered anyway.
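To make the "matter of interpretation" concrete, here is a small sketch using fixed-width integers rather than real pointers. `as_signed` is a helper name invented for the example:

```c
#include <stdint.h>
#include <string.h>

/* Reinterprets the 32 bits of an unsigned value as a signed value
   without changing any of them. */
int32_t as_signed(uint32_t bits)
{
    int32_t result;
    memcpy(&result, &bits, sizeof result);
    return result;
}
```

With this helper, `as_signed(0xFFFFFFFFu)` is -1 while the same bits read as `uint32_t` are 4294967295; nothing in the bit pattern itself says which reading is "right".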

As for alternatives, you could use an additional out parameter (returning NULL for all failures); however, this would require clients to create and pass a value even if they don't need to distinguish between specific errors.

Another alternative would be to use the GetLastError/SetLastError mechanism to provide additional error information (This would be specific to Windows, don't know if that is an issue or not), or to throw an exception on error instead.
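GetLastError/SetLastError are Windows-only, so the sketch below swaps in the portable analogue, `errno`, to show the same shape: return NULL for every failure and record the reason in a side channel. `digit_to_string` is a hypothetical function, and the choice of `ERANGE` is an assumption made because it is one of the few error codes ISO C itself defines.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical function: formats a single digit as a heap-allocated
   string. Returns NULL on any failure and leaves the reason in errno. */
char *digit_to_string(int digit)
{
    if (digit < 0 || digit > 9) {
        errno = ERANGE;             /* argument outside 0..9 */
        return NULL;
    }
    char *text = malloc(2);         /* one digit plus the terminating NUL */
    if (text == NULL)
        return NULL;                /* POSIX malloc sets errno to ENOMEM;
                                       ISO C does not promise that */
    text[0] = (char)('0' + digit);
    text[1] = '\0';
    return text;
}
```

Callers that do not care why the call failed just test for NULL; callers that do care clear `errno` before the call and inspect it afterwards.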

Devin Ellingson

DevinEllingson
+3  A: 

The C language does not define the notion of "negativity" for pointers. The property of "being negative" is a chiefly arithmetical one, not in any way applicable to values of pointer type.

If you have a pointer-returning function, then you cannot meaningfully return the value of -1 from that function. In C, integer values (other than a constant zero) are not implicitly convertible to pointer types. An attempt to return -1 from a pointer-returning function is an immediate constraint violation that will result in a diagnostic message. In short, it is an error. If your compiler allows it, it simply means that it doesn't enforce that constraint too strictly (most of the time compilers do this for compatibility with pre-standard code).

If you force the value of -1 to pointer type by an explicit cast, the result of the cast will be implementation-defined. The language itself makes no guarantees about it. It might easily prove to be the same as some other, valid pointer value.

If you want to create a reserved pointer value, there is no need to malloc anything. You can simply declare a global variable of the desired type and use its address as the reserved value. It is guaranteed to be unique.

AndreyT
A: 

Do not use malloc for this purpose. It might keep unnecessary memory tied up (if a lot of memory is already in use when malloc gets called and the sentinel gets allocated at a high address, for example) and it confuses memory debuggers/leak detectors. Instead simply return a pointer to a local static const char object. This pointer will never compare equal to any pointer the program could obtain in any other way, and it only wastes one byte of bss.
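A minimal sketch of the function-local variant described above. The accessor name `uninitialised_marker` is made up for illustration:

```c
#include <stddef.h>

/* Returns the address of a function-local static object. The object
   lives for the whole program, so the address is stable across calls
   and distinct from NULL and from every other object. */
const void *uninitialised_marker(void)
{
    static const char marker = 0;
    return &marker;
}
```

Because the object is hidden inside the function, no other translation unit needs a definition of it; callers just compare their pointer against `uninitialised_marker()` with `==`.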

R..