The standard C library functions strtof and strtod have the following signatures:

float strtof(const char *str, char **endptr);
double strtod(const char *str, char **endptr); 

They each decompose the input string, str, into three parts:

  1. An initial, possibly-empty, sequence of whitespace
  2. A "subject sequence" of characters that represent a floating-point value
  3. A "trailing sequence" of characters that are unrecognized (and which do not affect the conversion).

If endptr is not NULL, then *endptr is set to a pointer to the character immediately following the last character that was part of the conversion (in other words, the start of the trailing sequence).
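As a rough illustration of the decomposition described above, here is a minimal sketch using a mutable buffer. The helper name trailing_sequence is hypothetical and not part of the standard library. It also relies on the documented behavior that, when no conversion is performed, strtod stores the original pointer in *endptr.

```c
#include <stdlib.h>

/* Hypothetical helper, not part of the standard library.
   Converts the number at the start of buf (after any leading whitespace)
   and returns a pointer to the start of the trailing sequence.
   If no conversion is performed, strtod returns 0 and stores buf itself
   in end, so the returned pointer equals buf. */
char *trailing_sequence(char *buf, double *value)
{
    char *end;
    *value = strtod(buf, &end);
    return end;
}
```

For the buffer "  2.5 meters", the returned pointer refers to the space before "meters", and *value is 2.5.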

I am wondering: why is endptr, then, a pointer to a non-const char pointer? Isn't *endptr a pointer into a const char string (the input string str)?

+1  A: 

The reason is simply usability. char * can automatically convert to const char *, but char ** cannot automatically convert to const char **, and the actual type of the pointer (whose address gets passed) used by the calling function is much more likely to be char * than const char *. The reason this automatic conversion is not possible is that there is a non-obvious way it can be used to remove the const qualification through several steps, where each step looks perfectly valid and correct in and of itself. Steve Jessop has provided an example in the comments:

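If you could automatically convert char** to const char**, then you could do

char *p;
char **pp = &p;
const char** cp = pp;          /* the hypothetical implicit conversion */
*cp = (const char*) "hello";   /* p now points at a string literal */
*p = 'j';                      /* writes to the literal through a non-const pointer */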

For const-safety, one of those lines must be illegal, and since the others are all perfectly normal operations, it has to be cp = pp;
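From the caller's side, the practical consequence is that code holding a const char * usually goes through a temporary char *. The following is only a sketch of that pattern; the helper name parse_const is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: parses a number from a read-only string and
   returns a const pointer to the first unparsed character. */
const char *parse_const(const char *str, double *value)
{
    char *end;                   /* must be char *, because &end must be char ** */
    *value = strtod(str, &end);  /* passing the address of a const char * would need a cast */
    return end;                  /* char * converts implicitly to const char * */
}
```

No cast is needed anywhere, and the const qualification of the input is restored as soon as the pointer leaves the helper.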

A much better approach would have been to define these functions to take void * in place of char **. Both char ** and const char ** can automatically convert to void *. Alternatively, these functions could have taken a ptrdiff_t * or size_t * argument in which to store the offset of the end, rather than a pointer to it. This is often more useful anyway.

If you like either of these approaches, feel free to write such a wrapper around the standard library functions and call your wrapper, so as to keep the rest of your code const-clean and cast-free.
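A minimal sketch of the offset-based wrapper suggested above might look like this. The name strtod_offset and the choice to allow a NULL end_off are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper: like strtod, but reports the end of the parsed
   text as an offset from str instead of as a pointer into it.
   end_off may be NULL if the caller does not need the offset. */
double strtod_offset(const char *str, size_t *end_off)
{
    char *end;
    double value = strtod(str, &end);
    if (end_off != NULL)
        *end_off = (size_t)(end - str);  /* end never precedes str */
    return value;
}
```

Since the caller only receives an offset, it can index into its own const or non-const pointer, and the const question never arises. An offset of 0 indicates that no conversion was performed.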

R..
"Both `char**` and `const char**` can automatically convert to void *." I see what you're saying, and using `void*` would allow the user to pass in a `const char**` without a cast. But it would also allow them to pass in a whole universe of wrong things without a cast. Given the limitations of C, I think it's better to preserve basic type-safety, even at the cost of losing const-safety.
Steve Jessop
It is not simply usability, these functions store result in that pointer, and you cannot write to a constant memory.
Vlad Lazarenko
@Vlad: You're confusing `const char **` with `char *const *`.
R..
"If anyone has it, please post." - if you could automatically convert `char**` to `const char**`, then you could do `char *p; char **pp = &p; const char** cp = pp; *cp = (const char*) "hello"; *p = 'j';`. For const-safety, one of those lines must be illegal, and since the others are all perfectly normal operations, it has to be `cp = pp;`.
Steve Jessop
@Steve: That is an extremely helpful example for why `char**` cannot convert implicitly to `const char**`!
Daniel Trebbien
@R.: I like your idea to pass in a pointer to `size_t`. It seems more correct to me than the standard library's solution.
Daniel Trebbien
An offset certainly addresses the const problem, although it's too late now to consistently use that convention throughout the libraries, and it might already have been too disruptive by the time `const` was invented, I'm not sure. If it was normal for string-handling functions to return offsets rather than pointers, would `strcpy` always return 0? ;-)
Steve Jessop
Ideally `strcpy` would return the length of the string, a very useful piece of information it automatically obtains (and then throws away) as a side effect of its operation.. :-)
R..
+3  A: 

Usability. The str argument is marked as const because the input argument will not be modified. If endptr were const, then that would instruct the caller that he should not change data referenced from endptr on output, but often the caller wants to do just that. For example, I may want to null-terminate a string after getting the float out of it:

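#include <stdlib.h>

float StrToFAndTerminate(char *Text) {
    float Num;

    /* strtof stores the end of the parsed number back into Text. */
    Num = strtof(Text, &Text);
    /* Overwrite the first unparsed character to terminate the string there. */
    *Text = '\0';
    return Num;
}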

Perfectly reasonable thing to want to do, in some circumstances. Doesn't work if endptr is of type const char **.

Ideally, endptr should be of const-ness matching the actual input const-ness of str, but C provides no way of indicating this through its syntax. (Anders Hejlsberg talks about this when describing why const was left out of C#.)

Aidan Cully
The same effect could easily be achieved with `Text[FloatEnd-Text] = '\0'`, so for me this isn't really a good excuse. The method that you are indicating will produce UB if `Text` were declared with `const`, and you couldn't easily tell that from the location where you do the assignment.
Jens Gustedt
@Jens: I've updated the code to better reflect the rationale. And I agree about the undefined behavior problem. The question is whether the cure is worse than the disease...
Aidan Cully
"could easily be achieved" - but what's premature optimisation to us (preferring to use the pointer directly, instead of relying on the compiler to sort out that expression, and/or the hardware to be so fast we don't care), wasn't premature optimisation to the C89 standards committee when they specified `strtod`. Also, that's a shameful idiom to have to learn, so we're just left wondering whether it's more or less shameful than const-incorrectness ;-)
Steve Jessop
@Steve: hm, premature optimization, in terms of arithmetic this is just `Text+FloatEnd-Text`, this is not really difficult to optimize by a compiler, no ?-)
Jens Gustedt
@Aidan: hope you took no offense, I believe that the motivation might have been as you are suggesting, but I wouldn't blame that on you ;-)
Jens Gustedt
@Jens: it's a different thing for you or me to say, "oh, it's a bit ugly, but I can live with that and I expect the compiler will optimize", than for the standards committee to say, "if people think it's ugly they can either live with it or cast the pointer to non-const once they have it back, and if their compiler doesn't optimize they can complain to the compiler-writer". We make our own decisions, but the standards committee makes decisions for (and is criticised by) everyone. There were probably still some pretty rudimentary compilers around in 1989, not much more than assemblers.
Steve Jessop
@Steve: sure, again I am a bit worried about your reaction. Reasons in 1989 were probably different from what they are today, and perhaps they had not been aware of the importance that the little keyword `const` would gain. The standard has been revised since, and the bogus interface stuck for `strtof`, well. Then, this interface has been transposed to the new interfaces in C99. It is just a pity, don't want to say more than that.
Jens Gustedt