Hello,

While coding a simple function to remove a particular character from a string, I ran into this strange issue:

#include <stdio.h>

void str_remove_chars(char *str, char to_remove)
{
    if(str && to_remove)
    {
       char *ptr = str;
       char *cur = str;
       while(*ptr != '\0')
       {
           if(*ptr != to_remove)
           {
               if(ptr != cur)
               {
                   cur[0] = ptr[0];
               }
               cur++;
           }
           ptr++;
       }
       cur[0] = '\0';
    }
}

int main(void)
{
    setbuf(stdout, NULL);
    {
        char test[] = "string test"; // stack allocation?
        printf("Test: %s\n", test);
        str_remove_chars(test, ' '); // works
        printf("After: %s\n",test);
    }
    {
        char *test = "string test";  // non-writable?
        printf("Test: %s\n", test);
        str_remove_chars(test, ' '); // crash!!
        printf("After: %s\n",test);
    }

    return 0;
}

What I don't understand is why the second test fails. To me, the declaration char *ptr = "string"; looks equivalent to char ptr[] = "string";.

Isn't that the case?

+3  A: 

Strictly speaking, a declaration of char *ptr only guarantees you a pointer to a character type. It is not unusual for the string literal to be placed in a read-only part of the compiled program (alongside the code, or in a read-only data section), which many operating systems protect against writes. The problem is that you are assuming the pre-defined string is writable when, in fact, you never explicitly allocated memory for that string yourself. Some combinations of compiler and operating system may let you do what you attempted, but you cannot rely on it.

On the other hand, the declaration char test[] = "string test", by definition, allocates readable and writable memory for the entire array of characters; in this case that memory is on the stack, because test is a local variable.

PP
+13  A: 

The two declarations are not the same.

char ptr[] = "string"; declares a char array of size 7 and initializes it with the characters 's','t','r','i','n','g','\0'. You are allowed to modify the contents of this array.

char *ptr = "string"; declares ptr as a char pointer and initializes it with the address of the string literal "string", which is read-only. Modifying a string literal is undefined behavior (UB). What you saw (a segfault) is one manifestation of UB.

codaddict
And sizeof(ptr) will also give different results for the two declarations. For the first, it returns the length of the array including the terminating null character (7 here). For the second, it returns the size of a pointer, usually 4 or 8.
Amigable Clark Kant
It's also true in the second case that the contents of ptr can be changed. But those contents are the pointer to the literal, not the characters.
Darron
+1, great answer. It is also true, and important to understand, that with `char *ptr = "string";` the `ptr` can be pointed at something else and can therefore be 'changed' in what it points at, but the characters `"string"` are a literal and cannot change.
drewk
It would also be worth mentioning the performance issues. Declaring an initialized automatic array variable will fill the entire array contents every time the variable comes into scope. Declaring an initialized automatic pointer variable will simply assign the pointer (a single word write) when the variable comes into scope. If the string is long or the block is entered often (like each iteration of a loop), the difference could be very significant!
R..
+1  A: 

As far as I remember

char ptr[] = "string";

creates a copy of "string" on the stack, so this one is mutable.

The form

char *ptr = "string";

is just backwards compatibility for

const char *ptr = "string";

and you are not allowed to modify its contents (doing so is undefined behavior). The compiler may place such strings in a read-only section of memory.

DerKuchen
+2  A: 

char *test = "string test"; is wrong; it should have been const char *. This code compiles only for backward-compatibility reasons. The string literal it points to may live in read-only memory, and any attempt to write to it invokes undefined behavior. On the other hand, char test[] = "string test" creates a writable character array on the stack, which you can write to like any other regular local variable.

Naveen
I wouldn't go so far as to say it's wrong. You might want to later have `test` point to a modifiable string, and keep a flag (in another variable) indicating whether it's been replaced with something modifiable. Still, in most cases it's probably good practice to use `const` there.
R..
A: 

Good answer @codaddict.

Also, a sizeof(ptr) will give different results for the different declarations.

The first one, the array declaration, will return the length of the array including the terminating null character.

The second one, char* ptr = "a long text..."; will return the length of a pointer, usually 4 or 8.

Amigable Clark Kant