Here's a simple example of a program that concatenates two strings.

#include <stdio.h>

void strcat(char *s, char *t);

void strcat(char *s, char *t) {
    while (*s++ != '\0');
    s--;
    while ((*s++ = *t++) != '\0');
}

int main() {
    char *s = "hello";
    strcat(s, " world");
    while (*s != '\0') {
        putchar(*s++);
    }
    return 0;
}

I'm wondering why it works. In main(), I have a pointer to the string "hello". According to the K&R book, modifying a string like that is undefined behavior. So why is the program able to modify it by appending " world"? Or is appending not considered as modifying?

+17  A: 

Undefined behavior means a compiler can emit code that does anything. Working is a subset of undefined.

MSN
You took the words right out of my mouth.
Frank Krueger
You mean I got lucky, right?
Ree
I don't use the word luck, but yes. The fact that C/C++ hide these kinds of bugs is what makes programming in those languages so exciting!
Frank Krueger
I consider this unlucky. In my experience it will work on my workstation, and then fail spectacularly at a customer site. I prefer an early failure.
Darron
It is actually the most dangerous subset of undefined, because it might stop working tomorrow.
starblue
+1  A: 

You were lucky this time.
Especially in debug mode some compilers will put spare memory (often filled with some obvious value) around declarations so you can find code like this.

Martin Beckett
A: 

The compiler is allowing you to modify s because you have improperly marked it as non-const -- a pointer to a static string like that should be

const char *s = "hello";

With the const modifier missing, you've basically disabled the safety that prevents you from writing into memory that you shouldn't write into. C does very little to keep you from shooting yourself in the foot. In this case you got lucky and only grazed your pinky toe.
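As a rough sketch of what const buys you here (the names append and append_demo are made up for illustration, they are not from the book or the question): once the literal is held through a const char *, passing it as the destination forces the compiler to issue a diagnostic, while a writable array with enough room is accepted.

```c
/* The question's function, renamed (hypothetical name) so it does not clash
   with the standard strcat, and with the source marked read-only. */
void append(char *dst, const char *src) {
    while (*dst != '\0') {
        dst++;                          /* find the end of dst */
    }
    while ((*dst++ = *src++) != '\0') {
        ;                               /* copy src, including its terminator */
    }
}

/* Hypothetical demo: returns 1 if appending into a writable array worked. */
int append_demo(void) {
    const char *s = "hello";            /* read-only view of the literal */
    char buf[12] = "hello";             /* writable copy, room for " world" */

    /* append(s, " world");   diagnosed: discards the const qualifier */
    (void)s;

    append(buf, " world");
    return buf[5] == ' ' && buf[10] == 'd' && buf[11] == '\0';
}
```

The commented-out call is the one from the question; with const in place the mistake shows up at compile time instead of at a customer site.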

Crashworks
"Improperly", you say. Maybe, I'm only reading chapter 5 of the book and there hasn't been any mention of the "const" keyword, however, the declaration example the authors give is identical to mine (without const).
Ree
So don't rely completely on that book. A string literal is type "const char *", regardless of what your book says, and changing it is undefined behavior that happens to work. A book that encourages what you're experimenting with here is dangerous.
David Thornley
It's completely the opposite - the book encourages NOT to modify string literals. I did the experiment myself hoping it wouldn't work.
Ree
@David: string literals don't have the type `const char *` - according to the spec, they are "used to initialize an array of static storage duration and length just sufficient to contain the sequence"
Christoph
The trouble with that definition is that in practice the array of static storage is usually in a read-only data segment, so writing to it will result in an immediate access violation and crash.
Crashworks
+1  A: 

It also depends on how the pointer is declared. For example, this lets you change both ptr and what ptr points to:

char * ptr;

Can change what ptr points to, but not ptr:

char * const ptr;

Can change ptr, but not what ptr points to (char const * ptr means exactly the same thing):

const char * ptr;

Can't change anything:

const char * const ptr;
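
The combinations are easy to check with a compiler. Here is a small sketch (const_pointer_demo is a hypothetical name) in which every assignment the compiler would reject is left commented out. The rule of thumb it assumes: const applies to whatever is immediately to its left, or to what is on its right when it comes first, so const after the * makes the pointer itself read-only.

```c
/* Hypothetical demo: performs every modification that is permitted and
   returns how many there were, or -1 if the results look wrong. */
int const_pointer_demo(void) {
    char a[] = "abc";
    char b[] = "xyz";
    int changes = 0;

    char *p1 = a;               /* pointer and data both modifiable */
    p1[0] = 'A'; changes++;
    p1 = b;      changes++;

    char *const p2 = a;         /* data modifiable, pointer fixed */
    p2[1] = 'B'; changes++;
    /* p2 = b;         error: p2 is a read-only variable */

    const char *p3 = a;         /* pointer modifiable, data read-only via p3 */
    p3 = b;      changes++;
    /* p3[0] = 'X';    error: read-only location */

    const char *const p4 = a;   /* neither can be changed through p4 */
    /* p4 = b;         error */
    /* p4[0] = 'X';    error */

    if (a[0] == 'A' && a[1] == 'B' && p1 == b && p3 == b && p4[2] == 'c') {
        return changes;
    }
    return -1;
}
```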
Paul Beckingham
+4  A: 

I +1'd MSN, but as for why it works, it's because nothing has come along to fill the space behind your string yet. Declare a few more variables, add some complexity, and you'll start to see some wackiness.

overslacked
A: 

s points to a bit of memory that holds "hello", but was not intended to contain more than that. This means that it is very likely that you will be overwriting something else. That is very dangerous, even though it may seem to work.
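
For comparison, a minimal sketch of a concatenation that cannot run past its destination, assuming the caller supplies the buffer and its size. The name concat_into and its parameters are hypothetical, chosen only for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write a followed by b into dst, which holds dst_size
   bytes. Returns 0 on success, or -1 if the result plus its terminating
   '\0' would not fit, in which case dst is left untouched. */
int concat_into(char *dst, size_t dst_size, const char *a, const char *b) {
    size_t la = strlen(a);
    size_t lb = strlen(b);

    /* Need la + lb + 1 <= dst_size; written to avoid overflow in the sum. */
    if (la >= dst_size || lb >= dst_size - la) {
        return -1;                  /* refuse instead of overwriting */
    }
    memcpy(dst, a, la);
    memcpy(dst + la, b, lb + 1);    /* +1 copies the terminating '\0' */
    return 0;
}
```

With char buf[12], concat_into(buf, sizeof buf, "hello", " world") succeeds because "hello world" needs exactly 12 bytes including the terminator; with a 6-byte buffer the same call returns -1.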

Two observations:

  1. The * in *s-- is not necessary. s-- would suffice, because you only want to decrement the value.
  2. You don't need to write strcat yourself. It already exists (you probably knew that, but I'm telling you anyway:-)).
Renze de Waal
1 - You are right. 2 - Yes, I do know that; writing this function was one of the exercises in the book.
Ree
+1  A: 

I'm wondering why it works

It doesn't. It causes a Segmentation Fault on Ubuntu x64; for code to work it shouldn't just work on your machine.

Moving the modified data to the stack gets around the data area protection in Linux:

int main() {
    char b[] = "hello";
    char c[] = " ";
    char *s = b;

    strcat(s, " world");

    puts(b);
    puts(c);

    return 0;
}

Even then, you are only safe as long as " world" fits in the unused space between stack data. Change b to "hello to" and Linux detects the stack corruption:

*** stack smashing detected ***: bin/clobber terminated
Pete Kirkham
It works on some machines, and not on others. His is a "works", and yours is a "doesn't".
Jonathan Leffler
I was going to put something below it on the stack and demonstrate it getting walked over, but couldn't
Pete Kirkham
@Pete: the string literals don't reside on the stack, but the data section of the executable
Christoph
True. But it's easier to show something being walked over using a string on the stack.
Pete Kirkham
+2  A: 

Perhaps surprisingly, your compiler has allocated the literal "hello" into read/write initialized data instead of read-only initialized data. Your assignment clobbers whatever is adjacent to that spot, but your program is small and simple enough that you don't see the effects. (Put it in a for loop and see if you are clobbering the " world" literal.)

It fails on Ubuntu x64 because gcc puts string literals in read-only data, and when you try to write, the hardware MMU objects.

Norman Ramsey
+1  A: 

According to the C99 specification (C99: TC3, 6.4.5, §5), string literals are

[...] used to initialize an array of static storage duration and length just sufficient to contain the sequence. [...]

which means they have the type char [], i.e. modification is possible in principle. Why you shouldn't do it is explained in §6:

It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

Different string literals with the same contents may - but don't have to - be mapped to the same memory location. As the behaviour is undefined, compilers are free to put them in read-only sections in order to cleanly fail instead of introducing possibly hard to detect error sources.
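
A tiny sketch of the "may be mapped to the same memory location" part (the function name is hypothetical). It only reads the literals, so it is well defined, but which answer you get is up to the implementation.

```c
/* Returns 1 if the implementation merged the two identical literals into
   one array, 0 if it kept them distinct. The standard allows either. */
int literals_share_storage(void) {
    const char *a = "hello";
    const char *b = "hello";
    return a == b;
}
```

If the two pointers do compare equal, writing through one of them would also change what the other one sees, which is one more reason the standard makes modification undefined.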

Christoph