Consider this definition:

char *pmessage = "now is the time";

As I see it, pmessage will point to a contiguous area in memory containing these characters and a '\0' at the end. From this I conclude that I can use pointer arithmetic to access an individual character in this string, as long as I stay within the bounds of this area.

So why does K&R say that modifying an individual character is undefined?
Moreover, why do I get a "Segmentation Fault" when I run the following code?

*(pmessage + 1) = 'K';
+1  A: 

You can use pointer arithmetic to read from a string literal, but not to write to it. The C Standard forbids modifying string literals.
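
To illustrate the read side of this, here is a minimal sketch. The helper name `char_at` is hypothetical and not part of the question; it only reads through the pointer, which is well defined as long as the index stays inside the string (terminator included).

```c
#include <stddef.h>

/* Hypothetical helper: reads the character at `index` using pointer
   arithmetic. Only a read is performed, so passing a string literal
   is fine. The caller must keep `index` within the string, including
   its terminating '\0'. */
char char_at(const char *s, size_t index)
{
    return *(s + index);
}
```

Calling `char_at("now is the time", 1)` reads the second character of the literal. Writing through the same address is what the Standard leaves undefined.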

anon
This may be so, but it is not the compiler complaining, it's the memory protection complaining. I doubt that he would have a problem running this on some embedded non protected mode hardware.
AndreasT
So what? It would still be undefined behaviour as far as the C Standard is concerned.
anon
Yes, but on the systems I learned C on (CP/M, Mac OS 6 or so) it would just work (for some values of work), and certainly wouldn't have a memory protection issue.
David Thornley
"just work" is often how undefined behaviour manifests itself :-)
anon
Some compilers let you store said strings in R/W storage, although I'd never use that feature. IBM XLC lets you do this: http://publib.boulder.ibm.com/infocenter/comphelp/v8v101/index.jsp?topic=/com.ibm.xlcpp8a.doc/compiler/ref/rnpgstrg.htm
Nighthawk
+1  A: 

A string literal is typically stored in read-only memory, so you shouldn't modify it.

sfossen
+13  A: 

String literals in C are not modifiable. A string literal is a string that is defined in the source code of your program. Compilers frequently store string literals in a read-only portion of the compiled binary, so your pmessage pointer points into a region that you cannot modify. Strings in buffers that live in modifiable memory can be modified using the syntax above.

Try something like this.

const char* pmessage = "now is the time";

// Create a new buffer that is on the stack and copy the literal into it.
char buffer[64];
strcpy(buffer, pmessage);

// We can now modify this buffer
buffer[1] = 'K';

If you just want a string that you can modify, you can avoid using a string literal with the following syntax.

char pmessage[] = "now is the time";

This creates the string directly as an array (on the stack, if it is declared inside a function), which can be modified in place.

bradtgmurray
Shouldn't that be "char * buffer = malloc(512);"?
David Thornley
Yup, I originally had char buffer[] = new char[512]; and converted it incorrectly. Thanks.
bradtgmurray
@David: yes, that should be char *buffer, or many other things than an incomplete type as shown. Better would be 'char buffer[] = "now is the time";' and no malloc() so no leak - and no string copy.
Jonathan Leffler
Technically, with the array approach, wouldn't you be copying the literal onto the stack at some point, hence a string copy? I'll modify the example to get rid of the dynamic memory use.
bradtgmurray
The compiler would initialize the variable - yes, so there would be a string copy somewhere. But it would not overflow bounds, etc. It's now a good answer - well done.
Jonathan Leffler
I like this so much... Q: Why can't I do X? A: You can do Y, where Y has nothing at all to do with X. Though the first paragraph of your answer was OK.
Ingo
I agree with Ingo. I was actually considering downvoting it (not because the second part is so bad per se, but because 12 is too high a score for it), but the first part has some goodness to it which wins, so I will just leave it as is.
hlovdal
+9  A: 

The string is a constant and cannot be modified. If you want to modify it, you can do:

char pmessage[] = "now is the time";

This initializes an array of characters (including the \0) instead of creating a pointer to a string constant.
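
As a rough sketch of why this works: the array form gives you your own 16-byte copy (15 characters plus the '\0'), so writing to it is fine. The function name `modify_copy` and its parameters are made up for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: builds a writable array from the literal,
   changes its second character, and copies the result into `out`
   if `out` is large enough. Returns the size of the array. */
size_t modify_copy(char *out, size_t out_size)
{
    char pmessage[] = "now is the time";  /* 15 characters + '\0' */

    pmessage[1] = 'K';                    /* writes to the array, not the literal */

    if (out_size >= sizeof pmessage) {
        memcpy(out, pmessage, sizeof pmessage);
    }
    return sizeof pmessage;
}
```

With a caller-supplied buffer of at least 16 bytes, the buffer ends up holding "nKw is the time".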

Dana
This will have the same problem as the original example in the question.
Scott Langham
Without actually looking it up, I don't think so. It will be the equivalent of having a char pmessage[16] initialized with "now is the time" (the individual chars followed by the \0). I think it works.
David Thornley
@David That's how I understood it to work, too. And I don't have access to a C compiler to try it :P I'll let my ignorance stand for now, if ignorance it is :P
Dana
James Curran
Dana and David are right.
Mike Dunlavey
Sweet! I like being correct!
Dana
char pmessage[] is not the same as the problem mentioned in the question. pmessage here is a read-write buffer. The answer is absolutely fine.
aJ
:) oops. I concede! Quite right.
Scott Langham
A: 

When you write: char *pmessage = "now is the time";

The compiler treats it as if you wrote:

 const char internalstring[] = "now is the time";
 char *pmessage = internalstring;

The reason you cannot modify the string is that, if you were to write:

 char *pmessage1 = "now is the time";
 char *pmessage2 = "now is the time";

The compiler will treat it as if you wrote:

 const char internalstring[] = "now is the time";
 char *pmessage1 = internalstring;
 char *pmessage2 = internalstring;

So, if you were to change one, you'd change both.
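
You can check whether your compiler merges identical literals with something like the sketch below. The function name is hypothetical, and the result is not guaranteed either way: the Standard leaves it unspecified whether identical literals share storage.

```c
/* Hypothetical helper: returns 1 if the two identical literals were
   placed at the same address by this compiler, 0 otherwise. Both
   results are allowed. */
int literals_share_storage(void)
{
    const char *pmessage1 = "now is the time";
    const char *pmessage2 = "now is the time";

    return pmessage1 == pmessage2;
}
```

If this returns 1, a write through one pointer (were it allowed) would be visible through the other.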

James Curran
The compiler need not do that, but it may. It doesn't have to put internalstring into read-only memory either. It doesn't matter to the Standard, because once you try to modify that you're into undefined behavior, and anything the compiler does is OK.
David Thornley
In addition, I think your example is not "const" clean. I am not sure it would even compile. Surely, the pointer should point to const char, otherwise the whole const-thing makes no sense.
Ingo
@Ingo - true, that would not, if written just like that. However, it is nevertheless what is happening internally. A literal string is a const char[], but with an implied conversion to (non-const) char*. The implied conversion is only for literal strings.
James Curran
@David: Sorry, I didn't mean to imply that the compiler HAD to do that, only that it COULD do that, to explain why modifying the array was declared undefined behavior.
James Curran
+1  A: 

The string literal that pmessage points to is compiled into the program image, and in most cases it is placed in code memory, which is read-only.

Alphaneo
A: 

If you define a literal of the form:

char* message = "hello world";

the compiler will treat the characters as constant and may well put them in read-only memory.

So it is advisable to use the const keyword, so that any attempt to change the literal will prevent the program from compiling:

const char* message = "hello world";
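
A small sketch of what that buys you, assuming a hypothetical function `message_length` made up for this illustration. The commented-out line is the kind of write a compiler has to diagnose once the pointer is declared const.

```c
#include <string.h>

/* Hypothetical helper: reads the message through a pointer to const
   and returns its length. */
size_t message_length(void)
{
    const char *message = "hello world";

    /* message[0] = 'H';   assignment through a pointer to const:
                           the compiler must issue a diagnostic */

    return strlen(message);   /* reading is still fine */
}
```

Uncommenting the assignment turns the runtime crash from the question into a compile-time diagnostic.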


I'm guessing the reason const on a literal is not enforced as part of the language is just backwards compatibility with pre-standard versions of C, where the const keyword didn't exist. Anybody know any better?

Scott Langham