Is it possible to create a modifiable string literal in C++? For example:

char* foo[] = {
    "foo",
    "foo"
};
char* afoo = foo[0];
afoo[2] = 'g'; // access violation

This produces an access violation because the "foo"s are allocated in read-only memory (the .rdata section, I believe). Is there any way to force the "foo"s into writable memory (the .data section)? Even a pragma would be acceptable! (Visual Studio compiler)

I know I can do strdup and a number of other things to get around the problem, but I want to know specifically if I can do as I have asked. :)

+9  A: 

Since this is C++, the "best" answer would be to use a string class (std::string, QString, CString, etc. depending on your environment).

To answer your question directly, you're not supposed to modify string literals. The standard says this is undefined behavior. You really do need to duplicate the string one way or another, otherwise you're writing incorrect C++.
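As a rough sketch of the string-class route, assuming plain `std::string` is acceptable in your environment: each element owns its own writable copy of the characters, so the assignment from the question becomes well-defined. The helper names `make_foo_table` and `patch_first` are made up for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: the table from the question, stored as owned strings.
// Each std::string copies the literal's characters into its own buffer.
std::vector<std::string> make_foo_table() {
    return {"foo", "foo"};
}

// Hypothetical helper mirroring `afoo[2] = 'g'` from the question.
// Takes the table by value, so the caller's copy is left untouched.
std::string patch_first(std::vector<std::string> table) {
    table[0][2] = 'g';  // writes to the string's buffer, not to a literal
    return table[0];
}
```

Calling `patch_first(make_foo_table())` yields "fog", and the second element is unaffected because the two strings never share storage.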

Cogwheel - Matthew Orlando
+1 for `std::string`, -1 for weird nonstandard strings.
Mike Seymour
@Mike Seymour: When in Rome...
Cogwheel - Matthew Orlando
To elaborate, if you're using a framework that has its own string class, you're just going to make your life more difficult by trying to fit `std::string` into the mix. Hence "depending on your environment"
Cogwheel - Matthew Orlando
+2  A: 

If you store your string in an array you can change it.

There's no way to 'correctly' write to read-only memory.

You could, of course, stop using C-strings.

Noah Roberts
+2  A: 

I think the closest you can come is to initialize a plain char[] (not a char*[]) with a literal:

char foo[] = "foo";

That'll still perform a copy at some point though.

The only other way around that would be to use system level calls to mark the page that a string literal resides in as writeable. At that point you're not really talking about C or C++, you're really talking about Windows (or whatever system you're running on). It's probably possible on most systems (unless the data is really in ROM, which might be the case on an embedded system for example), but I sure don't know the details.

Oh, and don't forget that in your example:

char* foo[] = {
    "foo",
    "foo"
};

Since the standard (C99 6.4.5/6 "String literals") says:

It is unspecified whether these arrays are distinct provided their elements have the appropriate values.

There's no certainty about whether the 2 pointers in that array will point to the same or separate objects. Nearly all compilers will have those pointers point to the same object at the same address, but they don't have to and some more complicated situations of pointers to string literals might have the compiler coming up with 2 separate identical strings.

You could even have a scenario where one string literal exists 'inside' another:

char* p1 = "some string";
char* p2 = "string";

p2 may well be pointing at the tail end of the string pointed to by p1.

So if you start changing string literals by some hack you can perform on a system, you may end up modifying some 'other' strings unintentionally. That's one of the things that undefined behavior can bring along.
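To make the sharing issue concrete, here is a small sketch (the function names are invented for this example). The first function only observes whether two identical literals ended up at the same address; the result is unspecified, so either answer is conforming. The second shows that arrays initialized from literals are always separate objects, so writing to one cannot affect the other.

```cpp
#include <cstring>

// Hypothetical helper: reports whether two identical literals share storage.
// The answer depends on the compiler and its settings; do not rely on it.
bool identical_literals_share_storage() {
    const char* a = "foo";
    const char* b = "foo";
    return a == b;
}

// Hypothetical helper: arrays initialized from literals are distinct objects.
bool arrays_are_independent() {
    char a[] = "foo";  // private, writable copy
    char b[] = "foo";  // another private copy
    a[2] = 'g';        // modifies only a
    return std::strcmp(a, "fog") == 0 && std::strcmp(b, "foo") == 0;
}
```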

Michael Burr
A: 

I would not do this, so all I can offer is a nasty, ugly hack you could try: find the page where your string literal resides and unprotect that page. See the VirtualProtect() function for Win32. However, even if this works, it is not guaranteed to behave correctly all the time. It's better not to do it.

zerm
A: 

You could create a multidimensional array of chars:

#include <iostream>

int main(int argc, char** argv)
{
    char foo[][4] = {
        "foo",
        "bar"
    };
    char* afoo = foo[0];
    afoo[2] = 'g';
    std::cout << afoo << std::endl;
}

More verbose way to define the array:

char foo[][4] = {
    {'f', 'o', 'o', '\0'},
    {'b', 'a', 'r', '\0'}
};
Ferdinand Beyer
You can still use string literals, you don't have to specify all the chars as char literals.
Noah Roberts
@Noah: Thanks for pointing that out.
Ferdinand Beyer
Also, Anne should keep in mind that if she does it this way the 4 has to be the size of the longest string +1. Using a string class is really the better option but this is as close to what she seems to want as she'll get.
Noah Roberts
I marked this as correct because it answers what I am trying to do (I realise I shouldn't be trying to do it but, heh). Why it works is another matter...
Anne
Doing this invokes undefined behavior. You cannot modify string literals.
John Dibling
It actually doesn't do what you're trying to do. It's still copying the string, but that fact is obscured by the array initialization syntax.
Cogwheel - Matthew Orlando
@John: The literal is just used as a short notation to initialize the `char[4]`. It is not the literal that is modified, but the `char[2][4]` array on the stack, which is perfectly valid.
Ferdinand Beyer
@Cogwheel: That depends on the compiler. The assembly code generated by GCC uses one `movl` instruction to copy the 4-byte string onto the stack for the literal version, whereas it needs 4 `movb` instructions for the verbose version.
Ferdinand Beyer
Well, semantically speaking, anyway.
Cogwheel - Matthew Orlando
Ok, I'm a bit confused. This solution does enable me to update the elements of foo, so it solves my problem. However, in my real application, there are hundreds of elements in foo. So, am I right in thinking (from these comments) that the "foo" and "bar" literals are created, plus the array? (The array is global in my application so it won't be created on the stack, meaning that the solution is using twice as much memory as I'd hoped.)
Anne
That will probably depend on the compiler. As Ferdinand pointed out, GCC emits code that directly loads the values into the array rather than copying from a string literal. This means having extra storage for the literal would be unnecessary unless you initialize more than one array with the same literals.
Cogwheel - Matthew Orlando
A: 

Yes.

   (char[]){"foo"}
John
This invokes undefined behavior.
John Dibling