I always try to avoid returning string literals, because I fear they aren't defined outside of the function. But I'm not sure if this is the case. Let's take, for example, this function:


const char *
return_a_string(void)
{
    return "blah";
}

Is this correct code? It does work for me, but maybe it only works for my compiler (gcc). So the question is, do (string) literals have a scope or are they present/defined all the time.

+12  A: 

This code is fine across all platforms. The string gets compiled into the binary as a static string literal. If you are on Windows, for example, you can even open your .exe in Notepad and search for the string itself.

Since it is a static string literal, scope does not matter: the literal has static storage duration, so it exists for the whole run of the program, not just while the function is executing.

String pooling:

One thing to look out for is that in some cases, identical string literals can be "pooled" to save space in the executable file. When that happens, identical literals can share a single memory address. You should never assume that this will, or will not, happen, though.

Most compilers let you choose whether or not string literals are pooled.

Maximum size of string literals:

Several compilers impose a maximum size on string literals. For example, with VC++ this is approximately 2,048 bytes.

Modifying a string literal gives undefined behavior:

Never modify a string literal: doing so has undefined behavior.

char * sz = "this is a test";
sz[0] = 'T'; //<--- undefined results

Wide string literals:

All of the above applies equally to wide string literals.

Example: L"this is a wide string literal";

The C++ standard states (section lex.string):

1 A string literal is a sequence of characters (as defined in lex.ccon) surrounded by double quotes, optionally beginning with the letter L, as in "..." or L"...". A string literal that does not begin with L is an ordinary string literal, also referred to as a narrow string literal. An ordinary string literal has type "array of n const char" and static storage duration (basic.stc), where n is the size of the string as defined below, and is initialized with the given characters. A string literal that begins with L, such as L"asdf", is a wide string literal. A wide string literal has type "array of n const wchar_t" and has static storage duration, where n is the size of the string as defined below, and is initialized with the given characters.

2 Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined. The effect of attempting to modify a string literal is undefined.

Brian R. Bondy
Since you explicitly mentioned wide string literals, do all literals behave that way? By "all" I mean the C99 compound literals too.
quinmars
I can't think of any other literals that this could apply to, but there could be others. Other literals have their own rules, but regarding your original question: if you returned, for example, an integer, then yes, that is safe.
Brian R. Bondy
What I meant are those C99 constructs where you can create a structure or an array on the fly, like (int[]){1, 2, 3, 4}. I know that this wasn't part of my initial question, but since you mentioned wide string literals, I got curious about the other literals :).
quinmars
I don't think it would apply, but I'm not sure.
Brian R. Bondy
If you want to initialize a modifiable string with a literal, it's better to use an array for your string, like this: { char sz[] = "this is a test"; sz[0] = 'T'; } In this case, sz is a char array the size of the constant string. It lives on the stack and is initialized by copying the static constant string (from wherever the compiler puts its string constant pool).
Adisak
+1  A: 

Yes, that's fine. They live in a global string table.

Bill K
+2  A: 

No, string literals do not have scope, so your code is guaranteed to work across all platforms and compilers. They are stored in your program's binary image, so you can always access them. However, trying to write to them (by casting away the const) will lead to undefined behavior.

Adam Rosenfield
A: 

You actually return a pointer to the zero-terminated string stored in the data section of the executable, an area loaded when you load the program. Just avoid trying to change the characters; it might give unpredictable results.

PhiLho
A: 

It's really important to note the undefined results that Brian mentioned. Since you have declared the function as returning const char *, you should be okay. However, on many platforms string literals are placed in a read-only segment of the executable (usually the text segment), and modifying them will cause an access violation.

Jason Coco
A: 

This is valid in C (or C++), as others have explained.

The one thing I can think of to watch out for is that if you're using DLLs, the pointer will not remain valid if the DLL containing this code is unloaded.

The C (or C++) standard doesn't understand or take account of loading and unloading code at runtime, so anything that does that will face implementation-defined consequences. In this case, the consequence is that the string literal, which is supposed to have static storage duration, appears from the point of view of the calling code not to persist for the full duration of the program.

Steve Jessop