In C, I can write either of these:

char s[] = "hello"; or char *s = "hello";

So I wonder: what is the difference? I want to know what actually happens in terms of memory allocation at compile time and at run time.

+24  A: 

The difference here is that

char *s = "Hello world";

will place Hello world in a read-only part of memory and make s a pointer to it, so any write to that memory is illegal. By contrast:

char s[] = "Hello world";

puts the literal string in read-only memory and copies it to newly allocated memory on the stack, which makes

s[0] = 'J';

legal.
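
As a minimal sketch of the array case (the helper name make_jello and the out parameter are made up for illustration, not part of the question):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: builds "Jello world" in a writable array and
   copies it into out, which must have room for at least 12 bytes. */
void make_jello(char *out)
{
    char s[] = "Hello world";  /* writable array: 11 characters + '\0' */

    assert(sizeof s == 12);    /* sanity check on the array size */
    s[0] = 'J';                /* legal: this changes our copy, not the literal */
    memcpy(out, s, sizeof s);  /* copy all 12 bytes, terminator included */

    /* By contrast, after  char *p = "Hello world";  the statement
       p[0] = 'J';  would be undefined behaviour. */
}
```

Calling make_jello(buf) with a 12-byte buf leaves the string Jello world in buf.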

Rickard
Mmmmm Jell-O. Yum.
Carl Norum
+1 I like Jell-O too.
Aaron
The literal string `"Hello world"` is in "read-only parts of the memory" in both examples. The example with the pointer **points** there, the example with the array **copies** the characters to the array elements.
pmg
pmg: In the second case the literal string does not necessarily exist in memory as a single contiguous object at all. It's just an initialiser; the compiler could quite reasonably emit a series of "load immediate byte" instructions that contain the character values embedded within them.
caf
@pmg: to be fair it depends on the context. For a global variable, the compiler can just put `"Hello world"` directly in a writeable section loadable on startup; for an automatic variable the array does need to be reinitialized every time.
Charles Bailey
The char array example does *not* necessarily place the string on the stack - if it appears at file level, it will probably be in some kind of initialised data segment instead.
caf
@caf, @Charles: The Standard (n1401.pdf) says @ 6.4.5 String literals /5 `"... The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. ..."` It doesn't "speak" of 'plain' character sequences, but I think the same applies. I also think an implementation can ignore this particular bit of the Standard for performance reasons :)
pmg
I'd like to point out that the string in `char *s = "xx"` doesn't *have* to be in read-only memory (some implementations have no MMUs, for example). The n1362 c1x draft simply states that modifying such an array causes undefined behavior. But +1 anyway, since relying on that behavior is a silly thing to do.
paxdiablo
+6  A: 

char s[] = "hello";

declares s to be an array of char which is long enough to hold the initializer (5 + 1 chars) and initializes the array by copying the members of the given string literal into the array.

char *s = "hello";

declares s to be a pointer to one or more (in this case more) chars and points it directly at a fixed (read-only) location containing the literal "hello".

Charles Bailey
+4  A: 

This declaration:

char s[] = "hello";

Creates one object - a char array of size 6, called s, initialised with the values 'h', 'e', 'l', 'l', 'o', '\0'. Where this array is allocated in memory, and how long it lives for, depends on where the declaration appears. If the declaration is within a function, it will live until the end of the block that it is declared in, and almost certainly be allocated on the stack; if it's outside a function, it will probably be stored within an "initialised data segment" that is loaded from the executable file into writeable memory when the program is run.

On the other hand, this declaration:

char *s = "hello";

Creates two objects:

  • a read-only array of 6 chars containing the values 'h', 'e', 'l', 'l', 'o', '\0', which has no name and has static storage duration (meaning that it lives for the entire life of the program); and
  • a variable of type pointer-to-char, called s, which is initialised with the location of the first character in that unnamed, read-only array.

The unnamed read-only array is typically located in the "text" segment of the program, which means it is loaded from disk into read-only memory, along with the code itself. The location of the s pointer variable in memory depends on where the declaration appears (just like in the first example).
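
One consequence of the static storage duration can be sketched as follows (get_greeting is a hypothetical name; const is added so the compiler rejects accidental writes):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper. The pointer variable s is automatic and goes away
   when the function returns, but the unnamed array it points to has
   static storage duration, so the returned address stays valid. */
const char *get_greeting(void)
{
    const char *s = "hello";  /* two objects: the pointer s and the unnamed 6-char array */

    assert(strlen(s) == 5);   /* 5 characters, plus the '\0' that strlen does not count */
    return s;                 /* safe: the array outlives this call */
}
```

Returning the address of a local char s[] = "hello"; in the same way would not be safe, because that array ends its life with the enclosing block.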

caf
+8  A: 

First off, in function arguments, they are exactly equivalent:

void foo(char *x);
void foo(char x[]); // exactly the same in all respects: an array parameter is adjusted to a pointer (a plain size inside the brackets is ignored too)

In other contexts, char * allocates a pointer, while char [] allocates an array. Where does the string go in the former case, you ask? The compiler secretly allocates a static anonymous array to hold the string literal. So:

char *x = "Foo";
// is approximately equivalent to:
static const char __secret_anonymous_array[] = "Foo";
char *x = (char *) __secret_anonymous_array;

Note that you must not ever attempt to modify the contents of this anonymous array via this pointer; the effects are undefined (often meaning a crash):

x[1] = 'O'; // BAD. DON'T DO THIS.

Using the array syntax directly allocates it into new memory. Thus modification is safe:

char x[] = "Foo";
x[1] = 'O'; // No problem.

However, the array only lives as long as its containing scope, so if you do this in a function, don't return or leak a pointer to this array; make a copy instead with strdup() or similar. If the array is declared at global scope, of course, there is no problem.
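
A rough sketch of the "make a copy" advice follows. strdup() comes from POSIX rather than from older C standards, so this version uses malloc() and memcpy() instead; the function name make_foo_copy is invented for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of a local array.
   The caller must free() the result; NULL means allocation failed. */
char *make_foo_copy(void)
{
    char x[] = "Foo";               /* local, writable, gone when we return */
    char *copy;

    assert(sizeof x == 4);          /* 3 characters + '\0' */
    x[1] = 'O';                     /* safe: modifies the local array */

    copy = malloc(sizeof x);        /* heap storage outlives this function */
    if (copy != NULL)
        memcpy(copy, x, sizeof x);  /* copy the bytes, terminator included */
    return copy;                    /* returning x itself would dangle */
}
```

The caller gets the string FOo and owns the memory, so it should call free() on it when done.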

bdonlan
I think you meant x[1] = 'O'; (single quotes, not doubles).
paxdiablo
Indeed I did, fixed.
bdonlan
A: 

char s[] = "Hello world"; -> Here, "s" is an array of characters, which can be overwritten if we wish.

char *s = "hello"; -> Here, a string literal creates a block of characters somewhere in memory, and the pointer "s" points to it. We can reassign the pointer to point somewhere else, but as long as it points to the string literal, the characters it points to can't be changed.

Sailaja
A: 

In the case of:

char *x = "fred";

x is an lvalue -- it can be assigned to. But in the case of:

char x[] = "fred";

x is not an lvalue, it is an rvalue -- you cannot assign to it.

Lee-Man
Technically, `x` is a non-modifiable lvalue. In almost all contexts though, it will evaluate to a pointer to its first element, and *that* value is an rvalue.
caf
+1  A: 

Given the declarations

char *s0 = "hello world";
char s1[] = "hello world";

assume the following hypothetical memory map:

                    0x00  0x01  0x02  0x03
        0x00008000: 'h'   'e'   'l'   'l'
        0x00008004: 'o'   ' '   'w'   'o'
        0x00008008: 'r'   'l'   'd'   0x00
        ...
s0:     0x00010000: 0x00  0x00  0x80  0x00
s1:     0x00010004: 'h'   'e'   'l'   'l'
        0x00010008: 'o'   ' '   'w'   'o'
        0x0001000C: 'r'   'l'   'd'   0x00

The string literal "hello world" is a 12-element array of char (const char in C++) with static extent, meaning that the memory for it is allocated when the program starts up and remains allocated until the program terminates. Attempting to modify the contents of a string literal invokes undefined behavior.

The line

char *s0 = "hello world";

defines s0 as a pointer to char with auto extent (meaning the variable s0 only exists for the scope in which it is declared) and copies the address of the string literal (0x00008000 in this example) to it. Note that since s0 points to a string literal, it should not be used as an argument to any function that would try to modify it (e.g., strtok(), strcat(), strcpy(), etc.).

The line

char s1[] = "hello world";

defines s1 as a 12-element array of char (length is taken from the string literal) with auto extent and copies the contents of the literal to the array. As you can see from the memory map, we have two copies of the string "hello world"; the difference is that you can modify the string contained in s1.

s0 and s1 are interchangeable in most contexts; here are the exceptions:

sizeof s0 == sizeof (char*)
sizeof s1 == 12

type of &s0 == char **
type of &s1 == char (*)[12] // pointer to a 12-element array of char

You can reassign the variable s0 to point to a different string literal or to another variable. You cannot reassign the variable s1 to point to a different array.
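
The size difference can be checked with a small sketch like this one (the two function names are hypothetical, and const is added to the pointer so the compiler catches accidental writes):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: size of the pointer object, not of the string. */
size_t pointer_object_size(void)
{
    const char *s0 = "hello world";
    return sizeof s0;          /* same as sizeof (char *) */
}

/* Hypothetical helper: size of the whole array, terminator included. */
size_t array_object_size(void)
{
    char s1[] = "hello world";
    assert(sizeof s1 == 12);   /* 11 characters + '\0' */
    return sizeof s1;
}

/* Reassignment differs as well:
     s0 = "goodbye";   is fine, the pointer now refers to another literal
     s1 = "goodbye";   does not compile, an array cannot be assigned to */
```

array_object_size() always returns 12 here, while pointer_object_size() returns whatever a pointer occupies on the platform, commonly 4 or 8.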

John Bode
A: 

In light of the comments here, it should be obvious that char *s = "hello"; is a bad idea and should be used only in a very narrow scope.

This might be a good opportunity to point out that "const correctness" is a "good thing". Whenever and wherever you can, use the const keyword to protect your code from "relaxed" callers or programmers, who are usually at their most "relaxed" when pointers come into play.

Enough melodrama; here is what one can achieve by adorning pointers with const. (Note: read pointer declarations right to left.) Here are the three different ways to protect yourself when playing with pointers:

const DBJ* p means "p points to a DBJ that is const"

— that is, the DBJ object can't be changed via p.

DBJ* const p means "p is a const pointer to a DBJ"

— that is, you can change the DBJ object via p, but you can't change the pointer p itself.

const DBJ* const p means "p is a const pointer to a const DBJ"

— that is, you can't change the pointer p itself, nor can you change the DBJ object via p.

Errors from attempts to modify something declared const are caught at compile time. There is no run-time space or speed penalty for const.

(This assumes you are using a C++ compiler, of course?)
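
The same qualifiers are available in plain C. Here is a small sketch that uses char in place of the DBJ type; count_chars is a made-up name for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: p is a const pointer to const char, so the
   function can neither write through p nor re-point p. */
size_t count_chars(const char *const p)
{
    size_t n = 0;

    assert(p != NULL);     /* the caller must pass a valid string */
    while (p[n] != '\0')   /* reading through p is allowed */
        n++;

    /* p[0] = 'J';   rejected by the compiler: *p is const      */
    /* p = "other";  rejected by the compiler: p itself is const */
    return n;
}
```

Passing a string literal such as "hello" is then safe, because the parameter type already rules out writes to it.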

--DBJ

DBJ