ansaurus

Question

Answer 1

A:

The string is simply a character array which can be written as:

char name[] = "ben";

ennuikiller 2010-07-01 02:25:10

Hmm, no. `char name[] = "ben";` and `char* name = "ben";` are quite different. Try `name = "john";` as the next statement on both.

Nikolai N Fetissov 2010-07-01 02:31:21

@Nikolai It's such a minor distinction though, and for some reason it turns into a big deal any time questions about pointers or arrays come up; you can pretend pointers and arrays are exactly the same and in 99% of situations not have a problem

Michael Mrozek 2010-07-01 02:34:55

@Michael: It is *not* a minor distinction. This is a special case. In the first case, modifying the string pointed to by `name` is undefined behavior, because it's pointing to an area that can't be "legally" modified. In the second, the array is initialized with the string, it can be modified "legally", and the size of the string determines the size of the array. Taking `sizeof(name)` in the first case will give you the size of the pointer; in the second, the size of the char array.

Tim Schaeffer 2010-07-01 02:43:18

@Tim Well, the first is a special case with constant strings, I thought Nikolai was talking about array/pointer differences in general. And sure, `sizeof` is one of the cases they differ, but it's one of the few; it seems like people make a big deal about the difference when they're largely interchangeable except for a few cases like that

Michael Mrozek 2010-07-01 02:49:13

@Michael, it is a big deal actually. It's this 1% of cases that'll keep you debugging the freaking thing past midnight.

Nikolai N Fetissov 2010-07-01 02:50:51

Answer 2

+8 A:

Because arrays automatically decay to pointers. It's a one-way conversion, though.

What happens in this particular case is that the compiler places the anonymous array "ben" into a (probably read-only) data section of the executable (usually .rodata in ELF), and then at runtime the variable name is assigned the address of the first byte in that array.
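
A minimal sketch of the decay described above, assuming a hosted C compiler. The function names are hypothetical and exist only for illustration. It shows that an array name converts to a pointer to its first element, and that the array's size is lost once it is passed as a pointer.

```c
#include <stddef.h>

/* Hypothetical helper: the parameter is a pointer, so the caller's
   array length is no longer visible here. */
size_t size_seen_by_callee(const char *s) {
    return sizeof s;           /* size of a pointer, not of any array */
}

/* Hypothetical helper: returns 1 if the decayed array name equals
   the address of the array's first element. */
int decays_to_first_element(void) {
    char word[] = "ben";
    const char *p = word;      /* implicit array-to-pointer conversion */
    /* word = p;  would not compile: the conversion is one-way */
    return p == &word[0];
}
```

Calling `size_seen_by_callee("ben")` gives the same value as `sizeof(char *)`, regardless of how long the string is.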

Nikolai N Fetissov 2010-07-01 02:25:32

... _probably_ read-only data section. There's no requirement for this.

paxdiablo 2010-07-01 02:52:25

If you don't want it modified, declare it const and don't cast that away. Otherwise, there's no guarantee of constness. Oh, and make sure it's a pointer to a const char, not a const pointer to a mutable char. :-)

Steven Sudit 2010-07-01 03:21:51

@paxdiablo, right, thanks.

Nikolai N Fetissov 2010-07-01 04:10:26

Answer 3

+4 A:

Is this 'hidden' pointer arithmetic?

No. It's explicit, in-your-face pointer arithmetic. That's what * means. Pointer.

S.Lott 2010-07-01 02:25:56

Not all uses of pointers constitute pointer arithmetic.

Tyler McHenry 2010-07-01 03:30:41

Answer 4

A:

name is just a pointer to the first character of the string, in this case 'b' (or 'h' in the case of the post's title). A null character is appended at the end to mark the end of the string. So it's not really pointer arithmetic.

dave 2010-07-01 02:27:49

Answer 5

+4 A:

Strings in C are just adjacent bytes located in memory, ending with an implicit '\0' byte. By writing char* p = "string" you just load the address of the first byte in this sequence into p.

Now for your exact question, the code you provided will allocate this "ben" string as four bytes 'b', 'e', 'n' and '\0' in the program's static memory. This means that the string will not be dynamically allocated on the heap or automatically on the stack. It will be stored in a static section of your compiled and linked program image. The pointer variable 'name', however, will be an automatically allocated stack variable that holds the address of the first byte of the string.
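
As a rough sketch of that layout (the function name `get_name` is made up for this example): because the literal has static storage duration, a pointer to it stays valid even after the function that held the local pointer variable has returned.

```c
/* Hypothetical helper: 'name' is a local (automatic) pointer variable,
   but the four bytes 'b','e','n','\0' it points to are static, so
   returning the address is safe. */
const char *get_name(void) {
    const char *name = "ben";
    return name;
}
```

The caller can read `get_name()[0]` through `get_name()[3]`; the last of those is the terminating '\0'.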

Inso Reiges 2010-07-01 02:32:14

Not sure that implicit is the right word? I would have used "added" or something else. "Implicit" implies "implied" :-) As in "not actually there in reality", like the decimal point in a COBOL `999V99` picture (showing my age there). Whereas the NUL character _is_ really there at the end of the byte array.

paxdiablo 2010-07-05 07:49:09

Answer 6

+5 A:

There isn't any hidden pointer arithmetic, but I suspect you want a more detailed answer than that.

If you have a function:

void foo() {
    char * bar = "Hello World";
}

There are actually two chunks of memory that come in to play:

The first is where the 12 bytes are used to store "Hello World" (1 byte for each character plus a null byte at the end). The compiler will put this in the data segment, typically in a read-only part of it. This memory (location and values) is set at compile time and must not be modified at run time (attempting to do so is undefined behavior and will typically segfault).
The second location is the pointer to the data; this is the bar variable. When your program calls foo(), it allocates enough stack space (4 bytes on 32-bit) to hold the pointer, which gets initialized to the address of the actual data. This happens every time you run foo().

Furthermore, if you execute a statement like this later in the function:

bar = "Good bye";

You aren't changing the data "Hello World" to "Good bye". You actually just end up with a 3rd chunk of memory in the data segment with "Good bye" in it (still allocated at compile time), then the pointer (bar) gets set to that location when that line executes.
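
A small sketch of that point, using a hypothetical function name: after the reassignment, the original literal is still intact and reachable through any pointer that was saved earlier.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if "Hello World" is unchanged after
   bar has been pointed at a different literal. */
int first_literal_survives(void) {
    const char *bar = "Hello World";
    const char *saved = bar;       /* remember where the first literal lives */
    bar = "Good bye";              /* re-aims the pointer; copies no characters */
    return strcmp(saved, "Hello World") == 0
        && strcmp(bar, "Good bye") == 0;
}
```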

Another method to create "strings" (character arrays) is:

void foo() {
    char bar[] = "Hello World";
}

This is not the same as the first (close, though). In this method, you still have two variables, except the actual data you're concerned about ("Hello World" + null byte) is allocated and initialized on the program stack.
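
To illustrate the array form (function names here are hypothetical): the array is the function's own writable copy, and sizeof reports the whole array rather than the size of a pointer.

```c
#include <stddef.h>

/* Hypothetical helper: size of the array, including the null byte. */
size_t array_size(void) {
    char bar[] = "Hello World";
    return sizeof bar;             /* 11 characters + 1 null byte */
}

/* Hypothetical helper: writing to the array is well defined, because
   bar is a local copy rather than the literal itself. */
char modify_array_copy(void) {
    char bar[] = "Hello World";
    bar[0] = 'J';
    return bar[0];
}
```

Doing the same write through `char *bar = "Hello World";` would be undefined behavior.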

You can see the difference in the compiled assembly by running gcc -S test.c and then reading test.s.

At some point you will want to look at C's string functions.

The key thing to remember when using these functions is that they don't know how long your character arrays are at all; they figure that out based on where the first null character is (a sentinel value).
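
A quick sketch of the sentinel behavior, with a hypothetical function name: strlen stops at the first null byte it finds, no matter how large the underlying array is.

```c
#include <string.h>

/* Hypothetical helper: shortens the string by writing an early
   terminator, then asks strlen for the length. */
size_t length_after_truncation(void) {
    char buf[] = "Hello World";    /* the array itself holds 12 bytes */
    buf[5] = '\0';                 /* replace the space with a null byte */
    return strlen(buf);            /* counts only "Hello" */
}
```

The array still occupies 12 bytes afterwards; only the position of the first null character changed.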

James Harr 2010-07-01 03:19:47

Answer 7

A:

When you declare a string like this, it's a regular pointer variable initialized to point to an address in the string table (usually a read-only portion of the executable). The difference between this and an array of characters is that the array is allocated on the stack, and thus is writable. You shouldn't attempt to modify constant strings like this, which is why they should be declared 'const', so the compiler will protect you from yourself. Better to catch it at compile time than wonder why you got a segfault.
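
A hedged sketch of what declaring it 'const' looks like in practice; the function and variable names are invented for this example. It also contrasts a pointer to const char with a const pointer to mutable char, since the two are easy to mix up.

```c
/* Hypothetical helper: returns 1 if both permitted operations worked. */
int const_forms_demo(void) {
    char buffer[] = "ben";

    const char *p = "hello";   /* pointer to const char: characters are read-only */
    char *const q = buffer;    /* const pointer to mutable char: pointer is fixed */

    p = "john";                /* allowed: the pointer itself can be re-aimed */
    q[0] = 'B';                /* allowed: the characters can be changed */

    /* p[0] = 'J';        rejected at compile time */
    /* q = buffer + 1;    rejected at compile time */

    return p[0] == 'j' && buffer[0] == 'B';
}
```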

Shawn D. 2010-07-01 03:27:50


How does char *blah = "hello" work?
