It is said here that, for a global variable, the following form:

(1) const char *a = "...";

is less good than:

(2) const char a[] = "..."

Why? I always thought that (1) was better, since (2) actually replicates the string we assign to it, while (1) only points to it.

+1  A: 

In (2) the string exists only in one version, and you are manipulating it directly by its address.

In (1) there is the string somewhere in memory, and then you put this address in another location in memory and force yourself to read the address from the other location each time you need it. In effect, it adds a (useless) indirection.
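A minimal sketch of that difference, assuming file-scope declarations like the ones in the question. The names `p`, `s` and all helper functions here are illustrative only. The pointer is a separate object that stores the address of the characters, while the array object is the characters themselves:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative file-scope objects (hypothetical names). */
const char *p = "This is a test";   /* pointer object + unnamed string literal */
const char s[] = "This is a test";  /* one array object holding the characters */

/* Size of the named object: a pointer for p, the whole string for s. */
size_t size_of_p(void) { return sizeof p; }
size_t size_of_s(void) { return sizeof s; }

/* Nonzero if the named object lives somewhere other than the characters. */
int p_is_separate_object(void) { return (const void *)&p != (const void *)p; }
int s_is_separate_object(void) { return (const void *)&s != (const void *)s; }

/* Sanity checks for the claims above; call from main() to run them. */
void check_forms(void)
{
    assert(size_of_p() == sizeof(const char *));
    assert(size_of_s() == sizeof "This is a test"); /* 14 characters + '\0' */
    assert(p_is_separate_object());   /* reading the string goes through p */
    assert(!s_is_separate_object());  /* s is the string, no extra hop */
}
```

Calling `check_forms()` from `main` should pass silently on a hosted implementation; the point is only that `p` occupies storage of its own in addition to the string.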

EDIT:

As I say in the comments below and on another answer, there is no duplication in (2).

t.c:

char  *p = "This is a test";
char s[] = "This is a test";

The command gcc -S t.c produces the file:

.globl _p
    .cstring
LC0:
    .ascii "This is a test\0"
    .data
    .align 2
_p:
    .long LC0
.globl _s
_s:
    .ascii "This is a test\0"
    .subsections_via_symbols

ts.c:

char s[] = "This is a test";

The command gcc -S ts.c now produces the file:

.globl _s
    .data
_s:
    .ascii "This is a test\0"
    .subsections_via_symbols
Pascal Cuoq
Nope. In 1) `a` is a pointer and its value is the address of a string literal; in 2) `a` is an array and its contents are the characters that make up the string literal, **copied** to the memory created for the array.
pmg
@pmg However, in 2), since it's a global we're talking about here, the copy takes place at compile time. "After" the copy, the string literal, now useless, can be omitted from the binary. You might as well think of `char a[]="...";` as a convenient syntax for providing the initialization of a global char array.
Pascal Cuoq
Strictly, not "since it's a global", rather, "since it has static storage duration".
caf
A: 

They're different things, and not inter-changeable.

Use 1) when you need a pointer; use 2) when you need an array; use 3) when you need a resizeable array

3)

#define LITERAL "..."
char *a = malloc(strlen(LITERAL) + 1);  /* inside a function; needs <stdlib.h>, <string.h> */
if (a != NULL)
    strcpy(a, LITERAL);
/* else: no memory */
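
Option 3 could be fleshed out roughly as follows. This is only a sketch: `copy_literal` and `append` are hypothetical helper names, not part of any library, and the caller is assumed to `free` the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define LITERAL "..."

/* Hypothetical helper: returns a heap copy of LITERAL,
   or NULL if allocation fails. The caller frees the result. */
char *copy_literal(void)
{
    char *a = malloc(strlen(LITERAL) + 1);
    if (a == NULL)
        return NULL;            /* no memory */
    strcpy(a, LITERAL);
    return a;
}

/* Hypothetical helper: grows the buffer and appends suffix.
   On failure the original buffer is left untouched and NULL is returned. */
char *append(char *a, const char *suffix)
{
    assert(a != NULL && suffix != NULL);
    char *bigger = realloc(a, strlen(a) + strlen(suffix) + 1);
    if (bigger == NULL)
        return NULL;
    strcat(bigger, suffix);
    return bigger;
}
```

Neither a pointer to a string literal nor a fixed-size array can grow this way, which is why the three forms are not interchangeable.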
pmg
+3  A: 

The argument at the LiveJournal link is that (1) introduces an unnecessary level of indirection by creating a separate pointer variable, and a security hole in that the pointer variable may be overwritten. Assume the following two declarations:

char  *p = "This is a test";
char s[] = "This is a test";

Assume these declarations are at file scope, and thus both p and s have static extent.

Here's a hypothetical memory map showing how everything is laid out:

 
                    0x00  0x01  0x02  0x03
        0x00008000: 'T'   'h'   'i'   's'
        0x00008004: ' '   'i'   's'   ' '
        0x00008008: 'a'   ' '   't'   'e'
        0x0000800C: 's'   't'    0    ...
        ...
     p: 0x00010000: 0x00  0x00  0x80  0x00
     s: 0x00010004: 'T'   'h'   'i'   's'
        0x00010008: ' '   'i'   's'   ' '
        0x0001000C: 'a'   ' '   't'   'e'
        0x00010010: 's'   't'    0    ...

The arguments presented at the link are as follows:

  1. An additional variable -- p is a distinct object from the string to which it refers; it doesn't contain a string value on its own, whereas s does;
  2. More attack points, the variable is writable -- it's possible to reassign p to point somewhere else (perhaps to a segment containing malicious code), whereas you cannot reassign s.
  3. An additional relocation -- not sure what this is referring to (for the kind of work I do I've never really had to worry about performance at the machine level, so I'm not familiar with all the terminology);
  4. Getting the string address requires a memory load and accessing the string itself requires two memory loads -- if you're reading the string through p, first you have to load the contents of 0x00010000 to get the string address (0x00008000), then you have to load the contents of 0x00008000 to get the string value itself. If you're doing that a lot, then using a char array and cutting out one level of indirection may result in a noticeable performance boost.

In summary, you trade a little memory for improved speed and security. Of course, this assumes a particular operating environment, and may not apply universally.
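
Point 2 can be illustrated with a small sketch. The function names are hypothetical, and `p` is declared `const char *` here so that the snippet compiles cleanly under strict warning settings:

```c
#include <assert.h>
#include <string.h>

const char *p = "This is a test";  /* pointer variable: can be redirected */
char s[] = "This is a test";       /* array: its address is fixed */

/* Hypothetical stand-in for any code that overwrites the pointer variable. */
void redirect_p(const char *elsewhere)
{
    p = elsewhere;   /* legal: p is an ordinary writable object */
}

/* s = elsewhere;   would not compile: an array cannot be assigned to. */

/* The characters of s can change, but s always names the same storage. */
void overwrite_s(const char *text)
{
    assert(strlen(text) < sizeof s);  /* must fit, including the '\0' */
    strcpy(s, text);
}
```

After `redirect_p(other)`, every later read through `p` sees whatever `other` points to, whereas code that uses `s` keeps addressing the same 15 bytes no matter what is written into them.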

John Bode
I have to insist, as I did in a reply to pmg's comment, that if you define only `s`, then it becomes unnecessary to include the string literal in the object file, so you are *not* trading memory for faster and safer access; you are saving memory *and* getting faster, safer access. Actually, the link provided by the OP advocated `s`-style definitions for their lower memory use. I will update my answer with proof.
Pascal Cuoq
That said, nice explanation apart from that concluding "trade-off" remark.
Pascal Cuoq
The "additional relocation" matters in the context of writing dynamic libraries: when the library is loaded, all references to static objects must be updated based on the load address of the library. The first case has two static objects (`a` and the unnamed character array), whereas the second case has only one.
caf