ansaurus

Question

Answer 1

+1 A:

In (2) the string exists only in one version, and you are manipulating it directly by its address.

In (1) there is the string somewhere in memory, and then you put this address in another location in memory and force yourself to read the address from the other location each time you need it. In effect, it adds a (useless) indirection.
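The difference can be sketched in a few lines of C. This is only an illustration, assuming (1) is the pointer form and (2) the array form; the names `p` and `s` and the helper functions are hypothetical, chosen for this example.

```c
#include <stddef.h>

char *p = "This is a test";   /* (1): a pointer object plus a separate, unnamed literal */
char s[] = "This is a test";  /* (2): the array itself holds the characters */

/* Size of the pointer object only; the characters live elsewhere. */
size_t pointer_object_size(void) { return sizeof p; }

/* Size of the whole array: 14 characters plus the terminating '\0'. */
size_t array_object_size(void) { return sizeof s; }

/* For the array, the object's address and the string's address coincide. */
int array_is_its_own_storage(void) { return (void *)&s == (void *)s; }

/* For the pointer, the variable lives at one address and the string at another. */
int pointer_is_separate_from_string(void) { return (void *)&p != (void *)p; }
```

Reading the string through `p` therefore means first reading `p` itself, which is the extra indirection described above.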

EDIT:

As I say in the comments below and for another answer, there is no duplication in (2).

t.c:

char  *p = "This is a test";
char s[] = "This is a test";

The command gcc -S t.c produces the file:

.globl _p
    .cstring
LC0:
    .ascii "This is a test\0"
    .data
    .align 2
_p:
    .long LC0
.globl _s
_s:
    .ascii "This is a test\0"
    .subsections_via_symbols

ts.c:

char s[] = "This is a test";

The command gcc -S ts.c now produces the file:

.globl _s
    .data
_s:
    .ascii "This is a test\0"
    .subsections_via_symbols

Pascal Cuoq 2009-11-20 12:07:38

Nope. In 1) `a` is a pointer and its value is the address of a string literal; in 2) `a` is an array and its contents are the characters that make up the string literal, **copied** into the memory created for the array.

pmg 2009-11-20 12:12:49

@pmg However, in 2), and since it's a global we're talking about here, the copy takes place at compile-time. "After" the copy, the string literal, now useless, can be omitted from the binary. You might as well think of `char a[]="...";` as a convenient syntax for providing the initialization of a global char array.

Pascal Cuoq 2009-11-20 14:11:30

Strictly, not "since it's a global", rather, "since it has static storage duration".

caf 2009-11-21 02:47:46
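The storage-duration point can be made visible with a small sketch. The function names are hypothetical; the example assumes nothing beyond standard C. An array with automatic storage is initialized from the literal each time its block is entered, while one with static storage is initialized once, before the program starts.

```c
/* Automatic storage: the array is re-initialized on every call,
 * so the increment never accumulates. */
char bump_automatic(void)
{
    char a[] = "0";
    a[0]++;
    return a[0];
}

/* Static storage: the array is initialized once, before program startup,
 * so each call sees the previous call's change. */
char bump_static(void)
{
    static char a[] = "0";
    a[0]++;
    return a[0];
}
```

Calling `bump_automatic()` twice yields the same character both times, whereas `bump_static()` yields successive digits.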

Answer 2

A:

They're different things, and not interchangeable.

Use 1) when you need a pointer; use 2) when you need an array; use 3) when you need a resizable array.

3)

#define LITERAL "..."
char *a = malloc(strlen(LITERAL) + 1);  /* needs <stdlib.h> and <string.h> */
if (a)
    strcpy(a, LITERAL);
/* else: no memory; handle the error */
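What "resizable" buys you in case 3) might look like the following. This is a minimal sketch, not part of the original answer: `copy_literal` and `append_text` are hypothetical helpers introduced only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: heap copy of a literal, as in case 3).
 * Returns NULL if the allocation fails. */
char *copy_literal(const char *literal)
{
    char *a = malloc(strlen(literal) + 1);
    if (a)
        strcpy(a, literal);
    return a;
}

/* Hypothetical helper: append `suffix` to the heap string `a`, growing the
 * buffer with realloc. Returns the (possibly moved) buffer, or NULL on
 * allocation failure, in which case `a` is untouched and still valid. */
char *append_text(char *a, const char *suffix)
{
    size_t old_len = strlen(a);
    size_t add_len = strlen(suffix);
    char *grown = realloc(a, old_len + add_len + 1);
    if (!grown)
        return NULL;
    memcpy(grown + old_len, suffix, add_len + 1);  /* copies the '\0' too */
    return grown;
}
```

Neither the pointer-to-literal form nor the array form can grow this way: the literal must not be modified, and the array's size is fixed at its definition.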

pmg 2009-11-20 12:15:18

Answer 3

+3 A:

The argument at the livejournal link is that (1) introduces an unnecessary level of indirection by creating a separate pointer variable, and a security hole in that the pointer variable may be overwritten. Assume the following two declarations:

char  *p = "This is a test";
char s[] = "This is a test";

Assume these declarations are at file scope, and thus both p and s have static extent.

Here's a hypothetical memory map showing how everything is laid out:

 
                    0x00  0x01  0x02  0x03
        0x00008000: 'T'   'h'   'i'   's'
        0x00008004: ' '   'i'   's'   ' '
        0x00008008: 'a'   ' '   't'   'e'
        0x0000800C: 's'   't'    0    ...
        ...
     p: 0x00010000: 0x00  0x00  0x80  0x00
     s: 0x00010004: 'T'   'h'   'i'   's'
        0x00010008: ' '   'i'   's'   ' '
        0x0001000C: 'a'   ' '   't'   'e'
        0x00010010: 's'   't'    0    ...

The arguments presented at the link are as follows:

An additional variable -- p is a distinct object from the string to which it refers; it doesn't contain a string value on its own, whereas s does;
More attack points, the variable is writable -- it's possible to reassign p to point somewhere else (perhaps to a segment containing malicious code), whereas you cannot reassign s.
An additional relocation -- not sure what this is referring to (for the kind of work I do I've never really had to worry about performance at the machine level, so I'm not familiar with all the terminology);
Getting the string address requires a memory load, and accessing the string itself requires two memory loads -- if you're reading the string through p, first you have to load the contents of 0x00010000 to get the string address (0x00008000), then you have to load the contents of 0x00008000 to get the string value itself. If you're doing that a lot, then using a char array and cutting out one level of indirection may result in a noticeable performance boost.
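The writability and extra-load points can be sketched as follows. The array `other` and the function names are hypothetical, and the comments about loads describe typical compiler output rather than anything the C standard guarantees.

```c
char *p = "This is a test";
char s[] = "This is a test";
static char other[] = "Something else";  /* hypothetical alternative target */

/* Reading through p: load the pointer stored in p, then load the character. */
char first_via_p(void) { return p[0]; }

/* Reading s: its address is fixed by the linker or loader, so typically
 * only the character itself is loaded. */
char first_via_s(void) { return s[0]; }

/* p is an ordinary writable variable, so any code can retarget it.
 * The equivalent `s = other;` would not compile. */
void retarget_p(void) { p = other; }
```

After `retarget_p()`, every read through `p` silently sees different data, while reads of `s` are unaffected.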

In summary, you trade a little memory for improved speed and security. Of course, this assumes a particular operating environment, and may not apply universally.

John Bode 2009-11-20 15:08:06

I have to insist, as I did in a reply to pmg's comment, that if you define only `s`, then it becomes unnecessary to include the literal string in the object file, so you are *not* trading memory for faster (and safer) access; you are saving memory *and* getting faster, safer access. Actually, the link provided by the OP advocated `s`-style definitions for their lower memory use. I will update my answer with proof.

Pascal Cuoq 2009-11-20 15:22:14

This said, nice explanation apart from that concluding "trade-off" remark.

Pascal Cuoq 2009-11-20 15:23:01

The "additional relocation" matters in the context of writing dynamic libraries: when the library is loaded, all references to static objects must be updated based on the load address of the library. The first case has two static objects (`a` and the unnamed character array), whereas the second case only has one.

caf 2009-11-21 02:54:17


Defining pointer to static string
