+11  Q: 

C strings confusion

Hello!

I'm learning C right now and got a bit confused with character arrays - strings.

char name[15]="Fortran";

No problem with this - it's an array that can hold (up to?) 15 chars

char name[]="Fortran";

C counts the number of characters for me so I don't have to - neat!

char* name;

Okay. What now? All I know is that this can hold a big number of characters that are assigned later (e.g.: via user input), but

  • Why do they call this a char pointer? I know of pointers as references to variables
  • Is this an "excuse"? Does this find any other use than in char*?
  • What is this actually? Is it a pointer? How do you use it correctly?

thanks in advance, lamas

+1  A: 

One is an actual array object and the other is a reference or pointer to such an array object.

The thing that can be confusing is that both evaluate to the address of the first character, but for different reasons: for the array, that address is where the first character itself is stored, while the pointer is a separate word in memory that contains the address of the character.

The difference can be seen in the value of &name. In the first two cases it has the same address value as just name (although its type is pointer to array of char, e.g. char (*)[15]). In the third case it has a different type, pointer to pointer to char (char **), and it is the address of the pointer variable itself. That is, it is a double-indirect pointer.

#include <stdio.h>

char name1[] = "fortran";
char *name2 = "fortran";

int main(void) {
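    /* %lx expects an unsigned long; %p with a (void *) cast is the portable alternative */
    printf("%lx\n%lx %s\n", (unsigned long)name1, (unsigned long)&name1, name1);
    printf("%lx\n%lx %s\n", (unsigned long)name2, (unsigned long)&name2, name2);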
    return 0;
}
Ross-Harveys-MacBook-Pro:so ross$ ./a.out
100001068
100001068 fortran
100000f58
100001070 fortran
DigitalRoss
+1  A: 

char* name is just a pointer. Somewhere along the line memory has to be allocated and the address of that memory stored in name.

  • It could point to a single byte of memory and be a "true" pointer to a single char.
  • It could point to a contiguous area of memory which holds a number of characters.
  • If those characters happen to end with a null terminator, lo and behold, you have a pointer to a string.
Robert
+1  A: 

In C a string is just an array of characters terminated by a null character ('\0'), as you can see from the definition. Superficially, an array behaves like a pointer to its first element in most expressions; see below for the subtle differences. There is no bounds checking in C: the size you supply in the variable declaration only determines how much memory is reserved for the variable.

a[x] is the same as *(a + x), i.e. dereference of the pointer a incremented by x elements (not bytes).

If you use the following:

char foo[] = "foobar";
char bar = *foo;

bar will be set to 'f'.

To stave off confusion and avoid misleading people, here are some extra words on the more intricate differences between pointers and arrays (thanks, avakar):

In some cases a pointer is semantically different from an array. A (non-exhaustive) list of examples:

//sizeof
sizeof(char*) != sizeof(char[10]) // sizeof(char[10]) is always 10; sizeof(char*) is the size of a pointer

//lvalues
char foo[] = "foobar";
char bar[] = "baz";
char* p;
foo = bar; // compile error, an array is not a modifiable lvalue
p = bar; // just fine, p now points to the first element of bar

// multidimensional arrays
int baz[2][2] = { { 1, 2 }, { 3, 4 } };
int* q = baz; // error: baz decays to int (*)[2], a pointer to a row, not to int*
int* r = baz[0]; // just fine, r now points to the first element of the first "row" of baz
int x = baz[1][1];
int y = r[1][1]; // compile error: r[1] is an int, and an int cannot be subscripted
int z = r[1]; // just fine, z now holds the second element of the first "row" of baz

And finally, a fun bit of trivia: since a[x] is equivalent to *(a + x), which is the same as *(x + a), you can actually write e.g. 3[a] to access the fourth element of array a. The following is perfectly legal code and will print 'b', the fourth character of the string foo.

#include <stdio.h>

int main(int argc, char** argv) {
  char foo[] = "foobar";

  printf("%c\n", 3[foo]);

  return 0;
}
wich
Arrays are *not* pointers. They merely *decay* to pointers under some circumstances.
avakar
@avakar strictly speaking you are right, but for simple understanding it makes things easier to think of it like that.
wich
dereference of the pointer a incremented by x TIMES SIZEOF (content type) in bytes.
Alex Brown
@Alex well yes, naturally, but that's just basic pointer arithmetic, not something particular to arrays.
wich
+1  A: 

char *name, on its own, can't hold any characters. This is important.

char *name just declares that name is a pointer (that is, a variable whose value is an address) that will be used to store the address of one or more characters at some point later in the program. It does not, however, allocate any space in memory to actually hold those characters, nor does it guarantee that name even contains a valid address. In the same way, if you have a declaration like int number there is no way to know what the value of number is until you explicitly set it.

Just as you might set the value of an integer after declaring it (number = 42), after declaring a pointer to char you might later set its value to a valid memory address that contains a character, or sequence of characters, that you are interested in.

Stephen Canon
+1  A: 

That is a pointer. Which means it is a variable that holds an address in memory. It "points" to another variable.

It actually cannot, by itself, hold large amounts of characters. By itself, it can hold only one address in memory. If you initialize it with a string literal, the compiler stores the characters elsewhere (in static, often read-only, memory) and the pointer is set to that address. You can do it like this:

char* name = "Mr. Anderson";

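That looks a lot like this:

char name[] = "Mr. Anderson";

but the two differ: the array version gives you your own modifiable copy of the characters (sizeof(name) is 13), while the pointer version points at the string literal itself, which must not be modified.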

The place where character pointers come in handy is dynamic memory. You can assign a string of any length to a char pointer at any time in the program by doing something like this:

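char *name;
name = malloc(256 * sizeof(char)); /* needs <stdlib.h>; sizeof(char) is always 1 */
if (name != NULL) /* malloc returns NULL on failure */
    strcpy(name, "This is less than 256 characters, so this is fine."); /* needs <string.h> */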

Alternatively, you can assign to it using the strdup() function (POSIX, and part of standard C since C23), like this:

char *name;
name = strdup("This can be as long or short as I want.  The function will allocate enough space for the string and return a pointer to it, which then gets assigned to name.");

If you use a character pointer this way and allocate memory for it, you have to free that memory before reassigning name, or it will leak. Like this:

if(name)
    free(name);
name = 0;

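The if statement only checks that name is not a null pointer. That check is harmless but not required, because free(NULL) does nothing; it also cannot tell whether a non-null pointer is valid, so never free a pointer that did not come from malloc (or a related function) or that has already been freed. Setting name to 0 (NULL) afterwards protects against freeing it twice.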

Character pointers are used so much in C because they let you replace the string with one of a different size. A fixed-size character array can't do that. Pointers are also easier to pass around.

Also, character pointers are handy because they can be used to point to different statically allocated character arrays. Like this:

char *name;

char joe[] = "joe";
char bob[] = "bob";

name = joe;

printf("%s", name);

name = bob;
printf("%s", name);

This is what often happens when you pass a statically allocated array to a function taking a character pointer. For instance:

char *strcpy(char *str1, const char *str2);

If you then pass that:

char buffer[256];
strcpy(buffer, "This is a string, less than 256 characters.");

strcpy then accesses both of those through str1 and str2, which are just pointers to where buffer and the string literal are stored in memory.

Something to keep in mind when writing a function: if a function returns a character pointer, don't return a pointer to a local (automatic) character array declared in the function. The array ceases to exist when the function returns, leaving you with a dangling pointer. Repeat, don't do this:

char *myFunc() {
    char myBuf[64];
    strcpy(myBuf, "hi");
    return myBuf;
}

That won't work. In that case you have to allocate memory dynamically (as shown earlier). The allocated memory persists even after the function returns. Just don't forget to free it, as mentioned previously.

This ended up a bit more encyclopedic than I'd intended; hope it's helpful.

Edited to remove C++ code. I mix the two so often, I sometimes forget.

Daniel Bingham
What are those `new` and `delete` things doing in my beautiful C language?
Chris Lutz
Ahh, right, c... *edits*
Daniel Bingham
There is no need to check if pointer is null before calling free() function. The C standard guarantees that free() called against null pointer does nothing, has no effect. It's perfectly well-formed code: int* p = NULL; free(p);
mloskot
If it does, then that's new. Very new. Every C compiler I've ever used choked pretty seriously if it ever hit a null pointer on a free. But then, I haven't written much C in the past year or two, so it's entirely possible things could have changed.
Daniel Bingham
No, free(NULL) has always been safe. You're probably thinking of double free situations.
Dan Olson
Well, then the if(pointer) catches something other than null. Because I can assure you, for older versions of gcc. If you pass a corrupt pointer to free, it will blow up. If you place this if statement ahead of it, it won't get passed to free.
Daniel Bingham
+1  A: 

It is confusing indeed. The important thing to understand and distinguish is that char name[] declares an array and char* name declares a pointer. The two are different animals.

However, an array in C can be implicitly converted to a pointer to its first element. This lets you perform pointer arithmetic and iterate through the array's elements (whatever their type, char or not). As @wich mentioned, you can use either the indexing operator or pointer arithmetic to access array elements. In fact, the indexing operator is just syntactic sugar (another way of writing the same expression) for pointer arithmetic.

It is important to distinguish between an array and a pointer to its first element. You can query the size of an array declared as char name[15] using the sizeof operator:

char name[15] = { 0 };
size_t s = sizeof(name);
assert(s == 15);

but if you apply sizeof to char* name you will get the size of a pointer on your platform (e.g. 4 bytes on a typical 32-bit system, 8 on a 64-bit one):

char* name = 0;
size_t s = sizeof(name);
assert(s == 4); // assuming pointer is 4-bytes long on your compiler/machine

Also, these two definitions of char arrays are equivalent:

char letters1[5] = { 'a', 'b', 'c', 'd', '\0' };
char letters2[5] = "abcd"; /* 5th element implicitly gets value of 0 */

Because of this dual nature of arrays in C (and also C++), namely the implicit conversion of an array to a pointer to its first element, a pointer can be used as an iterator to walk through array elements:

/* skip to 'd' letter */
char* it = letters1;
for (int i = 0; i < 3; i++)
    it++;
mloskot
+18  A: 
tommieb75
Wow! Thanks for this wonderful, detailed answer! :O
lamas
Uninitialized does not mean initialized to NULL.
Dan Olson
@Dan: Oh...ok...I'll fix this up...thanks for pointing it out...
tommieb75
@Dan: Better? ;)
tommieb75
@Dan Olson - Actually yes it does, in C99. If you fail to initialize something, it will be initialized by the compiler to (yep, you guessed it, platform independent), but at least it's consistent! In most cases (ones that people actually use), this is zero.
Tim Post