I am learning about pointers and one concept is troubling me. I understand that if you have a pointer (e.g. 'pointer1') of type int that points to an array, then you can fill that array with ints. If you want to address a member of the array you can use the pointer, and you can do pointer1 ++; to step through the array. The program knows that it is an array of ints, so it knows to step through in int-sized steps. But what if the array is of strings, which can vary in length? How does it know what to do when you try to increment with ++, as each element is a different length?

Similarly, when you create a vector of strings and use the reserve function, how does it know how much to reserve if strings can be different lengths? This is probably really obvious but I can't work it out and it doesn't fit in with my current (probably wrong) thinking on pointers. Thanks

+10  A: 

Quite simple.

An array of strings is different from a vector of strings.

An array of strings (C-style) is an array of pointers, each pointing to an array of characters: "char*[]", which decays to "char**" when you pass it around. Each element in the array of strings therefore has the size of a pointer to char, so the program can step through the elements of the string array without a problem. The pointers in the array can point at differently sized chunks of memory.

With a vector of strings it is an array of string-objects (C++ style). Each string-object has the same object size, but contains, somewhere, a pointer to a piece of memory where the contents of the string are actually stored. So in this case, the elements in the vector are also identical in size, although different from "just a pointer-to-char-array", allowing simple element-address computation.

hth, h.

haavee
+1: good explanation
David Schmitt
Thank you, I understand now.
Columbo
This correctly explains why all std::string objects are the same size. But I don't agree that "array of strings" means char*[] whereas "vector of strings" means vector<string>. You can have a string[] or a vector<char*>, so when you talk about "strings" you need to make it clear (perhaps from context) what you mean.
Steve Jessop
A: 

An array of strings is an array of pointers to the first character of some strings. The size of a pointer to a char is probably the same size as a pointer to an int.

Essentially, an array of pointers used as a 2D array isn't necessarily contiguous in memory; the pointed-to arrays could be anywhere.

jgubby
+6  A: 

This is because a string (at least in C/C++) is not quite the same sort of thing as an integer. If we're talking about C-style strings, then with an array of them like

const char* test[3] = { "foo", "bar", "baz" };

what is actually happening under the hood is that "test" is an array of pointers, each of which points to the actual data where the characters are. Let's say, for the sake of example, that the "test" array starts at memory address 0x10000 and that pointers are four bytes long; then we might have

test[0] (memory location 0x10000) contains 0x10020
test[1] (memory location 0x10004) contains 0x10074
test[2] (memory location 0x10008) contains 0x10320

If we then look at the memory locations around 0x10020, we would find the actual character data:

test[0][0] (memory location 0x10020) contains 'f'
test[0][1] (memory location 0x10021) contains 'o'
test[0][2] (memory location 0x10022) contains 'o'
test[0][3] (memory location 0x10023) contains '\0'

And around memory location 0x10074:

test[1][0] (memory location 0x10074) contains 'b'
test[1][1] (memory location 0x10075) contains 'a'
test[1][2] (memory location 0x10076) contains 'r'
test[1][3] (memory location 0x10077) contains '\0'

With C++ std::string objects much the same thing is going on: the actual C++ string object doesn't "contain" the characters because, as you say, the strings are of variable length. What it actually contains is a pointer to the characters. (At least, it does in a simple implementation of std::string - in reality it has a more complicated structure to provide better memory use and performance).

DavidK
Thanks, yours and haavee's answers have really helped
Columbo
A: 

This might seem like pedantry, but in a shoot-yer-foot language like C++ this is important: in your original question you say:

you can do pointer1 ++; to step through the array.

Postincrement (pointer1++) is usually semantically wrong here, because it means "increment pointer1 but keep the expression value at the original value of pointer1". If you have no need for the original value of pointer1, use pre-increment (++pointer1) instead, which has semantically exactly the meaning of "increment the pointer by one".

For some reason most C++ textbooks do the postincrement thing everywhere, teaching new C++-ers bad habits ;-)

MadKeithV
A: 

In C++, arrays and vectors always contain fixed-size elements. Strings fit this condition, because the elements are then either pointers to null-terminated C strings (char *) stored somewhere else, or plain std::string objects.

The std::string object has a constant size; the actual string data is allocated somewhere else (except with the small string optimization, but that's another story).

// assumes #include <string>, #include <vector> and using namespace std;
vector<string> a;
a.resize( 2 ); // creates 2 empty std::string objects; each can later hold text of any length.

vector<char *> b;
b.resize( 2 ); // creates 2 null char pointers; no character storage is allocated.

vector<char> c; // one string. Should use std::string instead.
c.resize( 2 ); // creates 2 characters (count the terminator yourself if you want one).

Note that the reserve() function of std::vector only prepares the vector to grow: it allocates capacity but does not create any elements or change size(). It's used mainly for optimization. If you want the elements to exist, you probably want resize().

Jem