OK, I've always kind of known that computers treat strings as a series of numbers under the covers, but I never really looked at the details of how it works. What sort of magic is going on in the average compiler/processor when we do, for instance, the following?

string myString = "foo";
myString += "bar";
print(myString) //replace with printing function of your choice
A: 

It's very language dependent. However, in many languages strings are immutable, so doing that is going to allocate a new string and release the old one's memory (or leave it for the garbage collector).

David Pfeffer
A: 

I'm assuming a typo in your sample and that there is only one variable called either foo or myString, not two variables?

I'd say that it'll depend a lot on what language and compiler you're using. In .NET, strings are immutable, so when you add "bar" you're not actually adding to the existing string but rather creating a new string containing "foobar" and pointing your variable at it.
In other languages it will work differently.

ho1
You're correct, thanks for the tip!
RCIX
+1  A: 

The answer is completely dependent on the language in question. But C is usually a good language to kind of see how things happen behind the scenes.

In C:

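In C, strings are arrays of char with a 0 at the end:

#include <string.h>

char str[1024];
strcpy(str, "hello ");   /* copy the first part into the buffer */
strcat(str, "world!");   /* append; a second strcpy would overwrite "hello " */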

Behind the scenes str[0] == 'h' (which has an int value), str[1] == 'e', ... str[11] == '!', str[12] == '\0';

A char is simply a small number. On most platforms it is 8 bits wide, so it can hold one of 256 values. Each character is stored as its numeric code (for example, 'h' is 104 in ASCII).

In C++:

Strings are supported in the same way as in C, but you also have a string type (std::string) which is part of the standard library.

String literals are part of static storage and cannot be changed directly unless you want undefined behavior.

It's implementation dependent how the string type actually works behind the scenes, but the string objects themselves are mutable.

In C#:

Strings are immutable, which means you can't directly change a string once it's created. When you do +=, what happens is that a new string gets created and your variable now references that new string.

Brian R. Bondy
+1  A: 

The implementation varies between languages and compilers of course, but typically for C it's something like the following. Note that strings are essentially syntactic sugar for char arrays (char[]) in C.

1.

string myString = "foo";
  • Allocate 3 bytes of memory for the array (4 in C, which also needs a terminating '\0' byte) and set the 1st byte to 'f' (or rather its ASCII code), the 2nd byte to 'o', and the 3rd byte to 'o'.

2.

myString += "bar";
  • Read the existing string (char array) from the memory pointed to by myString.

  • Allocate 6 bytes of memory (7 in C, to leave room for the terminating '\0'), fill the first 3 bytes with the contents just read from myString, and the next 3 bytes with b, a, and r. myString then points to this new block.

3.

print(myString)
  • Read the string myString now points to from memory, and print it to the screen.

This is a pretty rough overview, but hopefully should give you the general idea.

Side note: In some languages/compilers, char != byte. In C#, for example, a char is a 16-bit UTF-16 code unit, and notably the length of the string is also stored in memory. C (and C-style strings in C++) typically uses null-terminated strings, which solves the problem in another way, though it means determining the length is O(n) rather than O(1).

Noldorin