I know that the correct way to compare "strings" in C is by using strcmp, but now I tried comparing some character arrays with the == operator, and got some strange results.

Take a look at the following code:

#include <stdio.h>

int main()
{
    char *s1 = "Andreas";
    char *s2 = "Andreas";

    char s3[] = "Andreas";
    char s4[] = "Andreas";

    char *s5 = "Hello";

    printf("%d\n", s1 == s2); //1
    printf("%d\n", s3 == s4); //0
    printf("%d\n", s1 == s5); //0
    return 0;
}

The first printf correctly prints a 1, which signals that they are not equal. But can someone explain to me why, when comparing the character arrays, the == is returning a 0?

Can someone please explain to me why the first printf is returning a 1 (i.e., they are equal) and the character arrays are returning a 0?

+16  A: 

The == is comparing the memory address.
It's likely that your compiler is making s1 and s2 point to the same static data to save space.

I.e., the "Andreas" in the first two lines of code is stored in your executable's data. The compiler has determined that you don't change it and so has optimized the two pointers to point to the same storage.

The char[] lines create a variable by copying the data into it, so the two arrays are stored at different addresses on the stack during execution.

Martin Beckett
So this is all due to compiler optimization?
Andreas Grech
I'm pretty sure it's a requirement of the standard that identical string literals in the same translation unit ("file", in standards-speak) are **required** to have the same storage location.
Andy Ross
Depending on architecture, those first 2 string constants could be in write protected memory and thus immutable. It would make sense to cut down on space requirements by creating the constant just once.
Carl Smotricz
"optimization" would stretch the word a bit, but yeah.
Carl Smotricz
Andy: Not required, 6.4.5/6 (C99): "It is unspecified whether these arrays are distinct ..."
Roger Pate
+1 Thanks for explaining the discrepancy in the results when comparing character arrays versus comparing the pointers
Andreas Grech
+1 For a very nice way of explanation.
nthrgeek
Is it required by the standard that these strings are read-only?
Martin Beckett
+4  A: 

Uh... when == prints a 1, it means they are equal. It's different from strcmp, which returns the relative order of the strings.

Tordek
+1  A: 

Wait a sec... 1 means true, 0 means false. So your explanation is partially backwards. As for why the first two strings seem to be equal: The compiler built that constant string (s1/2) just once.

Carl Smotricz
Oh yeah, you're actually right; I still had `strcmp` in my head, and I inverted the values!
Andreas Grech
Updated the question
Andreas Grech
A: 

You can't compare strings, but you can compare pointers.

Nosredna
`strcmp()` compares strings
pmg
Yes, I know. :-) I meant with the comparison operators. :-)
Nosredna
+2  A: 

You are comparing addresses and not the strings. The first two are constant and will only be created once.

#include <stdio.h>

int main()
{
    char *s1 = "Andreas";
    char *s2 = "Andreas";

    char s3[] = "Andreas";
    char s4[] = "Andreas";

    char *s5 = "Hello";

    /* %p expects a void *, hence the casts */
    printf("%d\n", s1 == s2); //1
    printf("%p == %p\n", (void *)s1, (void *)s2);
    printf("%d\n", s3 == s4); //0
    printf("%p != %p\n", (void *)s3, (void *)s4);
    printf("%d\n", s1 == s5); //0
    printf("%p != %p\n", (void *)s1, (void *)s5);
    return 0;
}

Output on my computer, but you get the idea:

1
0x1fd8 == 0x1fd8
0
0xbffff8fc != 0xbffff8f4
0
0x1fd8 != 0x1fe0
Lucas
+1  A: 

s1 == s2 means "(char*) == (char*)" or that the addresses are the same.

Same thing for s3 == s4. That's the "arrays decay into pointers" at work.

And you have the meaning of the result of the comparison wrong:

0 == 0; /* true; 1 */
42 == 0; /* false; 0 */
"foo" == "bar"; /* false (the addresses are different); 0 */
pmg
+1  A: 

None of the values s1 through s5 is a char itself: s1, s2 and s5 are pointers to char, and the arrays s3 and s4 are converted to pointers to their first elements when used with ==. So what you're comparing is the memory addresses of each string, rather than the strings themselves.

If you display the addresses thus, you can see what the comparison operators are actually working on:

#include <stdio.h>

int main() {
  char *s1 = "Andreas";
  char *s2 = "Andreas";

  char s3[] = "Andreas";
  char s4[] = "Andreas";

  char *s5 = "Hello";

  /* %p expects a void *, hence the casts */
  printf("%p\n", (void *)s1); // 0x80484d0
  printf("%p\n", (void *)s2); // 0x80484d0
  printf("%p\n", (void *)s3); // 0xbfef9280
  printf("%p\n", (void *)s4); // 0xbfef9278
  printf("%p\n", (void *)s5); // 0x80484d8
  return 0;
}

Exactly where the strings are allocated in memory is implementation-specific. In this case, s1 and s2 point to the same static memory block, but I wouldn't expect that behaviour to be portable.

goldPseudo
Yeah that's a string pool optimization. I wouldn't count on it.
Nosredna