views:

465

answers:

7

Why does this code produce different output when compiled with gcc and with Turbo C?

#include <stdio.h>

int main()
{    
    char *p = "I am a string";
    char *q = "I am a string";

    if(p==q)
    {
        printf("Optimized");
    }
    else{
        printf("Change your compiler");
    }
    return 0;
}

I get "Optimized" with gcc and "Change your compiler" with Turbo C. Why?

+14  A: 

Since your string literal is a constant expression, i.e. you should not modify it via a pointer, there is no real purpose in storing it in memory twice. Being a newer compiler, gcc merges identical literals by default, while Turbo C does not. It is a sign of gcc's support for the newer language standard that has the notion of const data.

Amardeep
You can override this behaviour in gcc by passing the `-fno-merge-constants` option, though generally there's no good reason to do so.
Hasturkun
@Hasturkun: Nice tip :) @Amardeep: Very good answer!
Prasoon Saurav
@Amardeep, your answer is not completely correct. A string literal is not a constant expression, otherwise it would not have been possible to assign it to a `char*`. It is true that one *should* not change it by accessing it through the pointer, but it is allowed. The behavior is just undefined... In any case, I don't understand people handing out assignments like this that teach such bad habits. The address of a string literal should always be assigned to a `char const*`.
Jens Gustedt
@Jens: Since early C compilers did not have the notion of const, `char *` was all you had to assign it to even though compilers targeting ROM often left the string in Read Only Memory instead of copying it into RAM upon program load. To be portable, it was always safer to treat them as immutable. The newer compilers certainly treat them as immutable otherwise that default merging behavior would be unsafe.
Amardeep
"Since your string literal is a constant expression, i.e. you are not technically allowed to modify it via a pointer". The term "constant expression" can be confused with the formal concept, though. There are constant expressions that you can modify using a pointer. "Constant expression" in C++ and C means that some of the expression's characteristics can be determined at compile time (its value (example: integral and integral constant expression), its referent address (example: address and reference constant expression) and its member offset (example: pointer to member constant expression)).
Johannes Schaub - litb
@Jens formally, a string literal is a "constant expression". Informally, in C++ a string literal is actually of type "const char [N]", so it is actually an expression with a const-qualified type. A special backwards-compatibility-to-C conversion makes conversion to `char*` possible. That conversion will be gone, and such conversions are ill-formed in C++0x. Also, I would not call it "allowed". Doing so is undefined behavior, but if everything that yields undefined behavior were allowed, very little would still be forbidden by C.
Johannes Schaub - litb
In fact, now that I think about it, since it is undefined, I would say neither that it's allowed nor that it's forbidden, but just leave it at "undefined", because it's up to the implementation to decide (it doesn't even have to decide!).
Johannes Schaub - litb
"Being a newer compiler, gcc is able to detect that while Turbo C does not. It is a sign of the more advanced optimization capability of gcc vs. the older generation Turbo C." -1: Even the oldest versions of Turbo C could in fact merge string constants (command-line option -d); it's just disabled by default. Also, merging strings isn't even close to a "more advanced optimization". In fact, you don't need much more than a hash table for the strings in order to identify the dupes.
Luther Blissett
@Luther: Thanks, good point. I'm clarifying the statement.
Amardeep
"It is a sign of gcc's support for the newer C99 standard that has the notion of const data." C89 already had the const qualifier for const data, please correct.
Johannes Schaub - litb
My failing recollection... I was thinking of the boolean type that was added in C99. Corrected. Thanks.
Amardeep
+1  A: 

The compiler may keep two copies of identical literals if it thinks proper. Finding out if that is the case is presumably the point of this program.

In the good old days, assemblers kept all literals in a literal pool, and patching the literal pool was a recognised (if not approved) technique of modifying 'constants' throughout the program.

If, by some chance, the compiler allows `*p = 'H';` in this case, then important differences in behaviour would result: with merged literals, the change would also be visible through `q`.

Brian Hooper
It should be said that, in many early (pre ANSI) versions of C, modification of literal strings was allowed.
JeremyP
@JeremyP: Define "Allowed". I'm pretty sure it was always undefined behavior (an embedded system could have put that string in ROM) (although technically, pre-ANSI, everything was officially "undefined behavior")
James Curran
Compilers for embedded systems usually give their users very fine grained control about where goes what. It's unlikely that string literals would go into the ROM and you couldn't do anything about it.
Luther Blissett
@Luther: you always can do something about it, the correct and portable way. `char mystring[] = "literal goes here";` and then use `mystring` instead of `"literal goes here"`.
R..
Some embedded systems may have 8K of ROM and 256 bytes or less of RAM. I suppose string literals could theoretically be placed in RAM, but that would seem rather dicey.
supercat
+2  A: 

Turbo C was optimized for fast compilation, so it doesn't have any features that would slow it down. Recognizing duplicate strings would be a slow-down, even if only minor.

Mark Ransom
I think this explanation is wrong. Turbo C's defaults are simply there to allow broken code that modifies string constants to work by default.
R..
+26  A: 

Your question has been tagged C as well as C++, so I'll answer for both languages.

[C]

From ISO C99 (Section 6.4.5/6)

It is unspecified whether these arrays are distinct provided their elements have the appropriate values.

That means it is unspecified whether p and q point to the same string literal or not. With gcc they both point to the same "I am a string" array (gcc optimizes your code), whereas with Turbo C they do not.

Unspecified Behavior: Use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance


[C++]

From ISO C++-98 (Section 2.13.4/2)

Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined.

In C++, your code invokes implementation-defined behaviour.

Implementation-defined Behavior: Unspecified Behavior where each implementation documents how the choice is made


Also see this question.

Prasoon Saurav
+1 for having reported subjective-standard-defined meaning of "unspecified"/"implementation defined" behaviour.
ShinTakezou
+1 for very thorough answer!
Amardeep
Thank you @Shin and @Amardeep :)
Prasoon Saurav
+4  A: 

From the gcc manual page:

-fmerge-constants

Attempt to merge identical constants (string constants and floating point constants) across compilation units.

This option is the default for optimized compilation if the assembler and linker support it. Use -fno-merge-constants to inhibit this behavior.

Enabled at levels -O, -O2, -O3, -Os.

Hence the output.

Praveen S
+5  A: 

Please disregard answers along the lines of

"It's because Turbo C is SO TOTALLY OLD and they couldn't do it THEN, because it had to be FAST, but the GCC is totally NEW and RAD and that's why it does that!".

Both compilers support merging string constants as an option. The GCC option (-fmerge-constants) is turned on at the optimization levels, while the Turbo C option (-d) is turned off by default. If you are using the TCC IDE, go to Options|Compiler...|Code Generation... and check "Duplicate strings merged".

Luther Blissett
I found your answer hard to read and initially completely misunderstood it, because the quote wasn't very clearly recognizable as such. I hope you're OK with my formatting changes. Apart from that, good and useful info for anyone still dealing with TC, so: +1.
Carl Smotricz
Oh, that's much better. Thank you!
Luther Blissett
A: 

Historical footnote: Since addresses were smaller than floating-point numeric constants, FORTRAN used to handle floating-point constants much like C handles strings. Since memory was precious, identical constants would be allocated the same space. Also, parameter passing was always done by reference. This meant that if one passed a numeric constant to a procedure that modified its argument, other occurrences of that "constant" would change value.

Hence the old saying: "Variables won't; constants aren't."

Incidentally, has anyone noticed the bug in the Turbo C 2.0 printf, which would fail when using a format like "%1.1f" to print numbers like 99.99 (it outputs 00.0)? It was fixed in 2.01; it reminds me of the Windows 3.1 calculator bug.

supercat