We try to embed a what string in our binary objects so that we can see the version number of an executable or shared library that has been deployed. Typically we embed standard CVS Id information in this what string. For example, we might embed:

const char cvsid[] = "@(#)OUR_TEAM_staging_remap_$Revision: 1.30 $ $Name:  $";

within the C code.

From man (1) what:

The what utility searches each filename for occurrences of the pattern @(#) that the SCCS get command (see sccs-get(1)) substitutes for the @(#) ID keyword, and prints what follows up to a ", >, NEWLINE, \, or NULL character.
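To make the quoted rule concrete, here is a minimal sketch of the scanning it describes. It is not the real what implementation; the function name scan_what_strings and the tab-indented output are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not the real what source): scan buf[0..len) for the
 * "@(#)" marker and print the text after each one, stopping at ", >, newline,
 * backslash, or NUL, as the man page excerpt describes.
 * Returns the number of markers found. */
size_t scan_what_strings(const unsigned char *buf, size_t len, FILE *out)
{
    static const char marker[] = "@(#)";
    const size_t mlen = sizeof marker - 1;
    size_t found = 0;

    for (size_t i = 0; i + mlen <= len; i++) {
        if (memcmp(buf + i, marker, mlen) != 0)
            continue;

        size_t j = i + mlen;
        fputc('\t', out);
        while (j < len && buf[j] != '"' && buf[j] != '>' &&
               buf[j] != '\n' && buf[j] != '\\' && buf[j] != '\0') {
            fputc(buf[j], out);
            j++;
        }
        fputc('\n', out);
        found++;
        i = j - 1; /* resume scanning after the text just printed */
    }
    return found;
}
```

A caller would read the executable into memory and pass the buffer and its length; the string survives in the binary only if the compiler and linker actually emitted it, which is what the question is about.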

There is only one instance of this variable and it is never referred to. Someone suggested that this might get optimised away by the compiler.

I've been using this technique for many years in both C and C++ and with a variety of compilers and I've yet to see a what string optimised away.

Anyone got an idea why they aren't optimised away?

cheers,

Rob

+2  A: 

Typically this doesn't happen because extraneous strings have little cost and can be useful in cases like this, among others (e.g. storing a bunch of string resources with only the first actually referenced in the code).

Cody Brocious
+1 The compiler may not know that you don't reference it in some other fashion. Besides, no harm in adding that to the data section.
sixlettervariables
+2  A: 

They may not be optimized away because your compiler knows that such strings can be used for those purposes.

Of course, the compiler is completely allowed to optimize it away, as long as the program's behavior, or more exactly its observable behavior, is not changed. That means the sequence of reads and writes to volatile objects and the calls to library functions are not changed.

By optimizing such a string away in your app, I suppose the behavior won't change. But compilers want to be usable and try not to get in the user's way. That's why they contain useful extensions too. If you want to be sure it's never optimized away, have a look at your compiler's extensions. GCC has an unused attribute, which makes it not emit warnings for unused objects, and a separate used attribute, which is documented to make a variable be emitted even if it appears unreferenced. That or something similar can help ensure the variable isn't optimized away.

From a language standpoint, though, there isn't a facility to force the compiler to keep it.
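As a sketch of the compiler-extension route, assuming GCC or a compatible compiler such as Clang: the used attribute asks the compiler to emit the variable even when nothing references it. The macro name KEEP_IN_BINARY and the id string are made up for this example, and the attribute by itself says nothing about what a linker does afterwards.

```c
/* KEEP_IN_BINARY is a hypothetical macro name, not part of any standard. */
#if defined(__GNUC__) || defined(__clang__)
#define KEEP_IN_BINARY __attribute__((used))
#else
#define KEEP_IN_BINARY /* assumption: no equivalent known; plain definition */
#endif

/* Placeholder id string in the style of the question. It is static and never
 * referenced, which is exactly the case an optimizer may otherwise drop. */
static const char what_id[] KEEP_IN_BINARY =
    "@(#)example_module $Revision: 1.30 $";
```

On other compilers the macro expands to nothing, so the definition still compiles but gets no extra protection.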

Edit: There was a usenet post about that topic here, with useful answers.

Johannes Schaub - litb
+1  A: 

Microsoft's Visual C++ 2005 has a linker option which is supposed to control what it does with unused data: /OPT:NOREF forces the linker to keep unused data, /OPT:REF allows it to eliminate it.

However, in my simple test, the option had no effect on the statement

static char VersionString[] = "HELLO_WORLD 2.0";

The string appeared in both the release and debug binaries regardless of the flag.

AShelly
It might depend on the level of optimization you use.
Jonathan Leffler
+1  A: 

Until recently (I spotted the problem in mid-2005), it was possible to use:

static const char sccs[] = "@(#)%W% %E%";

or something similar in source code and GCC and most other compilers would not optimize it away. Starting with the release of GCC from about that time (probably GCC 4.0.x, originating from April 2005), those constant strings were left out of the binaries. So, I had to go around modifying my source code to make the variables externally visible. It is not possible for the compiler to look at the object file alone and conclude that the string is unused because something outside the file might conceivably reference it. So, my files now contain:

#ifndef lint
extern const char jlss_id_filename_c[];
const char jlss_id_filename_c[] = "@(#)$Id$";
#endif /* lint */

OK - that's a hybrid; I really use RCS to store the source code, but I still prefer what to ident for identifying files - plus I have my own hacked what that does both what and ident plus a few tweaks of my own. But I have the declaration in some files - not all - and the definition in all files. (Under some set of warning flags, not now remembered, I was getting warnings when the variable was defined before being declared. It might have been a change in GCC that resolved that problem; I'm not sure any more.)

When I create a new file, my template generator replaces the 'filename_c' with the appropriate name of the file being generated. Similarly for headers - though the identification string is only embedded in one file to avoid multiple definitions.

I preferred the old system with static constants - but this has worked for me for over 3 years.

Jonathan Leffler
If we are unlucky, you'll be able to use this trick for another 3 years until the Google GCC folks have progressed with their WHOPR effort [1], which would make your trick obsolete ;-) [1]: http://gcc.gnu.org/projects/lto/whopr.pdf
none
I guess that at that point, I'll have to find a way to get the arbitrary list of names printed out, or otherwise used. There must be other people who want metadata like this in the executable; GCC should provide a mechanism to mark "this is not used but must still appear in the binary".
Jonathan Leffler
+1  A: 

Without the "static" keyword the variable cannot be optimized away, because another module may declare a reference to it (using extern). Since C/C++ is usually compiled one file at a time, there is no way the compiler can know whether an external reference exists.

By adding the static keyword you tell the compiler that the name is only visible within the compilation unit, and optimizing it away becomes possible.

I think the linker could detect an unused global variable and optimize it away too if the object format permits it, though I'm not sure anyone does.
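One way to read the reasoning above: a static string stays safe from the compiler as long as something externally visible in the same file refers to it. The following is a minimal sketch under that assumption; module_id and module_what_string are hypothetical names, and a linker or whole-program optimizer could still discard the pair if nothing ever calls the function.

```c
/* Placeholder id string; static, so only this file can name it. */
static const char module_id[] = "@(#)example_module 2.0";

/* Hypothetical accessor with external linkage. Another compilation unit may
 * call it, so the compiler, working one file at a time, cannot conclude that
 * module_id is unused. */
const char *module_what_string(void)
{
    return module_id;
}
```

A program could call module_what_string() from a version-reporting option, which also gives the string a real use.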

robinr