I see the following pattern occurring quite frequently:

 b->last = ngx_cpymem(b->last, "</pre><hr>", sizeof("</pre><hr>") - 1);

Notice that the string literal is used twice. The extract is from the nginx source base.

The compiler should be able to merge these literals when they are encountered within the same compilation unit.

My questions are:

  1. Do the commercial-grade compilers (VC++, GCC, LLVM/Clang) remove this redundancy when it is encountered within a compilation unit?
  2. Does the (static) linker remove such redundancies when linking object files?
  3. If 2 applies, would this optimization also occur during dynamic linking?
  4. If 1 and 2 apply, do they apply to all literals?

These questions are important because the answers determine whether a programmer can be verbose without losing efficiency -- i.e., think about enormous static data models being hard-wired into a program (for example, the rules of a Decision Support System used in some low-level scenario).

Edit

2 points / clarifications

  1. The code above was written by a recognised "master" programmer. The guy single-handedly wrote nginx.

  2. I have not asked which of the possible mechanisms of literal hard-coding is better. Therefore don't go off-topic.

Edit 2

My original example was quite contrived and restrictive. The following snippet shows the usage of string literals being embedded into internal hard-coded knowledge. The first snippet is meant for the config parser telling it what enum values to set for which string, and the second to be used more generally as a string in the program. Personally I am happy with this as long as the compiler uses one copy of the string literal, and since the elements are static, they don't enter the global symbol tables.

static ngx_conf_bitmask_t  ngx_http_gzip_proxied_mask[] = {
   { ngx_string("off"), NGX_HTTP_GZIP_PROXIED_OFF },
   { ngx_string("expired"), NGX_HTTP_GZIP_PROXIED_EXPIRED },
   { ngx_string("no-cache"), NGX_HTTP_GZIP_PROXIED_NO_CACHE },
   { ngx_string("no-store"), NGX_HTTP_GZIP_PROXIED_NO_STORE },
   { ngx_string("private"), NGX_HTTP_GZIP_PROXIED_PRIVATE },
   { ngx_string("no_last_modified"), NGX_HTTP_GZIP_PROXIED_NO_LM },
   { ngx_string("no_etag"), NGX_HTTP_GZIP_PROXIED_NO_ETAG },
   { ngx_string("auth"), NGX_HTTP_GZIP_PROXIED_AUTH },
   { ngx_string("any"), NGX_HTTP_GZIP_PROXIED_ANY },
   { ngx_null_string, 0 }
};

followed closely by:

static ngx_str_t  ngx_http_gzip_no_cache = ngx_string("no-cache");
static ngx_str_t  ngx_http_gzip_no_store = ngx_string("no-store");
static ngx_str_t  ngx_http_gzip_private = ngx_string("private");
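
To make the question concrete, here is a minimal sketch of what such an initializer stores. It assumes ngx_string expands to a length computed with sizeof plus a pointer to the literal; my_str_t, MY_STRING and the accessor functions are hypothetical stand-ins, not nginx code. Only the pointer member needs the literal's bytes in the binary, so that is the part a compiler or linker could merge.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a length-plus-pointer string type. */
typedef struct {
    size_t      len;
    const char *data;
} my_str_t;

/* Assumption: the length comes from sizeof (a compile-time constant,
   minus the terminating NUL); the data member points at the literal. */
#define MY_STRING(s) { sizeof(s) - 1, (s) }

/* The same literal appears in two separate static initializers. */
const my_str_t table_entry = MY_STRING("no-cache");
const my_str_t standalone  = MY_STRING("no-cache");

/* Length recorded for the literal: 8 characters, NUL not counted. */
size_t entry_length(void)
{
    return table_entry.len;
}

/* Returns 1 when both objects describe the same text. This holds whether
   or not the toolchain stored the literal once or twice. */
int same_contents(void)
{
    assert(table_entry.len == standalone.len);
    return memcmp(table_entry.data, standalone.data, standalone.len) == 0;
}
```

Whether table_entry.data and standalone.data hold the same address is exactly what the question asks; the sketch deliberately does not rely on it.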

To those that stayed on topic, bravo !

+7  A: 

I can't answer your questions, but always try to use a const string (even a #define would be better) in such circumstances. The problem comes when you are refactoring code and change the value of one literal while forgetting the other (not so likely in your example, as they are right next to each other, but I have seen it before).

Whatever optimisations the compiler can do, humans can still bugger it up :)

Patrick
+ a gazillion if I could.
Binary Worrier
That last line a million times, and the rest of it too! Redundancy screams 'insert error here'.
Kim Reece
-1 for going off-topic. this isn't a programming 101 quiz.
Hassan Syed
The fact that you are defending the practice because a master programmer uses it suggests a 101 class wouldn't go amiss. My answer is relevant because, if you accept its premise as true, your question becomes redundant.
Patrick
I'm not defending the practice. I am not going to turn down or criticize this code just because he has used this particular construct. And no, the question does not become redundant -- but since you're playing the role of the passionate pragmatic programmer stepping in to try to derail this question -- I doubt you would see it.
Hassan Syed
Actually I was just trying to help, but hey.
Patrick
You help people by reading a question and addressing the specifics of the question. It is infuriating to find people doling out basic programming knowledge on every question they encounter. Perhaps the person scouring the internals of a complex open-source web server written in C, and asking for details about compilers, is aware of the dangers of hard-coding knowledge into program statements?
Hassan Syed
Ahh, I wish I had the maturity to let this go but I don't :). I'm sure people scouring the internals of such complex code know that sizeof() on a constant is going to be optimised out yet this is your chosen answer; perhaps offering basic information can sometimes be useful even to an expert such as yourself, I assure you I didn't mean to question your expertise when I offered mine.
Patrick
I accepted the answer above based on the last sentence.... and before you had written this comment I had posted a more elaborate example that did **not** contain the sizeof operator. I recommend that you pay more attention to detail.
Hassan Syed
In the end I prefer to risk infuriating experts with advice they don't need, rather than not trying to help those who do
Patrick
Patrick I sincerely do not mean to offend, but you chose one of the following paths (1) Tunnel Vision: you saw an aspect that was largely irrelevant to the question and you jumped at it without any real contemplation about what the question was actually asking (2) You are farming reputation and know the voting patterns of the average SO reader. Long term knowledge needs to be succinct and effective, and by pitching in useless information to this question you have taken away from its long-term benefit.
Hassan Syed
That's fine, I'm happy to argue about most things and don't get offended easily. I think you've mixed up your definitions, tunnel vision would be concentrating purely on the question without considering potential wider issues; as for long term effectiveness just because you don't find something useful doesn't mean no one else will.
Patrick
+4  A: 

I would be very unhappy to see that pattern - what if someone changes one literal without changing the other? It should be pulled out; make a pretty little named constant.

Assuming you can't for some reason, or just to actually answer the question: (At least, anecdotally.)

I made a similar program in C and compiled it with GCC 4.4.3; the constant string appeared only once in the resulting executable.

Edit: Since it might be useful as an easy test, here is the code I tested it with...

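#include <stdlib.h>
#include <string.h>
#include <stdio.h>

int main(void) {
    /* Both sizeof expressions are compile-time constants; only the memcpy
       source needs the literal's bytes in the binary. */
    char *n = malloc(sizeof("teststring"));
    if (n == NULL)
        return 1;
    memcpy(n, "teststring", sizeof("teststring"));
    printf("%s\n", n);
    free(n);
    return 0;
}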

And here is the command I used to check how many times the string appeared...

strings a.out|grep teststring

But please please consider using less error-prone coding practices where possible.

Kim Reece
the compiler does this optimization for you. +1 for the command line for verification
Iulian Şerbănoiu
+3  A: 
  1. Yes for GCC, should be also true for others
  2. Maybe yes for the GNU linker (see GCC's -fmerge-constants and -fmerge-all-constants options)
  3. No
  4. Not sure
unbeli
+3  A: 

I wrote a small code sample and compiled it:

#include <stdio.h>
#include <string.h>

void func (void)
{
    char ps1[128];
    char ps2[128];

    /* The same literal is used as the source of both copies. */
    strcpy(ps1, "string_is_the_same");
    strcpy(ps2, "string_is_the_same");

    printf("%s %s\n", ps1, ps2);
}

As a result, the assembler file contains only one instance of the literal "string_is_the_same", even without optimization. However, I am not sure whether the strings would still be merged if they were placed in different source files, and therefore in different object files.
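
The merging can also be probed at run time instead of by reading the assembler output. The following is a rough sketch, not a guarantee: the C standard leaves it unspecified whether identical string literals share storage, so either result is conforming. The function names are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Two functions that each return a pointer to an identical literal. */
const char *first_copy(void)  { return "string_is_the_same"; }
const char *second_copy(void) { return "string_is_the_same"; }

/* Returns 1 if both literals ended up at the same address, 0 otherwise.
   The contents always compare equal; only the address is up to the
   compiler and linker. */
int literals_share_storage(void)
{
    const char *a = first_copy();
    const char *b = second_copy();

    assert(strcmp(a, b) == 0);
    return a == b;
}
```

Moving second_copy into a separate source file and linking the two objects would turn the same check into a test of cross-file merging.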

Pmod
Oh, was a bit late..
Pmod
good answer anyhow :D
Hassan Syed
+4  A: 

Note that for the specific case of sizeof("</pre><hr>"), it is virtually certain that the string literal will never appear in the output file - the entire sizeof expression can be evaluated to the integer constant 11 at compile-time.

Notwithstanding, it is still a very common optimisation for compilers to merge identical string literals.
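
A small sketch of the sizeof point, under stated assumptions: copy_and_advance is a hypothetical stand-in for ngx_cpymem (copy n bytes, return a pointer just past them), and append_markup is an invented wrapper. Using the sizeof expression as an enum constant only compiles if it is an integer constant expression, which shows that it is resolved at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* sizeof("</pre><hr>") is 11: ten characters plus the terminating NUL.
   An enumerator must be an integer constant expression, so this line
   would not compile if the value were not known at compile time. */
enum { MARKUP_LEN = sizeof("</pre><hr>") - 1 };

/* Hypothetical stand-in for ngx_cpymem: copy n bytes and return a
   pointer just past the last byte written. */
char *copy_and_advance(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
    return dst + n;
}

/* Appends the markup to buf (which must hold at least MARKUP_LEN bytes)
   and returns the number of bytes written. */
size_t append_markup(char *buf)
{
    char *last;

    assert(buf != NULL);
    last = copy_and_advance(buf, "</pre><hr>", sizeof("</pre><hr>") - 1);
    return (size_t)(last - buf);
}
```

Only the literal passed as the copy source has to exist in the object file; the one inside sizeof contributes nothing but the number 10.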

caf
How about linkers?
Pmod
@Pmod: I don't believe so, but I haven't played with GCC's new "link time optimization" feature yet.
caf
A `sizeof` expression is a compile-time constant (per C++ standard). So "virtually certain" seems an understatement.
MSalters
@MSalters: not always, at least in ISO C99. Think of VLAs. Won't somebody please think of VLAs?
ninjalj
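
For reference, a short sketch of the VLA case mentioned in the last comment. It assumes a compiler that supports C99 variable-length arrays (they are optional from C11 onward, and some compilers such as VC++ do not provide them); vla_bytes is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the size in bytes of a variable-length array of n ints.
   The array type depends on n, so this sizeof cannot be folded to a
   constant and is evaluated at run time. Requires n > 0. */
size_t vla_bytes(size_t n)
{
    assert(n > 0);

    int values[n];
    return sizeof(values);
}
```

This does not affect the string-literal case above, where the operand of sizeof has a fixed array type.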