ansaurus

Question

Answer 1

+3 A:

Based on your edits, here's an improved version, with the same results.

Input:

struct C {
    int myfrob;
    int frob();
    C(int f);
};
C::C(int f) : myfrob(f) {}
int C::frob() { return myfrob; }

C& get() {
    static C *c = new C(5);
    return *c;
}

int main() {
    return get().frob(); // is compiler free to optimize out the call?
}

Output:

; ModuleID = '/tmp/webcompile/_28088_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-linux-gnu"

%struct.C = type { i32 }

@guard variable for get()::c = internal global i64 0            ; <i64*> [#uses=4]

declare i32 @__cxa_guard_acquire(i64*) nounwind

declare i8* @operator new(unsigned long)(i64)

declare void @__cxa_guard_release(i64*) nounwind

declare i8* @llvm.eh.exception() nounwind readonly

declare i32 @llvm.eh.selector(i8*, i8*, ...) nounwind

declare void @__cxa_guard_abort(i64*) nounwind

declare i32 @__gxx_personality_v0(...)

declare void @_Unwind_Resume_or_Rethrow(i8*)

define i32 @main() {
entry:
  %0 = load i8* bitcast (i64* @guard variable for get()::c to i8*), align 8 ; <i8> [#uses=1]
  %1 = icmp eq i8 %0, 0                           ; <i1> [#uses=1]
  br i1 %1, label %bb.i, label %_Z3getv.exit

bb.i:                                             ; preds = %entry
  %2 = tail call i32 @__cxa_guard_acquire(i64* @guard variable for get()::c) nounwind ; <i32> [#uses=1]
  %3 = icmp eq i32 %2, 0                          ; <i1> [#uses=1]
  br i1 %3, label %_Z3getv.exit, label %bb1.i

bb1.i:                                            ; preds = %bb.i
  %4 = invoke i8* @operator new(unsigned long)(i64 4)
          to label %invcont.i unwind label %lpad.i ; <i8*> [#uses=2]

invcont.i:                                        ; preds = %bb1.i
  %5 = bitcast i8* %4 to %struct.C*               ; <%struct.C*> [#uses=1]
  %6 = bitcast i8* %4 to i32*                     ; <i32*> [#uses=1]
  store i32 5, i32* %6, align 4
  tail call void @__cxa_guard_release(i64* @guard variable for get()::c) nounwind
  br label %_Z3getv.exit

lpad.i:                                           ; preds = %bb1.i
  %eh_ptr.i = tail call i8* @llvm.eh.exception()  ; <i8*> [#uses=2]
  %eh_select12.i = tail call i32 (i8*, i8*, ...)* @llvm.eh.selector(i8* %eh_ptr.i, i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*), i8* null) ; <i32> [#uses=0]
  tail call void @__cxa_guard_abort(i64* @guard variable for get()::c) nounwind
  tail call void @_Unwind_Resume_or_Rethrow(i8* %eh_ptr.i)
  unreachable

_Z3getv.exit:                                     ; preds = %invcont.i, %bb.i, %entry
  %_ZZ3getvE1c.0 = phi %struct.C* [ null, %bb.i ], [ %5, %invcont.i ], [ null, %entry ] ; <%struct.C*> [#uses=1]
  %7 = getelementptr inbounds %struct.C* %_ZZ3getvE1c.0, i64 0, i32 0 ; <i32*> [#uses=1]
  %8 = load i32* %7, align 4                      ; <i32> [#uses=1]
  ret i32 %8
}

Noteworthy: no code is emitted for ::get, but main still allocates ::get::c (at %4), protected by a guard variable as needed (acquired at %2, released at the end of invcont.i, and aborted in lpad.i). LLVM here is inlining all of get() into main.
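
For readers unfamiliar with guard variables, here is a rough hand-written sketch of what the `static C *c = new C(5);` line expands to. The names `get_expanded`, `guard`, and `c_ptr` are made up for illustration, and the plain `bool` leaves out the locking that `__cxa_guard_acquire` and `__cxa_guard_release` provide, so this version is not thread-safe.

```cpp
struct C {
    int myfrob;
    explicit C(int f) : myfrob(f) {}
    int frob() const { return myfrob; }
};

// Illustrative expansion only; the real compiler output uses the
// __cxa_guard_* runtime functions instead of a plain bool.
C& get_expanded() {
    static bool guard = false;   // stands in for the guard variable
    static C* c_ptr = nullptr;   // constant-initialized, needs no guard itself
    if (!guard) {                // the check at %0/%1 in the IR
        c_ptr = new C(5);        // the allocation at %4 plus the store of 5
        guard = true;            // set only after success, like guard_release
    }
    return *c_ptr;
}
```

If `new` throws, `guard` stays false and the next call tries again, which roughly mirrors the `__cxa_guard_abort` path in `lpad.i`.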

tl;dr: Don't worry about it, the optimizer normally gets this stuff right. Are you seeing an error?

TokenMacGuy 2010-09-20 05:16:50

Well, there's no use of the `C` struct in `main()` after the initialization, and calling `get()` has no side effects apart from the initialization and returning a reference to `c`, which you don't keep. So there's no possible case where optimizing that line out makes the code behave differently ... hard to blame the compiler. This is similar to the original question, except we don't know what the code after the call does.

stefanB 2010-09-20 05:23:16

no error yet, but I don't want to find out the hard way. thanks

aaa 2010-09-20 06:29:38

If an optimization changes the behavior of your code, it's either because you're doing something that is undefined (you aren't in this case) or because the optimizer is broken.

TokenMacGuy 2010-09-20 17:26:02

Answer 2

+1 A:

Whether the compiler optimizes the function call away or not is essentially unspecified behavior as per the Standard. Unspecified behavior is behavior chosen from a finite set of possibilities, where the choice need not be consistent every time. In this case the choice is 'to optimize' or 'not'; the Standard does not specify which, and the implementation is not required to document it, since a given implementation may not make the choice consistently.

If the idea is just to 'touch' the object, would it help to add a dummy volatile variable and increment it on each call?

e.g.

C& getC(){
   volatile int dummy = 0; // initialized so the increment does not read an indeterminate value
   dummy++;
   // rest of the code
}

Chubsdad 2010-09-20 05:17:17

How do you define "first call"? In any case, the function here is simple enough that it can be entirely optimized out.

casablanca 2010-09-20 05:19:10

I get your point. I will edit my response

Chubsdad 2010-09-20 05:23:23

thanks for the volatile idea

aaa 2010-09-20 06:30:26

Sadly, Sutter mentions in one of his talks that a smart compiler would be able to discard the `volatile` qualifier on `dummy`. The rationale is that, since the variable is on the stack, the compiler can know for a fact that it does not refer to special hardware. Also, a pointer to the variable is not passed to any other function, so the compiler can know for a fact that changes to `dummy` are only visible inside `getC`, and as such it could remove the `volatile`. After that, if the compiler notices that the value is never used, it can completely remove the variable. I don't know of any compiler that does this.

David Rodríguez - dribeas 2010-09-20 08:21:00

@David Rodríguez - dribeas: well, I just tried it all out in LLVM. In one case, the volatile variable was removed (leaving nothing more than a main { return 0 }), but in the other case, the volatile with its increment was inlined into main with the rest of getC. I think this goes to prove your point that you just can't know what's going to happen!

TokenMacGuy 2010-09-21 09:14:36
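
One variation that sidesteps the stack-variable argument above is to move the volatile object to namespace scope, where the compiler cannot as easily prove that nothing else observes it. This is only a sketch: `touch_count` is a hypothetical name, and whether a given optimizer keeps the accesses should still be checked in the generated code.

```cpp
struct C {
    int myfrob;
    explicit C(int f) : myfrob(f) {}
    int frob() const { return myfrob; }
};

// Hypothetical namespace-scope counter; it is not a local, so the
// "it lives on the stack" reasoning does not apply to it.
volatile int touch_count = 0;

C& getC() {
    // Explicit volatile read followed by a volatile write
    // (avoids ++ on a volatile, which C++20 deprecates).
    touch_count = touch_count + 1;
    static C* c = new C(5);
    return *c;
}
```

Each call to `getC()` then performs one volatile read and one volatile write in addition to returning the object.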

Answer 3

+4 A:

The C and C++ standards operate under a rather simple principle generally known as the "as-if rule" -- basically, that the compiler is free to do almost anything as long as no conforming code can discern the difference between what it did and what was officially required.

I don't see a way for conforming code to discern whether get was actually called in this case, so it looks to me like it's free to optimize it out.
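
As a contrast, here is a small sketch of a case where the difference can be discerned, so the call cannot simply be dropped. The type `Logged` and the functions `get_logged` and `touch` are invented for illustration; the assumption is that the program later inspects or prints whatever was written to the stream.

```cpp
#include <ostream>
#include <sstream>

struct Logged {
    int value;
    // The constructor writes to a stream, which the rest of the
    // program can later inspect.
    Logged(int v, std::ostream& log) : value(v) { log << "constructed\n"; }
};

Logged& get_logged(std::ostream& log) {
    static Logged* instance = new Logged(5, log);  // runs once, on the first call
    return *instance;
}

void touch(std::ostream& log) {
    get_logged(log);  // result discarded, but the first call still has to write to log
}
```

Here the compiler may still inline everything, but the write on the first call has to survive because later code can tell whether it happened.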

Jerry Coffin 2010-09-20 05:19:21

I don't understand how the compiler could know that `get` doesn't perform any side effect, and thus how it could decide to optimize it away. Unless `get` is pure (and how would it know that?), there is no reason not to execute it, is there?

Matthieu M. 2010-09-20 07:06:13

@Matthieu: The C standard defines a side effect as modifying a volatile variable or calling a library function. It's pretty easy for a compiler to figure out that `get` does neither.

Jerry Coffin 2010-09-20 07:09:17

@Jerry: if the definition of `get` is visible within the translation unit, yes it is, but if it is defined in another translation unit, would this be subject to LTO? I doubt it, but I don't know much about LTO yet.

Matthieu M. 2010-09-20 07:33:34

@Matthieu: Hard to say -- from the viewpoint of the standard, there's no real separation between LTO and other optimization. That said, my guess would be that what usually gets called LTO wouldn't typically do this, but what gets called LTCG might.

Jerry Coffin 2010-09-28 03:24:18

C++ static initialization