views: 1005
answers: 16

I'm writing a C++ program that doesn't work (I get a segmentation fault) when I compile it with optimizations (options -O1, -O2, -O3, etc.), but it works just fine when I compile it without optimizations.

Is there any chance that the error is in my code, or should I assume that this is a bug in GCC?

My GCC version is 3.4.6.

Is there any known workaround for this kind of problem?

There is a big difference in speed between the optimized and unoptimized version of my program, so I really need to use optimizations.


This is my original functor. It works fine with no optimization and throws a segmentation fault at any level of optimization:

struct distanceToPointSort{
    indexedDocument* point ;
    distanceToPointSort(indexedDocument* p): point(p) {}
    bool operator() (indexedDocument* p1,indexedDocument* p2){
        return distance(point,p1) < distance(point,p2) ;
    }
} ;

And this one works flawlessly with any level of optimization:

struct distanceToPointSort{
    indexedDocument* point ;
    distanceToPointSort(indexedDocument* p): point(p) {}
    bool operator() (indexedDocument* p1,indexedDocument* p2){

        float d1=distance(point,p1) ;
        float d2=distance(point,p2) ;

        std::cout << "" ;  //without this line, I get a segmentation fault anyway

        return d1 < d2 ;
    }
} ;

Unfortunately, this problem is hard to reproduce because it happens with some specific values. I get the segmentation fault upon sorting just one out of more than a thousand vectors, so it really depends on the specific combination of values each vector has.

+3  A: 

You may be running into an aliasing problem (or it could be a million other things). Look up the -fstrict-aliasing option.

This kind of question is impossible to answer properly without more information.
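
To make the suggestion concrete, here is a minimal sketch of what an aliasing problem can look like (hypothetical code, not taken from the question). It compiles without complaint, but reading a float through an unsigned int* violates the strict-aliasing rules, so the optimizer is free to assume the two accesses are unrelated:

#include <cstdio>

// Minimal sketch of a strict-aliasing violation (hypothetical, not the
// original poster's code): the float object is read through an
// unsigned int*, so with -O2/-O3 the optimizer may assume the two
// pointers never alias and reorder or drop accesses.
unsigned int float_bits(float f)
{
    return *(unsigned int*)&f;   // undefined behavior under the aliasing rules
}

int main()
{
    std::printf("%x\n", float_bits(1.0f));
    return 0;
}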

Patrick_O
-fstrict-aliasing is disabled by default
Adam Rosenfield
It's enabled starting with -O2 for gcc 4.3
Johannes Schaub - litb
+4  A: 

The error is in your code. It's likely you're doing something that invokes undefined behavior according to the C standard, which just happens to work with no optimizations; when GCC makes certain assumptions to perform its optimizations, the code breaks when those assumptions aren't true. Make sure to compile with the -Wall option, and -Wextra might also be a good idea, and see if you get any warnings. You could also try -ansi or -pedantic, but those are likely to result in false positives.
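
As a hedged illustration of the kind of undefined behavior -Wall does catch (a made-up example, not the code from the question): a non-void function that falls off the end without returning often appears to work at -O0 because the value the caller wants happens to be left in the return register, and then fails once the optimizer rearranges things.

#include <iostream>

// g++ -Wall warns here ("control reaches end of non-void function").
int lookup(int x)
{
    if (x > 0)
        return x * 2;
    // missing return for x <= 0: undefined behavior
}

int main()
{
    std::cout << lookup(-1) << std::endl;  // garbage, or worse, with optimizations
    return 0;
}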

Adam Rosenfield
Brave call ("error is in your code") but you're probably right. Also, if it's illegal to the C standard, it should be caught at compile time, not dump core at run time.
paxdiablo
"Also, if it's illegal to the C standard, it should be caught at compile time" -- read an int from stdin, divide by the int, and predict at compile time that the user will type a zero? Hello?
Windows programmer
Dividing by 0 doesn't break the standard...
Greg Rogers
That's exactly the problem - it's legal but yields undefined behaviour. The former means the compiler doesn't have to complain. The latter is something where people sometimes confuse their terminology, and say "illegal" when they mean "invalid" or "borked".
Steve Jessop
@onebyone: Well, "legal" isn't mentioned at all in the standard, so your terminology is no better. If we have to be pedantic, we should just avoid usage of "legal" and "illegal" entirely. I tend to use "illegal" as a catch-all for "ill-formed or relying on undefined behavior", but there's no official definition as far as C++ goes.
jalf
+6  A: 

I would assume your code is wrong first, though it is hard to tell.

Does your code compile with 0 warnings?

 g++ -Wall -Wextra -pedantic -ansi
Martin York
+2  A: 

It's almost (almost) never the compiler.

First, make sure you're compiling warning-free, with -Wall.

If that didn't give you a "eureka" moment, attach a debugger to the least optimized version of your executable that crashes and see what it's doing and where it goes.

5 will get you 10 that you've fixed the problem by this point.

+2  A: 

I ran into the same problem a few days ago; in my case it was aliasing. And GCC does it differently, but not wrongly, compared to other compilers. GCC has become what some might call a rules-lawyer of the C++ standard, and their implementation is correct, but you also have to be really correct in your C++, or it'll over-optimize some things, which is a pain. But you get speed, so you can't complain.

Robert Gould
A: 

Wow, I didn't expect answers so quickly, and so many...

The error occurs upon sorting a std::vector of pointers using std::sort()

I provide the strict-weak-ordering functor.

But I know the functor I provide is correct because I've used it a lot and it works fine.

Plus, the error cannot be some invalid pointer in the vector, because the error occurs just when I sort the vector. If I iterate through the vector without applying std::sort first, the program works fine.

I just used GDB to try to find out what's going on. The error occurs when std::sort invokes my functor. Apparently, std::sort is passing an invalid pointer to my functor. (Of course, this happens with the optimized version only, at any level of optimization: -O, -O2, -O3.)
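
One way to test the "my functor is correct" assumption is a checking wrapper like the hypothetical sketch below (checkedCompare is not part of the original code): it asserts that the comparator never claims a < b and b < a at the same time. It may not reproduce an inconsistency that only appears through register precision, but it catches many strict-weak-ordering bugs.

#include <cassert>

// Hypothetical checking wrapper; Comp would be the real comparator,
// e.g. distanceToPointSort. A strict weak ordering must never report
// both comp(a,b) and comp(b,a) as true.
template <class Comp>
struct checkedCompare {
    Comp comp;
    checkedCompare(Comp c) : comp(c) {}
    template <class T>
    bool operator()(T a, T b) {
        bool ab = comp(a, b);
        bool ba = comp(b, a);
        assert(!(ab && ba));   // fires if the ordering is inconsistent
        return ab;
    }
};

It would be passed to std::sort in place of the raw functor, e.g. checkedCompare<distanceToPointSort>(distanceToPointSort(point)).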

GetFree
Can you provide the smallest, complete, program that has this problem?
Robert Gamble
I'd like to see that too...
Jason Coco
I'd like to see that too. Something is very odd here, because the implementation of std::sort should be independent of -O levels.
Windows programmer
+1  A: 

I expect to get some downvotes here after reading some of the comments, but in the console game programming world, it's rather common knowledge that the higher optimization levels can sometimes generate incorrect code in weird edge cases. It might very well be that edge cases can be fixed with subtle changes to the code, though.

Jim Buck
No downvote, but you will get the same comment I made in Artelius's answer - if the compiler generates bad code at -O3, odds are still good that your code is invalid, not that there's a bug in the compiler. But it is very easy to write invalid C/C++ code.
Steve Jessop
Maybe, but if a ton of game developers couldn't figure out the bug in the code, and the code works 100% fine otherwise, then maybe it's not technically a bug, but I would say it is de facto one.
Jim Buck
+3  A: 

It is very seldom the compiler's fault, but compilers do have bugs, and they often manifest themselves at different optimization levels (if there is a bug in an optimization pass, for example).

In general when reporting programming problems: provide a minimal code sample to demonstrate the issue, such that people can just save the code to a file, compile and run it. Make it as easy as possible to reproduce your problem.

Also, try different versions of GCC (compiling your own GCC is very easy, especially on Linux). If possible, try another compiler. Intel has a C/C++ compiler which is more or less GCC-compatible (and free for non-commercial use, I think). This will help pinpoint the problem.

JesperE
+6  A: 

Here's some code that seems to work, until you hit -O3...

#include <stdio.h>

int main()
{
    int i = 0, j = 1, k = 2;
    printf("%d %d %d\n", *(&j-1), *(&j), *(&j+1));
    return 0;
}

Without optimisations, I get "2 1 0"; with optimisations I get "40 1 2293680". Why? Because i and k got optimised out!

But I was taking the address of j and going out of the memory region allocated to j. That's not allowed by the standard. It's most likely that your problem is caused by a similar deviation from the standard.

I find valgrind is often helpful at times like these.

EDIT: Some commenters are under the impression that the standard allows arbitrary pointer arithmetic. It does not. Remember that some architectures have funny addressing schemes, alignment may be important, and you may get problems if you overflow certain registers!

The words of the [draft] standard, on adding/subtracting an integer to/from a pointer (emphasis added):

"If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."

Seeing as &j doesn't even point to an array object, &j-1 and &j+1 can hardly point to part of the same array object. So simply evaluating &j+1 (let alone dereferencing it) is undefined behaviour.

On x86 we can be pretty confident that adding one to a pointer is fairly safe and just takes us to the next memory location. In the code above, the problem occurs when we make assumptions about what that memory contains, which of course the standard doesn't go near.
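
For contrast, here is a sketch of the case the quoted wording does allow: as long as the pointer stays within the same array object (or one past its end), the arithmetic and the dereferences are well defined.

#include <stdio.h>

int main()
{
    int a[3] = { 0, 1, 2 };
    int *j = &a[1];
    /* j-1, j and j+1 all point into the same array, so this is fine */
    printf("%d %d %d\n", *(j - 1), *j, *(j + 1));
    return 0;
}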

Artelius
Please edit that to "Here's some bad code that accidentally works at levels less than -O3". Sure you clarified it later, but your first line misleads readers of this discussion.
Windows programmer
Any code which fails at -O3 is bad code that accidentally works below -O3. Unless it really is down to a compiler bug, of course. This example just makes it particularly blatant that the code is bad. I assume that's deliberate to illustrate the point.
Steve Jessop
Actually, that is allowed by the standard. You simply create a pointer that points to j then subtract/add 1 to the pointer and dereference it. That's perfectly legal, the *standard* permits you to attempt to read any pointer you like.
paxdiablo
It's certainly dangerous code but not illegal :-).
paxdiablo
Where does the standard say that k is exactly four bytes after j and not eight?
bene
Perhaps Pax Diablo is merely being particular in his language - this code is a well-formed program which yields undefined behaviour. I don't think the standard defines "legal", "illegal", or "allowed", but it does define "well-formed" and "ill-formed", and this is well-formed.
Steve Jessop
... which doesn't necessarily mean it will compile, since I believe compilers are permitted to save you from yourself by refusing to compile code which causes undefined behaviour, if they can detect it. They're also allowed to define undefined behaviour, making particular cases non-dangerous.
Steve Jessop
@Pax: "attempting to read any pointer you like" is undefined behavior. Even just *creating* a pointer to some arbitrary address is undefined behavior.
jalf
A: 

As others have pointed out, it's probably strict aliasing. Turn it off at -O3 (with -fno-strict-aliasing) and try again. My guess is that you are doing some pointer tricks in your functor (fast float-as-int compare? object type in the lower 2 bits?) that fail across inlined template functions. Warnings do not help to catch this case ("if the compiler could detect all strict aliasing problems it could just as well avoid them"). Just changing an unrelated line of code may make the problem appear or go away, as it changes register allocation.
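
For reference, the "fast float as int compare" trick being guessed at usually looks something like the sketch below (purely hypothetical; distance() in the question may do nothing of the sort). For non-negative IEEE-754 floats the bit patterns order the same way as the values, so some code compares them as integers, but reading a float through an int* is exactly the kind of aliasing the optimizer is allowed to assume away.

// Hypothetical illustration of the trick, not the original poster's code.
// It works only by accident: the read through const int* violates strict
// aliasing, and the ordering trick itself breaks down for negative floats.
inline bool lessAsInt(float a, float b)
{
    return *(const int*)&a < *(const int*)&b;   // undefined behavior
}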

starmole
A: 

As the updated question will show ;) , the problem exists with a std::vector<T*>. One common error with vectors is reserve()ing what should have been resize()d. As a result, you'd be writing outside array bounds. An optimizer may discard those writes.
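
A minimal sketch of that mistake, with a hypothetical vector (not the original code): reserve() only sets capacity, so indexed writes after it are out of bounds, while resize() actually creates the elements.

#include <vector>

int main()
{
    std::vector<int*> v;

    v.reserve(10);   // capacity is 10, but size() is still 0
    // v[0] = p;     // would be an out-of-bounds write: undefined behavior

    v.resize(10);    // size() is now 10, elements are null pointers
    v[0] = 0;        // fine
    return 0;
}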

MSalters
+1  A: 

Alright... This is one of the weirdest problems I've ever had.
I don't think I have enough proof to state it's a GCC bug, but honestly... it really looks like one.

This is my original functor. It works fine with no optimization and throws a segmentation fault at any level of optimization:

struct distanceToPointSort{
    indexedDocument* point ;
    distanceToPointSort(indexedDocument* p): point(p) {}
    bool operator() (indexedDocument* p1,indexedDocument* p2){
        return distance(point,p1) < distance(point,p2) ;
    }
} ;

And this one works flawlessly with any level of optimization:

struct distanceToPointSort{
    indexedDocument* point ;
    distanceToPointSort(indexedDocument* p): point(p) {}
    bool operator() (indexedDocument* p1,indexedDocument* p2){

        float d1=distance(point,p1) ;
        float d2=distance(point,p2) ;

        std::cout << "" ;  //without this line, I get a segmentation fault anyway

        return d1 < d2 ;
    }
} ;

Unfortunately, this problem is hard to reproduce because it happens with some specific values. I get the segmentation fault upon sorting just one out of more than a thousand vectors, so it really depends on the specific combination of values each vector has.

GetFree
I think I see the problem. If I recall correctly the IEEE floating point standard allows implementations to retain any random amount of precision greater than the required minimum, and the extra amount of precision doesn't have to be constant. Your d1 and d2 are sometimes equal, sometimes not.
Windows programmer
That's what I thought. It's like my functor was giving inconsistent ordering semantics, so std::sort gets confused or something. Even so, why does the problem happen with optimizations only? And why does my second functor work fine?
GetFree
Others have asked, and I will repeat, please give a complete program (as small as possible) which causes you trouble. Hard code inputs if need be.
ejgottl
It looks like this sample was enough.
Windows programmer
"why does the problem happen with optimizations only?" -- there is no reliable answer, but a possible answer is that sometimes values were kept in registers with extra precision and sometimes values were rounded and stored into memory.
Windows programmer
A possible pitfall: two points equidistant from the test point might (due to rounding errors, say) return true for BOTH orderings of the arguments, because reversing the order might affect the calculations.
Artelius
+2  A: 

As an experiment, try to see if this will force the compiler to round everything consistently.

volatile float d1=distance(point,p1) ;
volatile float d2=distance(point,p2) ;
return d1 < d2 ;
Windows programmer
Yep. It worked. No segmentation fault so far.
GetFree
What an ugly solution, but I'm glad there's a solution.
Windows programmer
I'm still curious about why outputting an empty string (i.e. doing something between the calculation of the distances and returning the comparison) solved the problem.
GetFree
Maybe the function call forced gcc to store d1 and d2 into memory instead of keeping them in registers. So the rounding would become consistent.
Windows programmer
So, probably your solution and mine are doing the same thing, i.e. preventing the compiler from optimizing that code. Maybe, without the volatile qualifier (or a function call in the middle), GCC was converting the code to something similar to my first functor.
GetFree
Try with -ffloat-store instead of volatile (see my answer for the details).
CesarB
A: 

Post the code of distance()! It probably does some pointer magic; see my previous post. Doing an intermediate assignment just hides the bug in your code by changing register allocation. Even more telling is that the output call changes things!

starmole
+3  A: 

Now that you posted the code fragment and a working workaround was found (@Windows programmer's answer), I can say that perhaps what you are looking for is -ffloat-store.

-ffloat-store

Do not store floating point variables in registers, and inhibit other options that might change whether a floating point value is taken from a register or memory.

This option prevents undesirable excess precision on machines such as the 68000 where the floating registers (of the 68881) keep more precision than a double is supposed to have. Similarly for the x86 architecture. For most programs, the excess precision does only good, but a few programs rely on the precise definition of IEEE floating point. Use -ffloat-store for such programs, after modifying them to store all pertinent intermediate computations into variables.

Source: http://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Optimize-Options.html
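
For example, assuming the rest of the build stays the same (the file name here is just a placeholder), the flag is simply added to the compile line:

 g++ -O2 -ffloat-store myprogram.cpp -o myprogram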

CesarB
A: 

The true answer is hidden somewhere inside all the comments in this thread. First of all: it is not a bug in the compiler.

The problem has to do with floating point precision. distanceToPointSort is a comparator that should never return true for both argument orders (a,b) and (b,a), but that is exactly what can happen when the compiler decides to use higher precision for some data paths. The problem is especially likely on, but by no means limited to, x86 without -mfpmath=sse. If the comparator behaves that way, the sort function can become confused, and the segmentation fault is not surprising.
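
To see why an inconsistent comparator alone can produce a crash, here is a deliberately broken, self-contained sketch (unrelated to the question's data): using <= makes the comparator return true for both argument orders of equal elements, which violates strict weak ordering, and with enough equal values std::sort on common implementations runs past the ends of the range.

#include <algorithm>
#include <vector>

// Deliberately broken comparator: for equal elements it says "less" in
// both directions, much like a comparator whose answers depend on excess
// floating point precision can.
bool brokenLess(int a, int b)
{
    return a <= b;   // should be a < b
}

int main()
{
    std::vector<int> v(1000, 42);                // many equal values
    std::sort(v.begin(), v.end(), brokenLess);   // undefined behavior, often a segfault
    return 0;
}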

I consider -ffloat-store the best solution here (already suggested by CesarB).

Ringding