I have a one-liner C function that is just `return value * pow(1.+rate, -delay);`. It discounts a future value to a present value. The interesting part of the disassembly is:

```
0x080555b9 :      neg    %eax
0x080555bb :      push   %eax
0x080555bc :      fildl  (%esp)
0x080555bf :      lea    0x4(%esp),%esp
0x080555c3 :      fldl   0xfffffff0(%ebp)
0x080555c6 :      fld1
0x080555c8 :      faddp  %st,%st(1)
0x080555ca :      fxch   %st(1)
0x080555cc :      fstpl  0x8(%esp)
0x080555d0 :      fstpl  (%esp)
0x080555d3 :      call   0x8051ce0
0x080555d8 :      fmull  0xfffffff8(%ebp)
```

While single-stepping through this function, gdb says (rate is 0.02, delay is 2; you can see them on the stack):


```
(gdb) si
0x080555c6      30        return value * pow(1.+rate, -delay);
(gdb) info float
  R7: Valid   0x4004a6c28f5c28f5c000 +41.68999999999999773
  R6: Valid   0x4004e15c28f5c28f6000 +56.34000000000000341
  R5: Valid   0x4004dceb851eb851e800 +55.22999999999999687
  R4: Valid   0xc0008000000000000000 -2
=>R3: Valid   0x3ff9a3d70a3d70a3d800 +0.02000000000000000042
  R2: Valid   0x4004ff147ae147ae1800 +63.77000000000000313
  R1: Valid   0x4004e17ae147ae147800 +56.36999999999999744
  R0: Valid   0x4004efb851eb851eb800 +59.92999999999999972

Status Word:         0x1861   IE             PE        SF
                       TOP: 3
Control Word:        0x037f   IM DM ZM OM UM PM
                       PC: Extended Precision (64-bits)
                       RC: Round to nearest
Tag Word:            0x0000
Instruction Pointer: 0x73:0x080555c3
Operand Pointer:     0x7b:0xbff41d78
Opcode:              0xdd45
```

And after the `fld1`:

```
(gdb) si
0x080555c8      30        return value * pow(1.+rate, -delay);
(gdb) info float
  R7: Valid   0x4004a6c28f5c28f5c000 +41.68999999999999773
  R6: Valid   0x4004e15c28f5c28f6000 +56.34000000000000341
  R5: Valid   0x4004dceb851eb851e800 +55.22999999999999687
  R4: Valid   0xc0008000000000000000 -2
  R3: Valid   0x3ff9a3d70a3d70a3d800 +0.02000000000000000042
=>R2: Special 0xffffc000000000000000 Real Indefinite (QNaN)
  R1: Valid   0x4004e17ae147ae147800 +56.36999999999999744
  R0: Valid   0x4004efb851eb851eb800 +59.92999999999999972

Status Word:         0x1261   IE             PE        SF      C1
                       TOP: 2
Control Word:        0x037f   IM DM ZM OM UM PM
                       PC: Extended Precision (64-bits)
                       RC: Round to nearest
Tag Word:            0x0020
Instruction Pointer: 0x73:0x080555c6
Operand Pointer:     0x7b:0xbff41d78
Opcode:              0xd9e8
```

After this, everything goes to hell. Things get grossly over- or undervalued, so even if there were no other bugs in my Freeciv AI attempt, it would choose all the wrong strategies. Like sending the whole army to the Arctic. (Sigh, if only I were getting that far.)

I must be missing something obvious, or getting blinded by something, because I can't believe that `fld1` should ever fail, and even less that it should fail only after a handful of passes through this function. On earlier passes the FPU correctly loads 1 into ST(0). The bytes at 0x080555c6 definitely encode `fld1`; I checked with x/... on the running process.

What gives?

+3  A: 

It looks like you have an FPU stack overflow. The FPU tag word is 0, which means that all registers are used. You can also see all registers marked as "valid", when I would expect some to be empty.

I don't know why this would happen though. Maybe you have some MMX code which doesn't issue the EMMS instruction? Or maybe some inline assembly which doesn't clear the stack properly?

interjay
+2  A: 

Remarkably appropriate. What you have here is a stack overflow.

Specifically, you (or possibly your compiler) have overflowed the x87 stack. It can hold only 8 values, and at the time the `fld1` is issued, it is already full (indicated by the tag word of 0000). Thus the `fld1` overflows the stack (indicated by IE, SF and C1). Because the invalid-operation exception is masked (IM in the control word), the FPU does not trap; it loads the QNaN "real indefinite" instead, which is the result you're seeing.

As to why this is happening, you may have used MMX instructions without using an EMMS before using the x87 instructions, or your compiler has a bug, or you have assembly code somewhere that violates your platform's ABI (or a library that you are using violates the ABI).

Stephen Canon
Ah, you guys rock! Thanks! It was driving me up the wall. As to why it happens: probably because I had moved a function that calls another function returning `double` from one file to another, and in the new file there was no prototype for the floating-point function, so I guess the compiler never popped the return value off the stack. I would have expected to run out of FP stack much sooner, but it seems to be working much better now that I've provided a prototype. Now I can move on to the next segfault!
Bernd Jendrissek
@Bernd: Yes, that can certainly cause this sort of problem.
Stephen Canon