Hi,

I'm currently working on a quite big (and old, sigh) code base that was recently upgraded to VS2005 (SP1). My team and I are changing/updating/replacing modules in this code as we go, but we have occasionally been running into problems where the vtables seem broken. I am no expert on vtables, but these sure seem to be broken. The problem manifests itself with this error:

Run-Time Check Failure #0 - The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention.

Of course there can be plenty of other reasons for this error but when debugging (Debug build) I can actually verify that the vtables for the object I want to operate on look strange:

The stack and heap objects that reference each vtable look fine, and the pointers to the vtables match the map file perfectly. This indicates to me that this is not a memory overwriting bug or similar, since that would affect the stack and heap rather than where the vtables are stored. (They are stored in a read-only area, right?) Anyway, all seems good so far. But when looking at the memory of the vtable, I find that the values, if I interpret them as pointers, are all in the same range (e.g. 0x00f203db 0x00f0f9be 0x00ecdda7 0x00f171e1) but do not match any entry in the map file, and many of them are not even aligned to 4 bytes. I don't know all the details of how VS2005 builds the vtables, but this looks wrong to me. If this is correct behavior, perhaps somebody can explain it to me?

I guess my question boils down to: what can cause this behavior? Are there any known bugs in the linker, for example when class hierarchies get too complex? Has anybody seen anything similar before? Currently we are able to get around our crashes by making functions in the affected class inline (scary stuff!), but clearly this is not a feasible long-term solution.

Thanks for any insight!

Update: I've been asked for more details about the project, and of course I will supply them. First, however, the question is not entirely about the "ESP value not being saved" error. What I am most interested in is why I see the strange values in the vtable. That said, here is some additional info: The solution relies on several external and internal projects, but these have not been changed in a long time and all use the same calling convention. The code where it seems to break is all within the one pretty standard C++ "main" project of the solution. All code is built with the same compiler. The solution also doesn't use any DLLs but links with plenty of static libraries:

SHFolder.lib, python25.lib, dxguid.lib, d3d9.lib, d3dx9.lib, dinput8.lib, ddraw.lib, dxerr9.lib, ws2_32.lib, mss32.lib, Winmm.lib, vtuneapi.lib, vttriggers.lib, DbgHelp.lib, kernel32.lib, user32.lib, gdi32.lib, winspool.lib, comdlg32.lib, advapi32.lib, shell32.lib, ole32.lib, oleaut32.lib, uuid.lib, odbc32.lib, odbccp32.lib

+1  A: 

I think the big hint here is in the "This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention" part of that error. It seems to me that there is a mismatch between the caller's API and the library which is handling the call.

Also, it might be the case that you are mixing code built with different compilers. What more can you tell us about the nature of this project? Is the function you are calling located in an external library? Or can you debug through the entire call stack?

Edit: You said that the project doesn't use any DLLs. What about static libraries?

Nik Reiman
Hi and thanks! Added some more info to the question above.
Dan
bwah... I'm stumped then, as your build environment seems pretty sane. Let me think about this a bit and come back with some new suggestions/questions for you.
Nik Reiman
Hehe, yeah I'm also drawing a blank here, except the odd vtables I'm seeing. Thanks for any ideas you might come up with :)
Dan
A: 

Whenever I've had a message like this, the answer has always involved recompiling some part or all of the code. I'd try a full rebuild as a first step. Sqook's suggestion about an external library also sounds plausible, and again would involve you recompiling that library with the same calling conventions as your main code, if that was possible.

I have sometimes found that the Build command can miss files that need to be recompiled, which can lead to your message. Again, a full rebuild will straighten things out.

Charles Anderson
Hi and thanks for your suggestion. I've also seen this sometimes being resolved by a rebuild, but that doesn't seem to be the case this time. We have rebuilt everything, including all libraries, on several machines and get the same behavior everywhere.
Dan
A: 

When I've had this error before it's always been when COM has been involved. Nearly always it's been specifically related to re-entrancy - are you using COM? Are you using STA, message filters?

morechilli
Hi and thanks. No COM is used. I'm not familiar with STA and message filters, but to be honest, most of the code, and especially the code that breaks, is actually really simple C++. It operates on some fairly complex classes, but it's far from the worst I've seen.
Dan
+1  A: 

Beware of the effects that incremental linking and Edit+Continue will have on function addresses, including v-table entries. It works by making method calls indirectly through a jump table. That allows the linker to patch the jump table when it needs to relocate the method without having to relink the entire image. The addresses in that jump table are 5 bytes apart. They won't appear in the .map file. It is really easy to see when you switch to Assembly view and trace execution of the call.
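To make the "5 bytes apart" point concrete, here is a minimal sketch of how one such jump-table entry can be decoded by hand, assuming each entry is a 32-bit x86 `jmp rel32` (opcode 0xE9 followed by a 4-byte little-endian displacement). The function name `resolve_thunk` and the byte values in the self-check are hypothetical and only illustrate the arithmetic; the entry address reuses one of the vtable values from the question.

```cpp
#include <cassert>
#include <cstdint>

// Decode one jump-table entry, assuming it is a `jmp rel32`:
// opcode 0xE9 followed by a 32-bit little-endian displacement that is
// relative to the address of the NEXT instruction (entry address + 5).
// Returns 0 if the bytes do not start with 0xE9.
std::uint32_t resolve_thunk(const unsigned char* bytes, std::uint32_t thunk_addr) {
    assert(bytes != nullptr);
    if (bytes[0] != 0xE9) {
        return 0;
    }
    std::uint32_t rel = std::uint32_t(bytes[1])
                      | std::uint32_t(bytes[2]) << 8
                      | std::uint32_t(bytes[3]) << 16
                      | std::uint32_t(bytes[4]) << 24;
    // Unsigned arithmetic wraps modulo 2^32, so negative displacements work too.
    return thunk_addr + 5 + rel;
}

// Hypothetical bytes, chosen only to show the calculation.
void check_thunk_decoding() {
    const unsigned char forward[5]  = {0xE9, 0x10, 0x00, 0x00, 0x00};
    const unsigned char backward[5] = {0xE9, 0xFB, 0xFF, 0xFF, 0xFF};
    assert(resolve_thunk(forward, 0x00f203dbu) == 0x00f203f0u);
    assert(resolve_thunk(backward, 0x00001000u) == 0x00001000u);
}
```

The resolved target, rather than the raw v-table value, is what should be compared against the .map file. Because each entry is 5 bytes long, consecutive entries cannot all be 4-byte aligned, which would explain the odd-looking addresses.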

That is also the technique you should use to diagnose the RTC failure: find out which method is actually getting called. The most likely reason is that you've added virtual methods to a class but a client of that class wasn't recompiled, so it uses the wrong slot in the v-table. This is classically also a COM problem when changing interfaces but not IIDs.

Hans Passant
Hey, thanks for your reply! In fact I used your technique of switching to assembly view to see what was going on. I had no idea that x86 could have jump tables that aren't even aligned to 4 bytes, so this initially confused me a lot. I was used to ARM assembly before :)
Dan
+2  A: 

I found the problem. Silly really, but the class hierarchy that caused the problem had a virtual function called GetObject, which conflicted with the Windows #define of the same name. The header files included these Windows header files in different orders, which confused the linker. So, in fact the problem was corrupted vtables, but I didn't expect this to be the reason! Well, you learn something every day...
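For anyone hitting the same thing, here is a minimal sketch of the mechanism. In the Windows headers, GetObject is a macro that expands to GetObjectA or GetObjectW, so a class declaration parsed after those headers gets a differently named member than one parsed before them. The sketch fakes the macro with a plain `#define` so that it compiles without windows.h; the class names and helper macros (`STR`, `STR2`) are hypothetical and not from the original project.

```cpp
#include <cassert>
#include <cstring>

// Helpers that turn a (macro-expanded) identifier into a string literal.
#define STR2(x) #x
#define STR(x) STR2(x)

// Stand-in for the Windows headers (assumption: simplified to the ANSI
// variant so this sketch builds without including windows.h).
#define GetObject GetObjectA

struct DeclaredAfterWindowsHeaders {
    virtual ~DeclaredAfterWindowsHeaders() {}
    // The preprocessor silently rewrites this to GetObjectA().
    virtual const char* GetObject() { return STR(GetObject); }
};

// One possible workaround: drop the macro before declaring the class.
#undef GetObject

struct DeclaredWithoutMacro {
    virtual ~DeclaredWithoutMacro() {}
    virtual const char* GetObject() { return STR(GetObject); }
};

const char* name_with_macro() {
    DeclaredAfterWindowsHeaders o;
    return o.GetObjectA();  // the member really is called GetObjectA
}

const char* name_without_macro() {
    DeclaredWithoutMacro o;
    return o.GetObject();
}

void check_names() {
    assert(std::strcmp(name_with_macro(), "GetObjectA") == 0);
    assert(std::strcmp(name_without_macro(), "GetObject") == 0);
}
```

If different source files see the same class header with and without the macro active, they disagree about the member's name. Including the Windows headers in a consistent order everywhere, or renaming the virtual function, avoids this; `#undef` is only one option and also hides the real Windows API from later code.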

However, big thanks to all that replied!

Dan
A: 

I had exactly the same problem - calling an overloaded virtual function on an object resulted in the "ESP was not properly saved" error, but debugging showed that the compiler had generated a wrong offset into the vtable for this call, so another function with more parameters was getting called. The called function updated the ESP as if the caller had pushed more parameters on the stack, which in turn resulted in an invalid ESP value on return.
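The stack arithmetic behind that can be sketched with a small model. It assumes 32-bit x86 with a callee-cleans-the-stack convention (as MSVC uses for ordinary member functions) and 4 bytes per argument; the function name and the starting ESP value are hypothetical, and this is a simulation of the bookkeeping, not real stack manipulation.

```cpp
#include <cassert>
#include <cstdint>

const std::uint32_t kSlotBytes = 4;  // one stack slot on 32-bit x86

// Model of ESP across a call where the callee cleans up its own arguments.
// args_pushed:      how many arguments the caller actually pushed
// args_callee_pops: how many arguments the called function believes it got
std::uint32_t esp_after_call(std::uint32_t esp,
                             unsigned args_pushed,
                             unsigned args_callee_pops) {
    esp -= args_pushed * kSlotBytes;       // caller pushes the arguments
    esp -= kSlotBytes;                     // call pushes the return address
    esp += kSlotBytes;                     // ret pops the return address
    esp += args_callee_pops * kSlotBytes;  // ret N releases the callee's arguments
    return esp;
}

void check_esp_model() {
    const std::uint32_t esp = 0x0012ff00u;  // hypothetical starting value
    // Correct v-table slot: caller and callee agree on one argument.
    assert(esp_after_call(esp, 1, 1) == esp);
    // Wrong slot: the callee expects three arguments, so ESP ends up 8 bytes off.
    assert(esp_after_call(esp, 1, 3) == esp + 8);
}
```

The run-time check compares ESP before and after the call, so any mismatch between what was pushed and what was popped triggers the failure, even if the wrong function otherwise runs to completion.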

The problem disappeared after I moved the #include for the header that declares the class at fault to the top of the source file. I haven't investigated further what exactly caused this, but I guess it was the same situation: some #define messing with the declaration of a virtual member.

Hope that helps others that stumble into the same problem.

Neno Ganchev