If you've been programming for a while, you have probably noticed that every now and then something completely impossible happens, something you are convinced has no possible explanation ("IT'S A COMPILER BUG!!"). Once you find out what caused it, though, you go "oooohhh".

Well, it just happened to me :(

Here AuthDb is not NULL; it is a valid pointer:

SingleResult sr(AuthDb, format("SELECT Id, Access, Flags, SessionKey, RealmSplitPreference FROM accounts WHERE Name = '%s'") % Escaped(account_name));

Here it mysteriously becomes NULL:

struct SingleResult : public BaseResult
{
    SingleResult(Database *db, const boost::format& query)  { _ExecuteQuery(db, query.str()); }
};

Notice that this is the very next call. Two screenshots show it much better:

http://img187.imageshack.us/img187/5757/ss1zm.png
http://img513.imageshack.us/img513/5610/ss2b.png

EDIT: AuthDb is a global variable. The global itself keeps pointing to the right thing, but the copy of the pointer, Database *db, is NULL.


ASM code (unfortunately I don't even know how to read it :/)

First screenshot

01214E06  mov         eax,dword ptr [ebp-328h] 
01214E0C  push        eax  
01214E0D  push        offset string "SELECT Id, Access, Flags, Sessio"... (13C6278h) 
01214E12  lea         ecx,[ebp-150h] 
01214E18  call        boost::basic_format<char,std::char_traits<char>,std::allocator<char> >::basic_format<char,std::char_traits<char>,std::allocator<char> > (11A3260h) 
01214E1D  mov         dword ptr [ebp-32Ch],eax 
01214E23  mov         ecx,dword ptr [ebp-32Ch] 
01214E29  mov         dword ptr [ebp-330h],ecx 
01214E2F  mov         byte ptr [ebp-4],2 
01214E33  mov         ecx,dword ptr [ebp-330h] 
01214E39  call        boost::basic_format<char,std::char_traits<char>,std::allocator<char> >::operator%<Snow::Escaped> (11A3E18h) 
01214E3E  push        eax  
01214E3F  mov         edx,dword ptr [__tls_index (144EC40h)] 
01214E45  mov         eax,dword ptr fs:[0000002Ch] 
01214E4B  mov         ecx,dword ptr [eax+edx*4] 
01214E4E  mov         edx,dword ptr [ecx+12A3Ch] 
01214E54  push        edx  
01214E55  lea         ecx,[sr] 
01214E58  call        Snow::SingleResult::SingleResult (11A27D4h) 
01214E5D  mov         byte ptr [ebp-4],4 // VS GREEN ARROW IS HERE
01214E61  lea         ecx,[ebp-150h] 
01214E67  call        boost::basic_format<char,std::char_traits<char>,std::allocator<char> >::~basic_format<char,std::char_traits<char>,std::allocator<char> > (11A1DBBh) 
01214E6C  mov         byte ptr [ebp-4],5 
01214E70  lea         ecx,[ebp-170h] 
01214E76  call        Snow::Escaped::~Escaped (11A42D2h) 
    const bool account_found = !sr.Error();
01214E7B  lea         ecx,[sr] 
01214E7E  call        Snow::BaseResult::Error (11A2964h) 
01214E83  movzx       eax,al 
01214E86  test        eax,eax 
01214E88  sete        cl   
01214E8B  mov         byte ptr [account_found],cl 

    if (!account_found) {
01214E8E  movzx       edx,byte ptr [account_found] 
01214E92  test        edx,edx 
01214E94  jne         AuthSession+1C0h (1214F10h) 
        client.Kill(format("%s: Attempted to login with non existant account `%s'") % client % account_name, true);

Second screenshot

011A8E7D  mov         dword ptr [ebp-10h],ecx 
011A8E80  mov         ecx,dword ptr [this] 
011A8E83  call        Snow::BaseResult::BaseResult (11A31D9h) 
011A8E88  mov         dword ptr [ebp-4],0 
011A8E8F  lea         eax,[ebp-30h] 
011A8E92  push        eax  
011A8E93  mov         ecx,dword ptr [query] 
011A8E96  call        boost::basic_format<char,std::char_traits<char>,std::allocator<char> >::str (11A1E01h) 
011A8E9B  mov         dword ptr [ebp-34h],eax 
011A8E9E  mov         ecx,dword ptr [ebp-34h] 
011A8EA1  mov         dword ptr [ebp-38h],ecx 
011A8EA4  mov         byte ptr [ebp-4],1 
011A8EA8  mov         edx,dword ptr [ebp-38h] 
011A8EAB  push        edx  
011A8EAC  mov         eax,dword ptr [db] 
011A8EAF  push        eax  
011A8EB0  mov         ecx,dword ptr [this] 
011A8EB3  call        Snow::SingleResult::_ExecuteQuery (124F380h) 
011A8EB8  mov         byte ptr [ebp-4],0 // VS GREEN ARROW HERE
011A8EBC  lea         ecx,[ebp-30h] 
011A8EBF  call        std::basic_string<char,std::char_traits<char>,std::allocator<char> >::~basic_string<char,std::char_traits<char>,std::allocator<char> > (11A2C02h) 
011A8EC4  mov         dword ptr [ebp-4],0FFFFFFFFh 
011A8ECB  mov         eax,dword ptr [this] 
011A8ECE  mov         ecx,dword ptr [ebp-0Ch] 
011A8ED1  mov         dword ptr fs:[0],ecx 
011A8ED8  pop         edi  
011A8ED9  add         esp,38h 
011A8EDC  cmp         ebp,esp 
011A8EDE  call        _RTC_CheckEsp (12B4450h) 
011A8EE3  mov         esp,ebp 
011A8EE5  pop         ebp  
011A8EE6  ret         8    

UPDATE: Following peterchen's suggestion, I added ASSERT(AuthDb); here:

ASSERT(AuthDb);
SingleResult sr(AuthDb, format("SELECT Id, Access, Flags, SessionKey, RealmSplitPreference FROM accounts WHERE Name = '%s'") % Escaped(account_name));

And it failed O.o And yet the debugger keeps insisting that it's not NULL. It's not shadowed by a local variable either.

UPDATE 2: cout << AuthDb; prints 0 in there, even though the debugger says it's not NULL.


FOUND THE PROBLEM

In a .cpp:

Database *AuthDb = NULL, *GameDb = NULL;

In a .h:

extern thread Database *AuthDb, *GameDb;

The variable was declared thread (TLS, thread-local storage) in the header, but the definition in the .cpp was not marked TLS...
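For anyone hitting the same thing, here is a minimal sketch of a declaration and definition that agree on the storage class. It assumes C++11 `thread_local` in place of the `thread` macro above (whose expansion isn't shown), `Database` is an empty stand-in for the real class, and `RoundTrips` is a hypothetical helper that exists only to exercise the variable.

```cpp
// Hypothetical stand-in for the real Database class.
struct Database {};

// Header side (.h): the declaration carries the thread storage class.
extern thread_local Database *AuthDb;

// Source side (.cpp): the definition repeats the same storage class.
// Dropping thread_local here is the kind of mismatch described above.
thread_local Database *AuthDb = nullptr;

// Hypothetical helper: stores a pointer in the calling thread's copy
// and checks that the same thread reads it back unchanged.
bool RoundTrips(Database *db) {
    AuthDb = db;
    return AuthDb == db;
}
```

With the standard keyword, a compiler that sees both lines in the same translation unit should reject a mismatch, so having the .cpp include its own header is a cheap safeguard.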

Countless hours wasted on this super stupid mistake, with no warnings or hints or anything from the compiler, which I feel like killing right now. :( Oh well, like I said, for each impossible behavior there is an explanation that seems obvious once you know it :)

Thanks to everyone who helped, I was really desperate!

+2  A: 

One possibility is that in the second screenshot, you have the debugger stopped at the very beginning of the function, before the stack has been manipulated, and so the variable locations aren't correct. You might also be after the end of the function, where the stack has already been torn down. I've seen that sort of thing before.

Expand that function to be a multiline function, so that it looks like this:

struct SingleResult : public BaseResult
{
    SingleResult(Database *db, const boost::format& query)  
    { 
        _ExecuteQuery(db, query.str()); 
    }
};

... and see if it still shows db as null when you have the debugger stopped on the _ExecuteQuery line.

Aric TenEyck
ASSERT(db); fails in _ExecuteQuery(). That's why I noticed this behavior; it's not a display issue, it's really null
Andreas Bonini
What is the declaration of AuthDb? It is a Database *, right?
Aric TenEyck
Also, in the screen shots, it looks like the exception has already happened. If you expand the code as per my suggestion above, and put a breakpoint on that line, is db null **before** the call to _ExecuteQuery?
Aric TenEyck
Yes Aric ...at least 15 chars...
Andreas Bonini
Yes, it is, I even tried to put an ASSERT() in there too and it failed as well.
Andreas Bonini
A: 

Well I'm not sure what classes/functions you're using here, but from a quick glance, shouldn't it be:

SingleResult sr(AuthDb, format("SELECT Id, Access, Flags, SessionKey, RealmSplitPreference FROM accounts WHERE Name = '%s'", Escaped(account_name)));

instead of:

SingleResult sr(AuthDb, format("SELECT Id, Access, Flags, SessionKey, RealmSplitPreference FROM accounts WHERE Name = '%s'") % Escaped(account_name));

It seems to me you're applying the modulus operator to Escaped(account_name) rather than passing it as an argument to format. However, I could just be confused.

Mike

mkgrunder
I didn't quite understand your last sentence, but mine is correct. You use % to pass arguments to format, and Escaped() is indeed passed to format. The formatted string is also correct (I checked it with the debugger, thinking it might be related).
Andreas Bonini
Ok. It has been a while since I've done much C++. You could also try setting a conditional breakpoint that automatically stops when the value of the pointer changes.
mkgrunder
Yes, I thought of that. The problem is that it never changes: it's set to NULL and, as far as I know, never set to its correct value.
Andreas Bonini
A: 

There is likely a bug somewhere else in your program. I suggest you find the problem by looking elsewhere in your code.

strager
+1  A: 

I have no guess as to what is going on, but if I were debugging this I would look at the assembly language to see what is happening. You may need to get a better understanding of your platform's calling convention (i.e. how arguments are passed: on the stack, in registers, etc.) to troubleshoot this.

R Samuel Klatchko
I posted the ASM if you wanna take a look at it =p
Andreas Bonini
+2  A: 

Do you have a local AuthDb that is null and hides your global one?

(I'd expect the debugger in this case to correctly show you the local one... but with VS quality decay, I'd not rely on that)

I'd change the code to:

_ASSERTE(AuthDb);
SingleResult sr(AuthDb, format(...));

....

struct SingleResult : public BaseResult
{    
   SingleResult(Database *db, const boost::format& query)  
  { 
    _ASSERTE(db);
    _ExecuteQuery(db, query.str()); 
  }
};

And follow the disassembly in the debugger.

peterchen
Uhhh, thanks.. See my edit on the original question =p
Andreas Bonini
+7  A: 

Is AuthDb a thread-local variable?

Maybe the debugger isn't handling it correctly. What if you ASSERT(AuthDb) before the constructor is called?

UPDATE: If it is thread-local, it simply hasn't been initialized in this thread.
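To illustrate that point, here is a small sketch of how a thread-local pointer set in one thread still looks null from another. It assumes C++11 `thread_local` and `std::thread` rather than whatever the original code uses; `Database` is an empty stand-in and `NewThreadSeesNull` is a hypothetical helper.

```cpp
#include <thread>

// Hypothetical stand-in for the real Database class.
struct Database {};

// Every thread gets its own copy, each starting out as nullptr.
thread_local Database *AuthDb = nullptr;

// Hypothetical helper: starts a fresh thread and reports whether that
// thread's copy of AuthDb is null, whatever the caller's copy holds.
bool NewThreadSeesNull() {
    bool is_null = false;
    std::thread worker([&is_null] { is_null = (AuthDb == nullptr); });
    worker.join();  // join before reading the result written by the worker
    return is_null;
}
```

Assigning AuthDb in the calling thread before calling the helper does not change the result, because the worker only ever reads its own copy.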

Jason Orendorff
I'd be willing to bet AuthDb is `__declspec(thread)`. Incidentally I never would have guessed that without looking at the assembly.
Jason Orendorff
Yes, I saw that __tls bit in the assembler code, too. That could be the problem.
Peter Stuifzand
Yes, it is... (I actually forgot it was, otherwise I would have mentioned it in my question). However, I don't know why it would behave like that; it's set in the main thread, and it's the main thread that is running that code.
Andreas Bonini
Found the problem.. Marking this as accepted since it's the answer that made me check :)
Andreas Bonini
uh.. what the problem was is described in my edited question
Andreas Bonini
Yay! Thanks for the fun question. It was a toolchain bug after all!
Jason Orendorff
And that's why the disassembler window is always my first stop in debugging.
Crashworks
A: 

Another possibility--you might be looking at a memory overwrite due to a wild pointer somewhere.

Assuming your debugger supports it, set a memory-write breakpoint on AuthDb when you hit the first line.

Loren Pechtel