I have just ported several of our home-made Outlook COM-addins from Delphi 2007 to Delphi 2009 and am now experiencing some really weird errors (before you ask: none of them appears to have any obvious relationship to string handling). For example, modal dialogs hang Outlook when one tries to invoke them a second time (the first time around everything appears to be fine), but only when they are invoked from one specific event handler and not when doing the same thing somewhere else. When I trace the error to a specific line of code and comment out that line or replace it with different code to the same effect (e.g. by copying code that would otherwise be called via a function directly to the call site), the error appears to go away, typically only to recur a couple of (equally inconspicuous-looking) statements later.

When running this inside the Delphi debugger I can see that the freezes are often preceded by Access Violations in GetMem.inc. At least all of these issues are 100% reproducible...

Needless to say we had none of these issues when compiling these addins in Delphi 2007.

Now, I'm quite at a loss. I know I have just been lucky but even though I consider myself a fairly experienced programmer (though mostly in niche areas) I never really had to deal with this class of error before. As the title of this question says, I don't even really know where to start. I can step through the code as much as I like but the endless assembler statements mean nothing to me and neither am I proficient in effectively using the CPU view.

Furthermore, I don't even know for sure yet whether this is an issue with my own code to begin with (I actually tend to doubt it in this case). We are making massive use of a number of third-party libraries (e.g. JCL, ADX, Redemption). ADX in particular still labels its Delphi 2009 support "beta".

I also tried using FastMM's FullDebugMode and did indeed uncover a number of errors in ADX that way (e.g. blocks that were modified after having been freed), but all of these also occur when I compile with Delphi 2007, so they are not necessarily the cause of the observed regression.

So, how do I deal with this? - or better yet: Where can I find some good resources on learning how to deal with this? e.g. tutorials on using the CPU view or effectively interpreting and acting upon the reports put out by FastMM? Are these the correct tools at all? Where else should I look?

Addendum:
What types of code should I be suspicious of in this context? What kind of code even has the potential to wreak such havoc in memory? The only places I can think of where my code performs anything remotely approaching explicit memory manipulation are those where it reserves some buffer space in preparation for a WinAPI call. Also keep in mind that all of my code is identical between the Delphi 2007 and Delphi 2009 versions and the Delphi 2007 version exhibits no such problems.

Update:
With some probability the issue that prompted me to post this question has now been solved. See my own answer below.

+4  A: 

The best tool for getting to a solution is probably memory breakpoints.

Debugging memory corruption is painful, so try to make your life as simple as possible first: find an exact, guaranteed-reproducible set of steps that work every time. If necessary, mock up the Outlook host so that you don't need to rely on Outlook timing issues or address space layout issues etc.

It's imperative that you get a reliably reproducible set of steps that results in an AV or other error at a predictable address.

What you then do is restart the process, create a memory breakpoint set for whatever referred to that address, and get familiar with the lifecycle of that chunk of memory. Minimizing and rationalizing your reproduction steps helps here. It may help to add other breakpoints and only enable the memory breakpoint later in the application; or use the logging features of D2009 breakpoints to log memory values / call stacks etc., rather than actually breaking into the debuggee.

Barry Kelly
Thanks, that sounds promising. In at least one case I do have a predictable AV at a predictable address (predictable in that it is always the same). Should I set the breakpoint at the address where the AV occurs (i.e. line 1654 in GetMem.inc) or the address that was attempted to be written to? (tbc)
Oliver Giesen
(ctd) If I set it to the latter, the AV is raised before the breakpoint triggers...
Oliver Giesen
BP the thing that is referring to the inaccessible address
Barry Kelly
Thanks again! I think I've found the real problem now. See my own answer to this question.
Oliver Giesen
+2  A: 

The fact that you catch double-free bugs in D2007, even though everything appears to work fine in that version, means that you NEED to fix them: you are merely lucky that the D2007 version does not recycle the memory as aggressively as the D2009 version, so the bugs stay hidden thanks to "shadow persistence" in memory.
I would use FastMM's FullDebugMode to find the bad code and fix as much of it as possible, then follow Barry's advice to troubleshoot memory usage.
For how to use the features of the Integrated Debugger, and how to log info from non breaking breakpoints, you may want to look at this CodeRage 3 session: Delphi Debugging for Dummies
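
As a minimal sketch of the kind of bug FullDebugMode is designed to flag: this assumes the full FastMM4 download is on the unit search path and that `FullDebugMode` is defined (in FastMM4Options.inc or in the project's conditional defines). `TItem` is a hypothetical class used only for illustration.

```delphi
program UseAfterFreeDemo;

{$APPTYPE CONSOLE}

uses
  FastMM4,   // must be the first unit so that it replaces the default memory manager
  SysUtils;

type
  // Hypothetical class, for illustration only.
  TItem = class
    Value: Integer;
  end;

var
  Item: TItem;
begin
  Item := TItem.Create;
  Item.Value := 1;
  Item.Free;

  // Bug: Item is now a dangling reference. Without debugging support this
  // write often "works" because the memory has not been reused yet.
  // In FullDebugMode, FastMM fills freed blocks with a known pattern and
  // should report the block as modified after being freed the next time
  // it checks the block (for example when the block is reused or at shutdown).
  Item.Value := 42;
end.
```

Whether such a write is harmless or fatal depends only on when the memory manager hands the block out again, which is why the same bug can stay silent under one compiler version and crash under another.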

François
The problem is that the errors reported by FastMM do not occur in my own code but rather inside the ADX library (I do have the source code though). AFAICT it's not double-frees but modifications to already freed blocks. I also got a "corrupted block footer" at some point.
Oliver Giesen
Fix it! ...or don't use. Any kind of code relying on memory persistence after it has been formally freed is bad. FastMM is very good at pointing this out. And it's about the same if it's a double-free or using properties of a freed object...
François
Thanks again! I think I've found the real problem now. After I fixed this, the errors in ADX went away as well. See my own answer to this question.
Oliver Giesen
+3  A: 

Not exactly an answer to the question which was more general, but very probably the solution to the specific problem that prompted it:

I am 95% sure I have identified the problem now! :)

Here's what I did:

  • I enabled RangeChecking and OverflowChecking in the compiler
  • I tracked down and fixed all problems that caused ERangeError or EIntOverflow exceptions
    (there was one of each)
  • I ran the program again with FastMM and FullDebugMode enabled
  • I was finally able to identify the cause of the problem in all cases to be a call to the JCL function GetWindowCaption
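
For reference, the two compiler checks from the first step can also be switched on per unit with directives instead of via the project options. The following is only a sketch with made-up variable names, not code from the add-ins in question:

```delphi
program ChecksDemo;

{$APPTYPE CONSOLE}
{$R+}  // range checking on
{$Q+}  // overflow checking on

uses
  SysUtils;

var
  Buf: array[0..3] of Byte;
  I, N: Integer;
begin
  try
    I := 4;
    Buf[I] := 1;   // index is out of bounds: raises ERangeError with {$R+}
  except
    on E: ERangeError do
      Writeln('Range error caught: ', E.Message);
  end;

  try
    N := MaxInt;
    N := N + 1;    // exceeds the Integer range: raises EIntOverflow with {$Q+}
  except
    on E: EIntOverflow do
      Writeln('Overflow caught: ', E.Message);
  end;
end.
```

With both checks off, the first statement would silently write past the end of the array and the second would silently wrap around.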

It seems that GetWindowCaption has not yet been checked for Unicode compatibility: it was using the value returned from the API function GetWindowTextLength (which returns the number of characters) as input for ReallocMem (which expects the number of bytes) to allocate the buffer for GetWindowText (which in Delphi 2009 fills the buffer with WideChars, i.e. two bytes per character). Boom! The function was allocating too little memory for the buffer, and GetWindowText simply wrote past its end into the following memory, thus corrupting the block footer.
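
To illustrate the character/byte mix-up, here is a sketch of the pattern and one possible fix. This is a hypothetical reconstruction, not the actual JCL source; the function name `GetCaption` is made up.

```delphi
program CaptionDemo;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// The problematic pattern, schematically (NOT the actual JCL code):
//   Len := GetWindowTextLength(Wnd);     // length in characters
//   ReallocMem(Buf, Len + 1);            // size in BYTES: half of what is needed
//   GetWindowText(Wnd, Buf, Len + 1);    // writes (Len + 1) * SizeOf(Char) bytes
//
// If a raw buffer is required, the byte count has to be scaled:
//   ReallocMem(Buf, (Len + 1) * SizeOf(Char));

// Hypothetical replacement that lets the string type do the size arithmetic.
function GetCaption(Wnd: HWND): string;
var
  Len: Integer;
begin
  Len := GetWindowTextLength(Wnd);   // characters, excluding the terminator
  SetLength(Result, Len);            // SetLength counts characters, not bytes
  if Len > 0 then
    // GetWindowText returns the number of characters actually copied.
    SetLength(Result, GetWindowText(Wnd, PChar(Result), Len + 1));
end;

begin
  Writeln(GetCaption(GetForegroundWindow));
end.
```

Because `SizeOf(Char)` is 1 in Delphi 2007 and 2 in Delphi 2009, code written this way should behave the same under both compilers, whereas the unscaled allocation only happens to be correct on the ANSI version.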

I have now filed this in the JCL bug tracker as item #4648

The lesson I took from this is: Always be sure to fix all reported errors! Including (seemingly) non-critical ones like range and overflow errors. If nothing else, it will make debugging that much more predictable.

Oliver Giesen
Congrats! And thanks for the update and the bug report.
François
I'd hardly call a range or overflow error "non-critical". Unless you're intentionally doing wraparound math, an overflow will silently corrupt your data just about every time. Never *ever* turn off range or bounds checking unless you absolutely need the speed and you're CERTAIN it's safe already.
Mason Wheeler
I agree: Always fix range and overflow errors (leave checking on) and treat all warnings as errors (a great feature of D2009). Then explicitly ignore/allow range and overflow or other warning conditions in the allowable situations.
Jim McKeeth
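
One common way to explicitly allow overflow in a single routine while keeping the check on everywhere else is to switch it off locally and restore the previous state afterwards. The sketch below uses a hypothetical hash function, `HashStep`, and a made-up conditional symbol, `OVERFLOW_WAS_ON`:

```delphi
program LocalOverflowDemo;

{$APPTYPE CONSOLE}
{$Q+}  // overflow checking on for the project as a whole

uses
  SysUtils;

// Remember whether overflow checking was on, then turn it off locally.
{$IFOPT Q+}{$DEFINE OVERFLOW_WAS_ON}{$Q-}{$ENDIF}
function HashStep(H: Cardinal; B: Byte): Cardinal;
begin
  // Wraparound is intended here, so no overflow check is wanted.
  Result := H * 31 + B;
end;
// Restore the previous setting for the rest of the unit.
{$IFDEF OVERFLOW_WAS_ON}{$Q+}{$UNDEF OVERFLOW_WAS_ON}{$ENDIF}

begin
  // High(Cardinal) * 31 does not fit in 32 bits; with checking disabled
  // inside HashStep the result simply wraps instead of raising EIntOverflow.
  Writeln(HashStep(High(Cardinal), 1));
end.
```

This keeps the exemption visible at the exact place where wraparound is deliberate, instead of disabling the check for the whole project.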