i have an interesting problem in my delphi 2009 app. when run in the debugger, i get an AV between the subroutine's Begin keyword and the first statement. i believe that's when it's setting up local variables. here's the information shown in the debugger:

uDeviceModule.pas.940: begin  // _GetMeasurementsForChannel
00AF24C8 55               push ebp
00AF24C9 8BEC             mov ebp,esp
00AF24CB 51               push ecx
00AF24CC B9E9A90100       mov ecx,$0001a9e9    // isn't this a lot for the stack?

// error happens in here
00AF24D1 6A00             push $00
00AF24D3 6A00             push $00
00AF24D5 49               dec ecx
00AF24D6 75F9             jnz $00af24d1

00AF24D8 874DFC           xchg [ebp-$04],ecx
00AF24DB 53               push ebx
00AF24DC 894DF4           mov [ebp-$0c],ecx
00AF24DF 8955FC           mov [ebp-$04],edx
00AF24E2 8945F8           mov [ebp-$08],eax
00AF24E5 33C0             xor eax,eax
00AF24E7 55               push ebp
00AF24E8 687D2FAF00       push $00af2f7d
00AF24ED 64FF30           push dword ptr fs:[eax]
00AF24F0 648920           mov fs:[eax],esp
uDeviceModule.pas.941: SelectChannel(eChannelNum);       // first statement

this is a simplified version of this nested subroutine (see below).

procedure TDeviceModule.GetMeasurements(ExpInfo:TExpInfo;
  _DisplayList:TMeasDisplayListAncestor; eExposureStatus:TExposureStatus;
  bActiveErrorEnabled:boolean);

  procedure _GetMeasurementsForChannel(_DisplayList:TObjectList;
    eChannelNum:TDeviceChannelNum; eExposureStatus:TMyEnum;
    bActiveErrorEnabled:boolean);
  var
    // these are all objects (not records)
    selChannel:TDeviceChannel;
    det:TDeviceDetector;
    shoKVMeas:TStoMeasurement;
  begin  // ********************* error happens on this line
    SelectChannel(eChannelNum);

    _GetMeasurement(ExpInfo, _DisplayList, eChannelNum, eExposureStatus, ctdVal1);
    _GetMeasurement(ExpInfo, _DisplayList, eChannelNum, eExposureStatus, ctdVal2);
    _GetMeasurement(ExpInfo, _DisplayList, eChannelNum, eExposureStatus, ctdVal3);
  end;  // _GetMeasurementsForChannel

begin
  // blah blah blah

      _GetMeasurementsForChannel(_DisplayList,
                                 eChannelNum,
                                 eExposureStatus,
                                 bActiveErrorEnabled);

  // blah blah blah
end;

it is a single-threaded app.

how would you suggest i go about finding the cause of this problem? my first thoughts were:

1) increase max stack size--i did but it didn't change anything. now it's $160000 (1441792) but before this i think it was $150000.

2) is this object still valid? seems to be...it responds to the ClassName method correctly & FastMM doesn't warn me about any problems.

interestingly, the stack trace makes no mention of the routine where the problem is caused.

:7e42b35c USER32.MoveWindow + 0xbe
:7e4565b7 USER32.GetRawInputDeviceInfoW + 0x5f
:7e428eec ; C:\WINDOWS\system32\USER32.dll
:7c90e473 ntdll.KiUserCallbackDispatcher + 0x13
ActnMenus.CallWindowHook(???,0,$31104)
:7e42b372 USER32.MoveWindow + 0xd4
:7e4565b7 USER32.GetRawInputDeviceInfoW + 0x5f
:7e428eec ; C:\WINDOWS\system32\USER32.dll
:7c90e473 ntdll.KiUserCallbackDispatcher + 0x13
:007b882d aqDockingWndProcHook + $1D
:7e42b372 USER32.MoveWindow + 0xd4
:7e4565b7 USER32.GetRawInputDeviceInfoW + 0x5f
:7e428eec ; C:\WINDOWS\system32\USER32.dll
:7c90e473 ntdll.KiUserCallbackDispatcher + 0x13
:7e428dd9 USER32.DefWindowProcW + 0xb9
:7e428d77 USER32.DefWindowProcW + 0x57
:7e418734 USER32.GetDC + 0x6d
:7e418816 ; C:\WINDOWS\system32\USER32.dll
:7e42a013 USER32.IsWindowUnicode + 0xa1
:7e42a039 USER32.CallWindowProcW + 0x1b
Controls.TWinControl.DefaultHandler(???)
:0050fac8 TWinControl.DefaultHandler + $DC
:0050b4b9 TControl.WndProc + $2D5
:0050f9cc TWinControl.WndProc + $518
:0050f0e3 TWinControl.MainWndProc + $2F
:0048874e StdWndProc + $16
:7e418734 USER32.GetDC + 0x6d
:7e418816 ; C:\WINDOWS\system32\USER32.dll
:7e428ea0 ; C:\WINDOWS\system32\USER32.dll
:7e428eec ; C:\WINDOWS\system32\USER32.dll
:7c90e473 ntdll.KiUserCallbackDispatcher + 0x13
:7e428dd9 USER32.DefWindowProcW + 0xb9
:7e428d77 USER32.DefWindowProcW + 0x57
:7e418734 USER32.GetDC + 0x6d
:7e418816 ; C:\WINDOWS\system32\USER32.dll
:7e42a013 USER32.IsWindowUnicode + 0xa1
:7e42a039 USER32.CallWindowProcW + 0x1b
:0050fac8 TWinControl.DefaultHandler + $DC
:0050f9cc TWinControl.WndProc + $518
:0050f0e3 TWinControl.MainWndProc + $2F
:0048874e StdWndProc + $16
:7e418734 USER32.GetDC + 0x6d
:7e418816 ; C:\WINDOWS\system32\USER32.dll
:7e428ea0 ; C:\WINDOWS\system32\USER32.dll
:7e428eec ; C:\WINDOWS\system32\USER32.dll
:7c90e473 ntdll.KiUserCallbackDispatcher + 0x13
:7e428dd9 USER32.DefWindowProcW + 0xb9
:7e428d77 USER32.DefWindowProcW + 0x57
:7e418734 USER32.GetDC + 0x6d
:7e418816 ; C:\WINDOWS\system32\USER32.dll
:7e42a013 USER32.IsWindowUnicode + 0xa1
:7e42a039 USER32.CallWindowProcW + 0x1b
:0050fac8 TWinControl.DefaultHandler + $DC
:0050f9cc TWinControl.WndProc + $518
:0050f0e3 TWinControl.MainWndProc + $2F
:0048874e StdWndProc + $16
:7e418734 USER32.GetDC + 0x6d
:7e418816 ; C:\WINDOWS\system32\USER32.dll
:7e428ea0 ; C:\WINDOWS\system32\USER32.dll
:7e428eec ; C:\WINDOWS\system32\USER32.dll
:7c90e473 ntdll.KiUserCallbackDispatcher + 0x13
:7e428dd9 USER32.DefWindowProcW + 0xb9
:7e428d77 USER32.DefWindowProcW + 0x57
:7e418734 USER32.GetDC + 0x6d
:7e418816 ; C:\WINDOWS\system32\USER32.dll
:7e42a013 USER32.IsWindowUnicode + 0xa1
:7e42a039 USER32.CallWindowProcW + 0x1b
:0050fac8 TWinControl.DefaultHandler + $DC
:0050f9cc TWinControl.WndProc + $518
:0065279d TcxControl.WndProc + $121
:0070b38d TcxCustomGrid.WndProc + $5
:0048874e StdWndProc + $16
:7e418734 USER32.GetDC + 0x6d
:7e418816 ; C:\WINDOWS\system32\USER32.dll
:7e428ea0 ; C:\WINDOWS\system32\USER32.dll
:7e428eec ; C:\WINDOWS\system32\USER32.dll
:7c90e473 ntdll.KiUserCallbackDispatcher + 0x13
:7e428dd9 USER32.DefWindowProcW + 0xb9
:7e428d77 USER32.DefWindowProcW + 0x57
:7e418734 USER32.GetDC + 0x6d
:7e418816 ; C:\WINDOWS\system32\USER32.dll
:7e42a013 USER32.IsWindowUnicode + 0xa1
:7e42a039 USER32.CallWindowProcW + 0x1b
:0050fac8 TWinControl.DefaultHandler + $DC
:0050f9cc TWinControl.WndProc + $518
:0065279d TcxControl.WndProc + $121
:0075bbc4 TcxGridSite.WndProc + $20
:0048874e StdWndProc + $16
:7e418734 USER32.GetDC + 0x6d
:7e418816 ; C:\WINDOWS\system32\USER32.dll
:7e428ea0 ; C:\WINDOWS\system32\USER32.dll
:7e428eec ; C:\WINDOWS\system32\USER32.dll
:7c90e473 ntdll.KiUserCallbackDispatcher + 0x13
:0044c91e HandleException + $22A
:004539af InterceptAHandleExcept + $3F
:0048874e StdWndProc + $16
:7e418734 USER32.GetDC + 0x6d
:7e418816 ; C:\WINDOWS\system32\USER32.dll
:7e4189cd ; C:\WINDOWS\system32\USER32.dll
:7e418a10 USER32.DispatchMessageW + 0xf

this suggests to me that the problem is stack overrun of some kind--bashing things used by message handling.

suggestions??? THANK YOU!

+4  A: 

I strongly suspect that the TDeviceModule reference involved is invalid. You won't always see any ill effects of calling a method on a bad object reference until some way into the method body unless the method is virtual in which case the invocation of the method itself will typically (always?) yield an AV.

Deltics
+1 That's IMHO the most likely one too, without examining in detail.
Marco van de Voort
i logged the initial pointer values for those objects when it worked and compared Self just before it executed Begin and it was the same value.
X-Ray
i also tried "exercising" the stack before making the call where the error occurs: i:=0; while i<100000 do begin asm push 00 end; inc(i); end; i:=0; while i<100000 do begin asm pop ecx end; inc(i); end; i can do that *right* *before* making the call where the error occurs without a problem! i think i'm about to learn something important...thank you all for your help!
X-Ray
I'm not quite sure what you are saying w.r.t logging initial pointer values and then comparing to "self", but *any* value of self can be a bad reference, even if that same value was previously a *good* reference. If the object referenced has been free'd but you still have a variable containing that reference it will have the same value as *before* it was free'd, but it now references unallocated memory (or worse, memory that has since been re-allocated to some other object).
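
The point about stale references can be shown in a few lines. This is a minimal sketch, not code from the application in question; the program and variable names are made up for illustration.

```pascal
program DanglingRef;
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  Obj: TObject;
  Before, After: Pointer;
begin
  Obj := TObject.Create;
  Before := Pointer(Obj);   // address while the instance is alive

  Obj.Free;                 // releases the instance but does not change Obj
  After := Pointer(Obj);    // same address, now pointing at freed memory

  Writeln(Before = After);  // prints TRUE

  Obj := nil;               // FreeAndNil(Obj) would free and clear in one step
end.
```

Comparing a pointer against a value logged earlier therefore cannot show that the object is still alive.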
Deltics
thank you Deltics--got it figured out (see my newest answer).
X-Ray
I don't think that's the case. If Self is invalid, how could it possibly affect a "push 0" instruction?
Alexander
@Alexander - To be honest I wasn't looking at the ASM. The Pascal code as posted didn't contain anything that pointed to potential stack fault and the fact that an AV occurred rather than an actual stack exception suggested to me that the problem might lie in the use of the reference to the instance rather than the method implementation itself. It transpires that the code posted wasn't complete, but I think/hope that my suggestion at least helped eliminate one possible explanation and thus help narrow the list of suspects and assist in the resolution.
Deltics
+2  A: 

I'd comment out each of the 3 variables, then un-comment one at a time to see if any particular one of them is blowing up. If so, you've just cut your problem by 2/3.

Chris Thornton
+3  A: 

From your comment ("error happens in here") your error pops up in the loop that sets up stack space, and it is a lot of space: the loop runs $1a9e9 (109,033) times and each pass executes two PUSH $00 instructions of 4 bytes each, so it reserves about 850 KB. It has absolutely nothing to do with the parameters you're passing to the procedure and nothing to do with the viability of the object you're passing as a parameter (there's no CALL in there, just a JNZ that loops back to the PUSH $00 until DEC ECX sets the zero flag, that is, $1a9e9 times).

Since you're dealing with a procedure that uses about 850 KB of stack space, maybe you should try increasing the stack space by a lot more! Even better, figure out why your procedure is using that much space and whether other procedures are in the same situation (look out for large records used as local variables).
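
As a rough check on the size of that prologue loop, the arithmetic can be written out. This is only a sketch: it assumes 32-bit code, where each push $00 moves ESP by 4 bytes, and takes the loop count and the $160000 stack setting from the question. The program and constant names are invented for the example.

```pascal
program PrologueStackSize;
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  LoopCount          = $0001A9E9; // value loaded into ECX by the prologue
  PushesPerIteration = 2;         // two "push $00" instructions per pass
  BytesPerPush       = 4;         // assumption: 32-bit code, 4 bytes per push
  MaxStackSize       = $160000;   // max stack size quoted in the question

var
  TotalBytes: Integer;
begin
  TotalBytes := LoopCount * PushesPerIteration * BytesPerPush;
  Writeln(Format('Prologue reserves %d bytes (%d KB)',
    [TotalBytes, TotalBytes div 1024]));
  Writeln(Format('Configured maximum stack: %d bytes', [MaxStackSize]));
  Writeln(Format('Left for all callers and callees: %d bytes',
    [MaxStackSize - TotalBytes]));
end.
```

Under those assumptions a single call to the routine takes well over half of the configured stack before its first statement runs.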

Cosmin Prund
good point; more about this in the answer i'm adding.
X-Ray
+3  A: 

See this question: http://stackoverflow.com/questions/765162/guard-page-exceptions-in-delphi

Normally you should get a stack overflow exception when you run out of stack. But if your guard page was touched by someone else and the resulting exception was swallowed silently without the stack being expanded, then your code will crash with an AV the next time it tries to grow the stack.

This is exactly what is happening in your code: you expand the stack and you get the AV. That assembler loop is designed to touch the stack as it descends, so that the guard page triggers stack expansion. Since the guard page is gone but the stack was not expanded, you get a plain AV here.

Note that increasing the stack size will not help, since the stack doesn't grow at all.

You need to find out who is playing with your stack.
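
One way to look for a missing guard page from inside the process is sketched below. It is not a definitive diagnostic and rests on assumptions: a 32-bit Windows target, where FS:[4] and FS:[8] hold the StackBase and StackLimit fields of the thread information block, and a guard page that normally sits directly below the committed part of the stack. The program and helper names are made up for the example; VirtualQuery and PAGE_GUARD come from the Windows unit.

```pascal
program StackProbe;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// 32-bit Windows only: highest address of the current thread's stack.
function StackBase: Pointer;
asm
  mov eax, fs:[4]
end;

// 32-bit Windows only: lowest committed address of the current thread's stack.
function StackLimit: Pointer;
asm
  mov eax, fs:[8]
end;

// Approximate stack pointer at the point of the call.
function CurrentESP: Pointer;
asm
  mov eax, esp
end;

var
  Info: TMemoryBasicInformation;
  BelowLimit: Pointer;
begin
  Writeln(Format('Stack base : %p', [StackBase]));
  Writeln(Format('ESP        : %p', [CurrentESP]));
  Writeln(Format('Stack limit: %p', [StackLimit]));

  // Query the page just below the committed region; on a healthy stack
  // this is expected to carry the PAGE_GUARD attribute.
  BelowLimit := Pointer(Cardinal(StackLimit) - 1);
  if VirtualQuery(BelowLimit, Info, SizeOf(Info)) <> 0 then
  begin
    if (Info.Protect and PAGE_GUARD) <> 0 then
      Writeln('Guard page present below the stack limit')
    else
      Writeln('No guard page below the stack limit');
  end
  else
    Writeln('VirtualQuery failed');
end.
```

Calling the same checks at several points in the real application, for example before and after suspect calls, could help narrow down where the guard page disappears.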

Alexander
A: 

One possibility would be that the 3 local variables (stack variables) are growing larger than expected. I suppose this could happen if the objects are declared in a unit that is contained in another BPL and it's not rebuilt correctly (i.e. your program thinks it's smaller than it really is).
Whatever the reason, you can experiment and find out if that's happening. Place "buffer" variables between and after your 3 vars.

ex: 
  var 
    selChannel:TDeviceChannel; 
    Buff1 : array[1..1024] of AnsiChar;
    det:TDeviceDetector; 
    Buff2 : array[1..1024] of AnsiChar;
    shoKVMeas:TStoMeasurement; 
    Buff3 : array[1..1024] of AnsiChar;

This should do two things for you. 1) it should prevent the A/V, assuming that 1024 is enough. 2) by examining the arrays, you should be able to see if garbage appears. That would indicate that they're being overwritten by the declaration directly above.

Chris Thornton
A: 

here's what i learned:

by exercising the object, i found it was healthy.

by dumping stuff on the stack i determined it really was running out of stack space.

procedure TDeviceModule.Validate;
const
  icTestSize=400000;
var
  i:integer;
  eChannelNum:TDeviceChannelNum;  // loop variable for the channel check below
begin
  // ask the object stuff to try to see if it's healthy

  SelectChannel(dcCh1);

  ClassName;

  for eChannelNum:=low(TDeviceChannelNum) to high(TDeviceChannelNum) do
    if HasChannel(eChannelNum) then
      m_aChannels[eChannelNum].Validate;

  // exercise the stack to see if loading on extra stuff is a problem...it is

  i:=0;
  while i<icTestSize do
    begin
      asm
        push 00
      end;
      inc(i);
    end;

  i:=0;
  while i<icTestSize do
    begin
      asm
        pop ecx
      end;
      inc(i);
    end;
end;

there were a few nested functions (neither their use nor their declaration was part of the question because i didn't realize how much they were a part of the problem) that returned a record i'll call TBigRecord...it is 32 KB. not only that, but they were used quite a few times.

procedure TDeviceModule.GetMeasurements(blah blah blah);

  function _DoSomething1(blah blah blah):TBigRecord;
  begin
  end;

  function _DoSomething2(blah blah blah):TBigRecord;
  begin
  end;

  function _DoSomething3(blah blah blah):TBigRecord;
  begin
  end;

begin
  _DoSomething1(blah blah blah);
  _DoSomething2(blah blah blah);
  _DoSomething3(blah blah blah);
end;

each time i use it (and even if i don't use the result), i get stack space allocated for the result value.

the solution i used for now was to change those functions to procedures since i wasn't using the return value anyway.
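
for illustration only, here is a minimal sketch of the difference between returning a large record and filling a caller-supplied buffer. TBigRecord is a stand-in with the 32 KB size mentioned above, and the routine names are hypothetical, not the real ones from the application. a function that returns a record receives the storage for Result from its caller, so the caller has to reserve room for it even when the value is discarded; how many temporaries the compiler reserves per routine is its own decision and is not something this sketch proves.

```pascal
program BigResultDemo;
{$APPTYPE CONSOLE}

type
  // Hypothetical stand-in for the 32 KB record.
  TBigRecord = record
    Data: array[0..32767] of Byte;
  end;

// Returns a large record: the caller provides the storage for Result.
function ReadAsFunction: TBigRecord;
begin
  FillChar(Result, SizeOf(Result), 0);
end;

// Same work as a procedure: the caller passes one buffer explicitly.
procedure ReadAsProcedure(var Dest: TBigRecord);
begin
  FillChar(Dest, SizeOf(Dest), 0);
end;

// Discards the results; the caller's frame still needs room for them.
procedure CallerDiscarding;
begin
  ReadAsFunction;
  ReadAsFunction;
  ReadAsFunction;
end;

// Reuses a single 32 KB local for all three calls.
procedure CallerReusing;
var
  Shared: TBigRecord;
begin
  ReadAsProcedure(Shared);
  ReadAsProcedure(Shared);
  ReadAsProcedure(Shared);
end;

begin
  Writeln('SizeOf(TBigRecord) = ', SizeOf(TBigRecord));
  CallerDiscarding;
  CallerReusing;
end.
```

comparing the prologues of the two callers in the CPU view shows how much stack each one reserves. if the record is needed at all, keeping it on the heap (for example as a field of an object) is another way to take it off the stack.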

i had increased the stack space but not enough to prevent this problem.

can i expect stack overflow to be reported in such a case?

thank you all for your valuable assistance! this problem had me worried...

X-Ray
I think you are seeing something else. First: running out of stack space means a stack overflow, not an AV. Second: use a tool or code to examine ESP, the current stack size and the top of the stack for your thread. It's very likely that your stack is far from reaching its maximum. Third: changing a function to a procedure doesn't really solve anything; it can only hide something. That's because function ABC(): TSomeRecord is actually procedure ABC(var Result: TSomeRecord) at the binary level.
Alexander
yes; i would've expected to see a Stack Overflow error instead of AV. what tool do you suggest for studying this more? i am somewhat concerned that this was a problem (even if the problem has been "solved")...can't help but wonder if i shall hear from this problem again. i wasn't using the return value so i removed it entirely. thank you for your help!
X-Ray
Take a look at VMMap or similar tools.
Alexander
A: 

Sorry to be simplistic but...

_DisplayList:TMeasDisplayListAncestor AND _DisplayList: TObjectList are both in scope simultaneously.

So are two eExposureStatus of differing types and two bActiveErrorEnabled of Boolean.

When you call _GetMeasurement(ExpInfo, _DisplayList, eChannelNum, eExposureStatus, ctdVal1) in the local procedure, which variable and type is it using? TObjectList or TMeasDisplayListAncestor?

Unless I'm just more drunk than I think... :)

Despatcher
i don't see TMeasDisplayListAncestor and TObjectList used for the same object anywhere. i'm ok about the two eExposureStatus of differing types and the overlapping variable names bActiveErrorEnabled. _DisplayList is a descendant of TMeasDisplayListAncestor.
X-Ray