I have a strange memory corruption problem. After many hours of debugging and experimenting, I think I have found something.

For example: I do a simple string assignment:

sTest := 'SET LOCK_TIMEOUT ';

However, the result sometimes becomes:

sTest = 'SET LOCK'#0'TIMEOUT '

So, the _ gets replaced by a 0 byte.

I have seen this happen once (reproducing it is tricky and depends on timing) in the System.Move function, when it uses the FPU stack (fild, fistp) for a fast memory copy (when moving 9 to 32 bytes):

...
@@SmallMove: {9..32 Byte Move}
fild    qword ptr [eax+ecx] {Load Last 8}
fild    qword ptr [eax] {Load First 8}
cmp     ecx, 8
jle     @@Small16
fild    qword ptr [eax+8] {Load Second 8}
cmp     ecx, 16
jle     @@Small24
fild    qword ptr [eax+16] {Load Third 8}
fistp   qword ptr [edx+16] {Save Third 8}
...

Using the FPU view and 2 memory debug views (Delphi -> View -> Debug -> CPU -> Memory) I saw it go wrong... once... but I could not reproduce it...

This morning I read something about the 8087CW mode, and yes, if this is changed to $27F I get memory corruption! Normally it is $133F:

The difference between $133F and $027F is that $027F sets up the FPU for doing less precise calculations (limiting to Double instead of Extended) and different infinity handling (which was used for older FPUs, but is not used any more).
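As a rough illustration of the bit fields that quote refers to, the control word values mentioned in this thread can be decoded as below. The type and function names are made up for this sketch; the bit positions are the standard x87 control word layout (exception masks in bits 0..5, precision control in bits 8..9, infinity control in bit 12).

```pascal
type
  // Encoding of the precision-control field (bits 8..9).
  TFpuPrecision = (fpSingle, fpReserved, fpDouble, fpExtended);

// Hypothetical helper: extracts the precision-control field.
function PrecisionOf(CW: Word): TFpuPrecision;
begin
  Result := TFpuPrecision((CW shr 8) and 3);
end;

// Hypothetical helper: tests the infinity-control bit (bit 12),
// which newer FPUs accept but ignore.
function InfinityBitSet(CW: Word): Boolean;
begin
  Result := (CW and $1000) <> 0;
end;

// Hypothetical helper: returns the six exception mask bits
// (a set bit means that exception is masked).
function ExceptionMask(CW: Word): Byte;
begin
  Result := CW and $3F;
end;
```

With these helpers, PrecisionOf($133F) is fpExtended and PrecisionOf($027F) is fpDouble, and only $133F has the infinity bit set, which matches the description quoted above.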

Okay, now I know why, but not when!

I added a simple check to my AsmProfiler (so every function is checked on entry and on exit):

if Get8087CW = $27F then    //normally $1372?
  if MainThreadID = GetCurrentThreadId then  //only check mainthread
    DebugBreak;

I "profiled" some units and DLLs and bingo (see stack):

Windows.StretchBlt(3372289943,0,0,514,345,4211154027,0,0,514,345,13369376)
pngimage.TPNGObject.DrawPartialTrans(4211154027,(0, 0, 514, 345, (0, 0), (514, 345)))
pngimage.TPNGObject.Draw($7FF62450,(0, 0, 514, 345, (0, 0), (514, 345)))
Graphics.TCanvas.StretchDraw((0, 0, 514, 345, (0, 0), (514, 345)),$7FECF3D0)
ExtCtrls.TImage.Paint
Controls.TGraphicControl.WMPaint((15, 4211154027, 0, 0))

So it is happening in StretchBlt...

What to do now? Is it a fault of Windows, or a bug in PNG (included in D2007)? Or is the System.Move function not failsafe?

Note: simply trying to reproduce does not work:

  Set8087CW($27F);
  sSQL := 'SET LOCK_TIMEOUT ';

It seems to be more exotic... But by breaking on 'Get8087CW = $27F' I could reproduce it with another string (screenshots, in order: FPU part 1, FPU part 2, FPU part 3, FPU Final: corrupt!).

Note 2: Maybe the FPU stack must be cleared in the System.Move?

+1  A: 

It might be a bug in your video driver that does not preserve the 8087 control word when it performs the StretchBlt operation.
In the past I have seen similar behaviour when using certain printer drivers. They think they own the 8087 CW and are wrong...
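If a driver really is clobbering the control word, one possible workaround is to save it before the suspect call and restore it afterwards. The following is only a minimal sketch under that assumption: TSimpleProc, CallPreserving8087CW and SimulateBadDriver are hypothetical names invented here, and only Get8087CW and Set8087CW come from the System unit.

```pascal
type
  TSimpleProc = procedure;

// Hypothetical wrapper: runs Proc and puts the FPU control word back
// afterwards, even if Proc raises an exception.
// Returns True if Proc left a different control word behind.
function CallPreserving8087CW(Proc: TSimpleProc): Boolean;
var
  SavedCW: Word;
begin
  Result := False;
  SavedCW := Get8087CW;
  try
    Proc;
  finally
    if Get8087CW <> SavedCW then
    begin
      Result := True;
      Set8087CW(SavedCW);
    end;
  end;
end;

// Stand-in for a misbehaving driver call, for demonstration only.
procedure SimulateBadDriver;
begin
  Set8087CW($27F);
end;
```

In the question's case the call to protect takes parameters (the drawing code that ends in StretchBlt), so the same save/restore pair would be written inline around it rather than passed as a parameterless procedure.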

Note that the default value of the 8087 CW in Delphi seems to be $1372. For a more detailed explanation of the CW values, see this article, which also explains a situation that Michael Justin described when his 8087CW got hosed.

--jeroen

Jeroen Pluimers
+4  A: 

I haven't seen this particular issue, but Move can definitely get messed up if the FPU is in a bad state. Cisco's VPN driver can screw things up horribly, even if you're not doing anything network related.

http://brianorr.blogspot.com/2006/11/intel-pentium-d-floating-point-unit.html

http://www.dankohn.com/archives/343

http://blog.excastle.com/2007/08/28/delphi-bug-of-the-day-fpu-stack-leak/ (comments by Ritchie Annand)

In our case we detect the buggy VPN driver and swap out Move and FillChar with the Delphi 7 versions, replace IntToStr with a Pascal version (Int64-version uses the FPU), and, since we're using FastMM, we disable its custom fixed size move routines too, since they're even more susceptible than System.Move.
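To give an idea of what an FPU-free replacement looks like, here is a plain Pascal byte copy. It is not the Delphi 7 implementation and not our production code, just a sketch; PlainMove and CopyViaPlainMove are names invented for this example, and how the routine gets patched in place of System.Move is left out.

```pascal
// Hypothetical byte-wise copy that uses no FPU instructions itself.
// Picks the copy direction so overlapping ranges are handled.
procedure PlainMove(const Source; var Dest; Count: Integer);
var
  S, D: PAnsiChar;
  I: Integer;
begin
  if Count <= 0 then
    Exit;
  S := PAnsiChar(@Source);
  D := PAnsiChar(@Dest);
  if (D > S) and (D < S + Count) then
    // Destination starts inside the source range: copy backwards.
    for I := Count - 1 downto 0 do
      D[I] := S[I]
  else
    for I := 0 to Count - 1 do
      D[I] := S[I];
end;

// Small demo wrapper: copies a string's bytes through PlainMove.
function CopyViaPlainMove(const S: AnsiString): AnsiString;
begin
  SetLength(Result, Length(S));
  if Length(S) > 0 then
    PlainMove(S[1], Result[1], Length(S));
end;
```

A byte loop like this is much slower than the FastCode Move, which is why we only swap routines when the buggy driver is detected.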

Craig Peterson
Thanks, I was already wondering if there could be more situations like this, so yes... :-( But this gives me a bad feeling about the FPU in general: if you have very important scientific or business calculations, how can you be 100% sure the results are right? I mean: if you execute or use an external function, you run an extra risk. Now it is PNG or the video card that can corrupt the FPU, but a Cisco or audio driver can do it too...
André
@André, with the current state of affairs, the only way to reasonably assure they are right is by using a plain Windows install (no 3rd party drivers), and only your software.
Jeroen Pluimers