i profiled a portion of my application using the Delphi Sampling Profiler. Like most people, i see a majority of the time spent inside ntdll.dll
.
Note: i turned on the options to ignore
Application.Idle
time, and calls fromSystem.pas
. So it isn't insidentdll
because the application is idle:
After multiple runs, multiple times, the majority of the time seems to be spent inside ntdll.dll
, but the odd thing is who the caller is:
The caller is from the Virtual Treeview's:
PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);
Note: The application is not inside
ntdll.dll
because the application is idle, because the caller isn'tApplication.Idle
.
What confuses me is that it's this line itself (i.e. not something inside PrepareCell) is the caller into ntdll
. Even more confusing is that:
- not only is it not something inside
PrepareCell()
- it's not even the setup of
PrepareCell
(e.g. popping stack variables, setting up implicit exception frames, etc) that is the caller. Those things would show up in the profiler as a hotspot on thebegin
inside PrepareCell.
VirtualTrees.pas:
procedure TBaseVirtualTree.PrepareCell(var PaintInfo: TVTPaintInfo; WindowOrgX, MaxWidth: Integer);
begin
...
end;
So i'm trying to figure out how this line:
PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);
is calling ntdll.dll
.
The only other ways in are the three parameters:
PaintInfo
Window.Left
NodeBitmap.Width
Maybe one of those is a function, or a property getter, that would call into ntdll
. So i put a breakpoint on the line, and look at the CPU window at runtime:
There is a line in there that might be the culprit:
call dword ptr [edx+$2c]
But when i follow that jump, it doesn't end up in ntdll.dll
, but TBitmap.GetWidth
:
Which, as you can see, doesn't call anywhere; and certainly not into ntdll.dll
.
So how is the line:
PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);
calling into ntdll.dll
?
Note: i know full well it isn't really calling into ntdll.dll. So any valid answer will have to include the words "Sampling Profiler is misleading; that line is not calling into ntdll.dll." The answer will also have to either say that the majority of the time is not spent in ntdll.dll, or that the highlighted line is not the caller. Finally any answer will have to explain why Sampling Profiler is wrong, and how it can be fixed.
Update 2
What is ntdll.dll? Ntdll is Windows NT's native API set. The Win32 API is a wrapper around ntdll.dll
that looks like the Windows API that existed in Windows 1/2/3/9x. In order to actually get into ntdll you have to call a function that uses ntdll directly or indirectly.
For example, when my Delphi application goes idle, it waits for a message by calling the user32.dll function:
WaitMessage;
When when you actually look at it is:
USER32.WaitMessage
mov eax,$00001226
mov edx,$7ffe0300
call dword ptr [edx]
ret
Calling the function specified at $7ffe0300
is the way Windows transitions into Ring0, calling the FunctionID specified in EAX. In this case, the System Function being called is 0x1226. On my operating system, Windows Vista, 0x1226 corresponds to the system function NtUserWaitMessage
.
This is how to you get into ntdll.dll: you call it.
i was desperately trying to avoid a hand-waving non-answer when i worded the original question. By being very specific, carefully pointing out the reality of what i'm seeing, i was trying to prevent people from ignoring the facts, and trying to use a hand-waving argument.
Update Three
i converted the two parameters:
PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);
into stack variables:
_profiler_WindowLeft := Window.Left;
_profiler_NodeBitmapWidth := NodeBitmap.Width;
PrepareCell(PaintInfo, _profiler_WindowLeft, _profiler_NodeBitmapWidth);
To confirm that the bottleneck is not is the call to
Windows.Left
, or- Nodebitmap.Width
Profiler still indicates that the line
PrepareCell(PaintInfo, _profiler_WindowLeft, _profiler_NodeBitmapWidth);
itself is the bottleneck; not anything inside PrepareCell. This must mean that it's something inside the setup of the call to prepare cell, or at the start of PrepareCell:
VirtualTrees.pas.15746: PrepareCell(PaintInfo, _profiler_WindowLeft, _profiler_NodeBitmapWidth);
mov eax,[ebp-$54]
push eax
mov edx,esi
mov ecx,[ebp-$50]
mov eax,[ebp-$04]
call TBasevirtualTree.PrepareCell
Nothing in that calls into ntdll. Now the pre-amble in PrepareCell itself:
VirtualTrees.pas.15746: begin
push ebp
mov ebp,esp
add esp,-$44
push ebx
push esi
push edi
mov [ebp-$14],ecx
mov [ebp-$18],edx
mov [ebp-$1c],eax
lea esi,[ebp-$1c]
mov edi,[ebp-$18]
Nothing in there calls into ntdll.dll
.
The questions still remain:
- why is pushing of one variable onto the stack, and two others into registers the bottleneck?
- why isn't anything inside PrepareCell itself the bottleneck?