views: 369
answers: 8

I am debugging an application which slows the system down very badly. The application loads a large amount of data (some 1000 files, each about half an MB) from the local hard disk. The files are loaded as memory-mapped files and are mapped only when needed. This means that at any given point in time the virtual memory usage does not exceed 300 MB.

I also checked the handle count using handle.exe from Sysinternals and found that at most some 8000-odd handles are open. When the data is unloaded it drops to around 400. There are no handle leaks after each load/unload operation.

After 2-3 load/unload cycles, during one load, the system becomes very slow. I checked the virtual memory usage of the application as well as the handle count at this point, and both were well within limits (VM about 460 MB with not much fragmentation either; handle count 3200).

I want to know how an application could make the whole system very slow to respond. What other tools can I use to debug this scenario?

Let me be more specific: by "system" I mean that all of Windows slows down. Task Manager itself takes 2 minutes to come up, and most often a hard reboot is required.

A: 

You can use tools like "IBM Rational Quantify" or "Intel VTune" to detect performance issues.
[EDIT]
As Benoît suggested, one good approach is to measure how long each task takes, to identify which one is eating CPU.
But remember, since you are working with many files, it is likely page misses that are causing memory to be swapped to disk.

lsalamon
A: 

If you don't have profilers, you may have to do the same work by hand...

Have you tried commenting out all read/write operations, just to check whether the slowdown disappears? "Divide and conquer" strategies will help you find where the problem lies.

Benoît
A: 

If you run it under an IDE, run it until it gets really slow, then hit the "pause" button. You will catch it in the act of doing whatever is taking so much time.
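For illustration, the same "pause and look" idea can be done programmatically. A minimal sketch in Python (the question's application is presumably native code, so this only shows the sampling technique; busy_work is a hypothetical stand-in for the slow operation):

```python
import sys
import threading
import time
import traceback

def busy_work():
    # hypothetical stand-in for the mysterious slow operation
    deadline = time.time() + 0.5
    while time.time() < deadline:
        sum(range(1000))

worker = threading.Thread(target=busy_work)
worker.start()
time.sleep(0.1)

# "hit pause": snapshot the worker thread's current stack
frame = sys._current_frames()[worker.ident]
print("".join(traceback.format_stack(frame)))
worker.join()
```

A handful of such snapshots is usually enough: whatever dominates the wall-clock time will show up in most of them.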

Mike Dunlavey
A: 

When Task Manager is taking 2 minutes to come up, are you getting a lot of disk activity? Or is it CPU-bound?

I would try Process Explorer from Sysinternals. When your system is in the slowed-down state and you try running, say, Notepad, pay attention to the page fault deltas.

Ron
+2  A: 

An application by itself, running at default priority, cannot make a system slow/unresponsive. The kernel can, however, and an application can't do much without calling the kernel. The kernel has many mechanisms that take precedence over regular running threads:

  • Hardware interrupts. These have the highest priority. If you have a device with an "interrupt storm" problem, the system will become unresponsive.
  • DPCs. These come second in the list. DPCs are basically deferred interrupt-processing routines. While a DPC is running, multitasking is essentially disabled (because the scheduler itself is a DPC routine). If you have a buggy or badly designed device driver queuing too many DPCs, the system may become unresponsive.
  • APCs. These are like DPCs but run in a specific thread context. They are mostly used to copy I/O results into process buffers. Passive-level code cannot run before APCs are completed.
  • Windows Cache Manager's "delayed writer", which runs on a separate, higher-priority thread to write file changes to disk, acting as a lazy-write cache. This thread takes precedence over regular threads, and long-running flush operations can hinder system responsiveness.
  • File I/O on the pagefile volume causes paging operations to be delayed, for multiple reasons. Even if your files are on a different partition, you'd still pay the "seek penalty". This can slow down everything on the system. Try the same app on a different disk to see the difference.

All of these mechanisms are involved in a single file I/O. So depending on the storage device driver you use and the filter drivers installed on the system, you might end up with an unresponsive/sluggish system. This is usually not the case with good, stable device drivers.

Let's say your I/O is running on a different disk than the pagefile. Then I would recommend starting with Perfmon to understand what's going on. Drill down into the subcategories (interrupts, DPCs, APCs, memory manager operations, threads) to see what's taking the most time. Process Explorer also helps you see DPC and interrupt execution times. If anything points to interrupts or DPCs, you might have a buggy driver or faulty hardware. Otherwise there may be a filter driver interfering with performance/responsiveness.
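As a crude complement to Perfmon, the severity of such stalls can also be observed from user mode by checking how late timed sleeps wake up. A minimal sketch (Python for brevity; measure_stalls is a hypothetical name): on a healthy system the overshoot is a few milliseconds, while heavy DPC/interrupt activity or paging shows up as wakeups that are tens or hundreds of milliseconds late.

```python
import time

def measure_stalls(interval=0.01, samples=100):
    # sleep in short intervals and record the worst amount by which
    # a wakeup overshot the requested interval; long scheduling
    # stalls make this number spike
    worst = 0.0
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(interval)
        elapsed = time.perf_counter() - start
        worst = max(worst, elapsed - interval)
    return worst

print(f"worst wakeup overshoot: {measure_stalls(0.005, 50) * 1000:.1f} ms")
```

Run it alongside your load/unload cycles; if the overshoot jumps by orders of magnitude at the same moment the system becomes sluggish, that correlates the slowdown with scheduling rather than with your own code path.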

ssg
+1  A: 

Tools you can use at this point:

  • Perfmon
  • Event Viewer

In my experience, when things happen to a system that prevent Task Manager from popping up, they're usually of the hardware variety: checking the system log in Event Viewer sometimes reveals it is just full of warnings or errors that some hardware device is timing out.

If Event Viewer doesn't indicate that any kind of loggable hardware error is causing the slowdown, then try Perfmon: add counters for system objects to track file reads, exceptions, context switches, etc. per second and see if there's something obvious there.

Frankly, the sort of behaviour demonstrated is meant to be impossible, by design, for user-mode code to cause. Windows NT goes to a lot of effort to insulate applications from each other and to prevent rogue applications from making the system unusable. So my suspicion is that some kind of hardware fault is to blame. Is there any chance you can simply run the same test on a different PC?

Chris Becke
A: 

The fact that the whole system slows down is very awkward: it means you cannot attach a profiler easily, and it would even be difficult to stop a profiling session in order to view the results (since you said it requires a hard reboot).

The best tools for the job in this situation are ETW (Event Tracing for Windows). These tools are great and will give you the exact answer you are looking for.

Check them out here:

http://msdn.microsoft.com/en-us/library/cc305210.aspx and http://msdn.microsoft.com/en-us/library/cc305221.aspx and http://msdn.microsoft.com/en-us/performance/default.aspx

Hope this works. Thanks

mfawzymkh
A: 

Windows is very greedy about caching file data. I would try removing file I/O, as someone suggested, and also making sure you close each file mapping as soon as you are done with the file. I/O is probably causing your slowdown, especially if your files are on the same disk as the OS. Another way to test that would be to move your files to another disk and see if that alleviates the problem.
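The "close the mapping as soon as you are done" pattern can be sketched as follows. This uses Python's cross-platform mmap module rather than the Win32 CreateFileMapping/MapViewOfFile calls the application presumably uses, and read_record is a hypothetical helper, but the shape is the same: map a view only for the duration of the access, then release both the view and the file handle so the OS can reclaim the cached pages.

```python
import mmap
import os
import tempfile

def read_record(path, offset, size):
    # hypothetical helper: open, map, copy out the bytes we need,
    # then unmap and close immediately instead of keeping thousands
    # of half-MB mappings alive
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + size]
    # both the view and the handle are released on exit

# usage with a throwaway file
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 1024)
os.close(fd)
print(read_record(path, 0, 4))   # b'xxxx'
os.remove(path)
```

Keeping each mapping short-lived also keeps the handle count and working set from ratcheting up across load/unload cycles.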

a_mole