The question proper

Has anyone experienced this exception on a single core machine?

The I/O operation has been aborted because of either a thread exit or an application request.

Some context

On a single CPU system, only one MSIL instruction is executed at a time, threads notwithstanding. Between operations, the runtime gets to do its housekeeping.

Introduce a second CPU (or a second core) and it becomes possible to have an operation execute while the runtime does housekeeping. As a result, code that works perfectly on a single CPU machine may crash, or even induce a bluescreen, when executed in a multicore environment.

Interestingly, HyperThreaded Pentiums do not manifest the problem.

I had sample code that worked perfectly on a single core and flaked on a multicore CPU. It's around somewhere, but I'm still trying to find it. The gist of it was that when the logic was implemented using the Visitor pattern, it would flake after an unpredictable number of iterations, but moving the method into the object on which the visitor had operated made the problem disappear.

To me this suggests that the framework has some kind of internal hash table for resolving object references, and on a multicore system a race condition exists with respect to accessing this.
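As a generic illustration of the kind of race being suspected (it does not model the runtime's actual internals), here is a minimal Python sketch of a shared lookup table in which the membership check and the insert happen under one lock. `ReferenceTable`, `resolve` and `hammer` are hypothetical names.

```python
import threading

class ReferenceTable:
    """Hypothetical shared lookup table guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._table = {}
        self.creations = 0  # how many entries were actually created

    def resolve(self, key):
        # The membership test and the insert must happen under one lock;
        # otherwise two threads can both see the key as missing and both
        # create an entry (a check-then-act race).
        with self._lock:
            if key not in self._table:
                self._table[key] = object()
                self.creations += 1
            return self._table[key]

def hammer(table, keys, n_threads=8):
    """Resolve the same keys concurrently from several threads."""
    results = [[] for _ in range(n_threads)]

    def worker(i):
        for key in keys:
            results[i].append(table.resolve(key))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

table = ReferenceTable()
results = hammer(table, keys=range(100))
```

Without the lock, two threads could occasionally create different objects for the same key; with it, every thread gets the same object for a given key.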

I also currently have code using the APM (Asynchronous Programming Model) to process serial comms. It used to intermittently bluescreen inside the virtual COM port driver for my USB serial adaptor, but I fixed this by calling Thread.Sleep(0) after every Stream.EndRead(IAsyncResult).

At random intervals, when the AsyncCallback I supply to Stream.BeginRead(...) is invoked and the handler calls Stream.EndRead(IAsyncResult), it throws an IOException stating "The I/O operation has been aborted because of either a thread exit or an application request."
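The APM shape described here (a Begin call that returns at once, a completion callback, and an End call that returns the data or rethrows the failure) can be sketched with Python's `concurrent.futures`, where `Future.result()` stands in for `EndRead`. This is an analogy rather than the .NET API; `CallbackReader` and the in-memory stream are assumptions for illustration.

```python
import io
import threading
from concurrent.futures import ThreadPoolExecutor

class CallbackReader:
    """Hypothetical APM-style reader: begin_read() returns at once, and the
    callback calls future.result() (the EndRead analogue) and re-arms."""

    def __init__(self, stream, chunk_size=4):
        self.stream = stream
        self.chunk_size = chunk_size
        self.executor = ThreadPoolExecutor(max_workers=1)
        self.chunks = []
        self.error = None
        self.done = threading.Event()

    def begin_read(self):
        # Analogue of BeginRead: schedule the read and return immediately.
        future = self.executor.submit(self.stream.read, self.chunk_size)
        future.add_done_callback(self._on_read_complete)

    def _on_read_complete(self, future):
        try:
            data = future.result()  # like EndRead: returns data or re-raises
        except OSError as exc:      # an aborted read would surface here
            self.error = exc
            self.done.set()
            return
        if not data:                # end of stream
            self.done.set()
            return
        self.chunks.append(data)
        self.begin_read()           # issue the next read from the callback

    def run(self, timeout=5.0):
        """Start the read loop and wait until it ends; True if it finished."""
        self.begin_read()
        finished = self.done.wait(timeout)
        self.executor.shutdown(wait=True)
        return finished

reader = CallbackReader(io.BytesIO(b"hello, serial port"))
finished = reader.run()
```

In .NET 2.0 and later, an exception that escapes a thread-pool callback terminates the process, so the End call inside the callback is the place where such an error has to be caught.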

I suspect that this too is multicore related and that some sort of internal error is killing the wait thread, leading to this behaviour. If I am right about this, then the framework has serious flaws in the context of a multicore environment. While there are workarounds such as the one I have mentioned, you can't always apply them, because sometimes they would need to be applied inside other framework code.

For example, if you search the net regarding the above IOException you will find it affecting code written by people who clearly don't even know they are using multiple threads because it happens under the covers of framework convenience wrappers.

Microsoft tends to blow off these bug reports as unreproducible. I suspect this is because the problem only occurs on multicore systems and bug reports like this one don't mention the number of CPUs.

So... please help me pin down the problem. If I'm right about this I'm going to have to be able to prove it with repeatable test cases, because what I think is wrong is going to entail bugfixes in both framework and runtime.
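Since the failures are intermittent, a repeatable test case usually means a harness that runs the suspect operation many times and records how often, and after how many iterations, it fails. A minimal Python sketch, where `stress` is a hypothetical helper and `flaky` is a deterministic stand-in for the pared-down code under test:

```python
import traceback

def stress(operation, iterations=10_000):
    """Run `operation` repeatedly and report
    (failure_count, index_of_first_failure, first_traceback)."""
    failures = 0
    first_at = None
    first_tb = None
    for i in range(iterations):
        try:
            operation()
        except Exception:
            failures += 1
            if first_at is None:
                first_at = i
                first_tb = traceback.format_exc()
    return failures, first_at, first_tb

# Deterministic stand-in: fails on every 7th call.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] % 7 == 0:
        raise OSError("simulated intermittent failure")

failures, first_at, first_tb = stress(flaky, iterations=70)
```

Running the same harness once with the process pinned to one CPU and once unpinned gives the single-core versus multicore comparison the question asks for.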


It has been suggested that the problem is more likely to be in my code than in the framework.

Investigating variant A of the issue, I have transplanted the problem code into a sample app and pared it down until the only things left were thread setup and method invocations that worked on one CPU and failed on two.

I have not tested variant B this way, because I no longer have any single-core systems. So I repeat the question: has anyone seen this exception on a single-core platform?

Unfortunately no-one can confirm my suspicion, only refute it.

It is not helpful to tell me that I am fallible; I am already aware of this.

If you know of a way to pin a .NET application to a single CPU, it would be very handy for figuring this out.

Edit: Thanks for the VM suggestion. I will do exactly that; good call.

+2  A: 

Based on your description, my inclination would be to blame the COM port driver. Was the driver for it developed prior to the multicore era? I once had a similar issue with such a device which a later driver revision thankfully fixed.

Addition: To answer your question on how to limit your app to a single CPU, you need to set the process affinity to a single CPU. See this link. You can also do this after your process has started, using Task Manager (right-click the process in Task Manager and select "Set Affinity...").
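The same restriction can be applied programmatically before the code under test starts. In .NET that is the `Process.ProcessorAffinity` property, backed by the Win32 `SetProcessAffinityMask` function; as a sketch in Python, the block below uses `os.sched_setaffinity`, which only exists on Linux, so it is guarded with `hasattr`. `pin_to_one_cpu` is a hypothetical helper.

```python
import os

def pin_to_one_cpu():
    """Restrict the caller to one CPU using the Linux-only affinity API.

    Chooses the lowest-numbered CPU the process is already allowed to use,
    which also works when a container has restricted the CPU set.
    """
    allowed = os.sched_getaffinity(0)  # 0 = the caller
    cpu = min(allowed)
    os.sched_setaffinity(0, {cpu})
    return cpu

if hasattr(os, "sched_setaffinity"):
    original = os.sched_getaffinity(0)
    # Pin before starting any worker threads: on Linux the setting applies to
    # the calling thread, and threads created afterwards inherit it.
    cpu = pin_to_one_cpu()
    print(f"now restricted to CPU {cpu}")
    # ... start the multithreaded code under test here ...
    os.sched_setaffinity(0, original)  # undo the restriction when finished
```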

zdan
It's an x64 driver for Vista. Definitely post-multicore :)
Peter Wone
I think a clue might be in "64-bit driver for vista" ! ;)
Mitch Wheat
+2  A: 

Blue screens aren't due solely to bugs in applications or frameworks. Blue screens need "help" from kernel mode. One of your problems is a defective driver, no matter which "era" the defective driver was coded in.

Regarding the possibility of one thread closing the port while another thread is still using it, I think this could be related to some famous bugs in framework housekeeping. I think those bugs don't depend on the number of cores, but the frequency of getting hit by them could increase when there are more cores. Try adding a GC.KeepAlive call so the garbage collector cannot reclaim your port object too early.

Windows programmer
Can you provide links to the aforementioned famous bugs? I'm interested :-)
Orion Edwards
Sorry, the only way I can find them now is by the same kind of Google search you can do. It seems a lot of people were affected by the .Net Framework collecting objects too aggressively, and KeepAlive is needed even when it "shouldn't" be needed.
Windows programmer
+1  A: 

Prior to Vista, any async I/O that was still in progress when the thread that issued it terminated was cancelled. This tends to give the error that you report, i.e.

The I/O operation has been aborted because of either a thread exit or an application request.

I'm not sure if this is in any way relevant to your question, but are you issuing asynchronous operations from a thread which can terminate before the operations have completed?
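A common way to make sure the issuing thread outlives its operations is to route all I/O through one long-lived thread that exits only after its queue is drained. The Python sketch below shows that ownership pattern only; Python does not reproduce the pre-Vista cancellation, and `IoOwner` with its `submit`/`close` methods is a hypothetical design.

```python
import io
import queue
import threading

class IoOwner:
    """Hypothetical owner thread: every I/O request runs on one long-lived
    thread, so no request depends on the lifetime of the thread that asked."""

    _STOP = object()

    def __init__(self):
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._run, name="io-owner")
        self._thread.start()

    def _run(self):
        while True:
            item = self._requests.get()
            if item is self._STOP:
                return  # FIFO queue, so earlier requests have already run
            func, args, callback = item
            try:
                result = func(*args)
            except Exception as exc:
                callback(None, exc)  # report the failed request to the caller
            else:
                callback(result, None)

    def submit(self, func, args, callback):
        """Queue func(*args); callback(result, error) later runs on the owner
        thread. The submitting thread may exit right after this returns."""
        self._requests.put((func, args, callback))

    def close(self):
        """Stop after all previously submitted requests have completed."""
        self._requests.put(self._STOP)
        self._thread.join()

owner = IoOwner()
stream = io.BytesIO(b"abcdef")
results = []

def short_lived_caller():
    # Queues a read and exits immediately; the read still completes.
    owner.submit(stream.read, (3,), lambda data, err: results.append(data))

callers = [threading.Thread(target=short_lived_caller) for _ in range(2)]
for t in callers:
    t.start()
for t in callers:
    t.join()
owner.close()
```

The short-lived callers exit right after queueing their reads, yet both reads complete, because they were issued by the owner thread.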

Len Holgate
No, I'm issuing async ops on a thread that terminates unexpectedly, which triggers this exception and terminates the application as a result.
Peter Wone
Peter, if the thread that issues the async ops terminates, then I think getting this exception is expected. Why does the thread that issues the operations terminate?
Len Holgate
+1  A: 

I am totally at a loss for words here. You are saying that your code is breaking on dual-core machines and you suspect MS for that!!!

Nowadays almost every machine out there has dual or even quad cores. If the .NET Framework had any major issue working with dual cores, why aren't Live Messenger, Live Writer and many other .NET thick-client applications breaking frequently? I believe the SQL Server 2005 and 2008 Management Studios are also in .NET. The entire System.Web implementation is in C#, and the entire BizTalk orchestration designer is in .NET.

Now, coming to the point: your application seems to use multithreading, with lots of async calls going up and down. Do you have the flexibility to configure the number of threads in your application? If so, limit it to one thread and test again. Errors due to multithreading are very difficult to trace.
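Making the thread count a setting lets the same build run fully serialised. A minimal Python sketch, assuming the work can be expressed as independent items; `process_all` and the `APP_MAX_WORKERS` environment variable are hypothetical names.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical setting: read the worker count from configuration so the
# same build can be run serialised (1) or in parallel (> 1).
MAX_WORKERS = int(os.environ.get("APP_MAX_WORKERS", "4"))

def process_all(items, handler, max_workers=MAX_WORKERS):
    """Apply handler to every item using at most max_workers threads.

    pool.map returns results in input order, so serial and parallel runs
    can be compared directly.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(handler, items))

serial = process_all(range(10), lambda x: x * x, max_workers=1)
parallel = process_all(range(10), lambda x: x * x, max_workers=4)
```

If a failure disappears with `max_workers=1` and comes back with more workers, that points at state shared between work items rather than at any single item.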

Have you tried SOS (the Son of Strike debugger extension)? Try it. I don't know it well, but search for it and you will certainly find good resources on using SOS.

As a final resort, open a case with Microsoft support. You need to be a little patient with them, because at first they will start with all the basic questions :). Good luck.

Pradeep
+2  A: 

I'm currently in the process of rewriting the whole file transfer stack used in our application. From conversations with coworkers I know that it was more or less working a couple of years ago, when single-core laptops and slow connections were used in production. Now everyone has moved to dual cores and high-speed internet, and the whole thing shows unpredictable results.

So, when I started studying the code more closely, I found that the person who developed it had no idea how to write multithreaded code properly. All "synchronization" is done using Thread.Sleep()! Thread management was done on a "fire and forget" basis. Someone wants to stop the thread? Thread.Abort()! Dammit! It's a surprise the damn thing was working at all.
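For contrast, a Python sketch of the alternative to `Thread.Sleep()` synchronisation and `Thread.Abort()`: waiters block on an event instead of sleeping, and the worker checks a stop flag between units of work, so it always exits cleanly. In .NET the counterparts would be a `ManualResetEvent` or, in later framework versions, a `CancellationToken`; `Transfer` and its methods are hypothetical.

```python
import threading

class Transfer:
    """Hypothetical worker that is signalled and stopped cooperatively,
    rather than synchronised with Sleep() and killed with Abort()."""

    def __init__(self, chunks):
        self._chunks = list(chunks)
        self.sent = []
        self._stop_requested = threading.Event()
        self.finished = threading.Event()  # set when the worker has exited
        self._thread = threading.Thread(target=self._run)

    def start(self):
        self._thread.start()

    def request_stop(self):
        self._stop_requested.set()

    def stop(self, timeout=5.0):
        """Ask the worker to stop, then wait for it; True if it has exited."""
        self.request_stop()
        self._thread.join(timeout)
        return not self._thread.is_alive()

    def _run(self):
        try:
            for chunk in self._chunks:
                # Check for cancellation between units of work, so the
                # worker never stops halfway through one.
                if self._stop_requested.is_set():
                    break
                self.sent.append(chunk)  # stands in for sending one chunk
        finally:
            self.finished.set()  # wakes waiters without any polling loop

job = Transfer([b"part1", b"part2", b"part3"])
job.start()
job.finished.wait(timeout=5.0)  # block on a signal instead of Thread.Sleep()
stopped = job.stop()
```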

My point is: go and check your code and, if you're working with custom hardware, its drivers' code. The problem is there, not in .NET, Win32 or somewhere else.