We have a C# (2.0) application that talks to our server (written in Java) via web services.

Lately we have started seeing the following behavior on only one of our lab machines (XP): once in a while (every few days), one of the web service requests gets stuck; it neither returns nor times out.

The following is the stack trace where it seems to be stuck.

I have no clue what is going on here. Any pointers would be of great help.

ESP EIP
05eceeec 7c90eb94 [GCFrame: 05eceeec]
05ecefbc 7c90eb94 [HelperMethodFrame_1OBJ: 05ecefbc] System.Threading.Monitor.Enter(System.Object)
05ecf014 7a5b0034 System.Net.ConnectionGroup.Disassociate(System.Net.Connection)
05ecf040 7a5aeaa7 System.Net.Connection.PrepareCloseConnectionSocket(System.Net.ConnectionReturnResult ByRef)
05ecf0a4 7a5ac0e1 System.Net.Connection.ReadStartNextRequest(System.Net.WebRequest, System.Net.ConnectionReturnResult ByRef)
05ecf0e8 7a5b1119 System.Net.ConnectStream.CallDone(System.Net.ConnectionReturnResult)
05ecf0fc 7a5b3b5a System.Net.ConnectStream.ReadChunkedSync(Byte[], Int32, Int32)
05ecf114 7a5b2b90 System.Net.ConnectStream.ReadWithoutValidation(Byte[], Int32, Int32, Boolean)
05ecf160 7a5b29cc System.Net.ConnectStream.Read(Byte[], Int32, Int32)
05ecf1a0 79473cab System.IO.StreamReader.ReadBuffer(Char[], Int32, Int32, Boolean ByRef)
05ecf1c4 79473bd6 System.IO.StreamReader.Read(Char[], Int32, Int32)
05ecf1e8 69c29119 System.Xml.XmlTextReaderImpl.ReadData()
05ecf1f8 69c2ad70 System.Xml.XmlTextReaderImpl.ParseDocumentContent()
05ecf20c 69c292d7 System.Xml.XmlTextReaderImpl.Read()
05ecf21c 69c2929d System.Xml.XmlTextReader.Read()
05ecf220 6991b3e7 System.Web.Services.Protocols.SoapHttpClientProtocol.ReadResponse(System.Web.Services.Protocols.SoapClientMessage, System.Net.WebResponse, System.IO.Stream, Boolean)
05ecf268 69919ed1 System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(System.String, System.Object[])

Edit:

After David's answer, I looked at all the threads again and found the accomplice in the deadlock:

ESP EIP
11a2f6f0 7c90eb94 [GCFrame: 11a2f6f0]
11a2f7c0 7c90eb94 [HelperMethodFrame_1OBJ: 11a2f7c0] System.Threading.Monitor.Enter(System.Object)
11a2f818 7a5ae107 System.Net.Connection.CloseOnIdle()
11a2f844 7a5b0403 System.Net.ConnectionGroup.DisableKeepAliveOnConnections()
11a2f878 7a58c035 System.Net.ServicePoint.ReleaseAllConnectionGroups()
11a2f8b4 7a58d40a System.Net.ServicePointManager.IdleServicePointTimeoutCallback(Timer, Int32, System.Object)
11a2f8e8 7a5d2f40 System.Net.TimerThread+TimerNode.Fire()
11a2f928 7a5d2bb2 System.Net.TimerThread+TimerQueue.Fire(Int32 ByRef)
11a2f968 7a5d2540 System.Net.TimerThread.ThreadProc()
11a2f9b4 793d7a7b System.Threading.ThreadHelper.ThreadStart_Context(System.Object)
11a2f9bc 793683dd System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
11a2f9d4 793d7b5c System.Threading.ThreadHelper.ThreadStart()
11a2fbf8 79e88f63 [GCFrame: 11a2fbf8]

So, do we know whether this is fixed in 4.0?

+1  A: 

This looks related... bug in CLR? Microsoft Connect

Edit: the code in the 4.0 framework looks like it handles locks totally differently. That may have fixed the bug.

David Crowell
Thanks David for pointing that out!!
blue
+1  A: 

You almost certainly have a deadlock. Suppose thread A takes out the lock on object 2 and then waits for object 1 to be unlocked. Thread A has to wait because thread B has taken out the lock on object 1 and will not release it until it can lock object 2. Now thread A and thread B will both wait forever because each is waiting on the other to unlock something.

Use the debugger to look at every thread in your program and see which two threads are both sitting there waiting for a lock. Then figure out which locks they are waiting for. Then figure out how to rewrite your program so that those two locks are never taken out in an order which is inconsistent on two threads.
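As a minimal sketch of what a consistent lock order looks like (the names `Lock1`, `Lock2`, `UpdateBoth` and `sharedCounter` are hypothetical, not taken from the application in the question), every code path below takes the first lock before the second, so no thread can ever hold `Lock2` while waiting for `Lock1`:

```csharp
using System;
using System.Threading;

static class LockOrderingSketch
{
    // Hypothetical locks standing in for "object 1" and "object 2".
    static readonly object Lock1 = new object();
    static readonly object Lock2 = new object();
    static int sharedCounter;

    // The only place both locks are taken, always Lock1 first, then Lock2.
    static void UpdateBoth()
    {
        lock (Lock1)
        {
            lock (Lock2)
            {
                sharedCounter++;
            }
        }
    }

    // Each worker thread calls UpdateBoth 1000 times.
    static void Worker()
    {
        for (int i = 0; i < 1000; i++)
        {
            UpdateBoth();
        }
    }

    // Runs two threads to completion and returns the final counter value.
    public static int Run()
    {
        sharedCounter = 0;
        Thread a = new Thread(Worker);
        Thread b = new Thread(Worker);
        a.Start();
        b.Start();
        a.Join();
        b.Join();
        return sharedCounter;
    }
}
```

If one of the two threads instead took `Lock2` first and then `Lock1`, the program could hang in the way described above. This only helps when you control both code paths; it does not apply directly when both locks are taken inside framework code.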

Remember, writing correct locking code requires global knowledge of all locks in the program, all operations on all threads that could possibly take them out in every possible ordering. That's why it's so hard to get it right; most programming tasks require only local knowledge. Locks require global knowledge of the entire program, including the parts that you didn't write. If some third party dll is taking out a lock on an object and your code is waiting on the same object, you have to agree with that third party code on what the correct lock ordering choice is.

That's the most likely cause. There is an unlikely but possible other cause, which is that sometimes you abort a thread between when the lock is taken out and the unlocking finally block is executed. That will cause a deadlock in C# 3 and below; we've fixed the code generator in C# 4 so that this no longer happens. The moral of the story here though is not "use C# 4", it's "don't abort a thread, ever, and particularly don't abort a thread that might be locking on something". Thread aborts should be only used as a last resort, when you are taking down the process anyway.
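To make the thread-abort case concrete, here is a rough sketch of the two shapes a `lock` statement expands to. The class and member names (`LockExpansionSketch`, `Gate`, `value`) are made up for illustration, and the expansions are approximations of the generated code rather than exact compiler output. The second form relies on the `Monitor.Enter(object, ref bool)` overload, which is only available from .NET 4 onward:

```csharp
using System;
using System.Threading;

static class LockExpansionSketch
{
    static readonly object Gate = new object();
    static int value;

    // Roughly what `lock (Gate) { value++; }` became in C# 3 and below.
    static void IncrementOldStyle()
    {
        Monitor.Enter(Gate);
        // An abort delivered here, after Enter but before the try block is
        // entered, skips the finally and leaves Gate locked forever.
        try
        {
            value++;
        }
        finally
        {
            Monitor.Exit(Gate);
        }
    }

    // Roughly the C# 4 shape: the lock is taken inside the try block, and
    // lockTaken records whether it was actually acquired.
    static void IncrementNewStyle()
    {
        bool lockTaken = false;
        try
        {
            Monitor.Enter(Gate, ref lockTaken);
            value++;
        }
        finally
        {
            if (lockTaken)
            {
                Monitor.Exit(Gate);
            }
        }
    }

    // Calls both variants once and returns the resulting value.
    public static int Run()
    {
        value = 0;
        IncrementOldStyle();
        IncrementNewStyle();
        return value;
    }
}
```

Both methods behave identically when no thread is aborted; the difference only matters in the narrow window marked in the first one.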

Eric Lippert
Eric, if you follow the stack trace, he's not taking a lock; .NET CLR code is doing it. I used Reflector against the 2.0 and 4.0 assemblies, and the locking strategy is quite different. I'm assuming this was a bug fix for a possible 2.0 bug. I couldn't find an **official** notice of the bug, though.
David Crowell