There's a bit of code that writes data to a MemoryStream directly into its data buffer, obtained by calling GetBuffer(). It also updates the Position property and calls SetLength() appropriately.

This code works properly 99.9999% of the time. Literally. Only once every few hundred thousand iterations does it barf. The specific problem is that the Position property of the MemoryStream suddenly returns zero instead of the appropriate value.

However, code was added that checks for the 0 and throws an exception, which includes a log of MemoryStream properties like Position and Length, gathered in a separate method. Those return the correct values. Further logging added within the same method shows that when this rare condition occurs, Position is zero only inside this particular method.
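For readers unfamiliar with the pattern, here is a minimal sketch of what writing through GetBuffer() typically looks like. The class and method names (`DirectBufferWrite`, `Append`) are hypothetical and not taken from the code in question; it assumes the stream was created with the parameterless constructor so that GetBuffer() is permitted.

```csharp
using System;
using System.IO;

static class DirectBufferWrite
{
    // Hypothetical helper: appends 'source' at the stream's current Position
    // by copying straight into the stream's backing array.
    public static void Append(MemoryStream stream, byte[] source)
    {
        int start = (int)stream.Position;
        int end = start + source.Length;

        // Grow the logical length first. SetLength may reallocate the backing
        // array, so GetBuffer() has to be called after it, not before.
        if (end > stream.Length)
            stream.SetLength(end);

        byte[] buffer = stream.GetBuffer();
        Buffer.BlockCopy(source, 0, buffer, start, source.Length);

        // The stream does not know about the direct copy, so advance Position manually.
        stream.Position = end;
    }
}
```

None of these steps is atomic, so a second thread touching the same stream between SetLength, GetBuffer and the Position update can observe or produce inconsistent state.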

Okay. Obviously, this must be a threading issue. And most likely a compiler optimization issue.

However, the nature of this software is that it's organized by "tasks" with a scheduler, so any one of several actual OS threads may run this code at any given time, but never more than one at a time.

So my guess is that ordinarily the same thread happens to keep getting used for this method, and then on a rare occasion a different thread gets used. (I just got the idea to test this theory by capturing and comparing the thread ID.)

Then due to compiler optimizations, the different thread never gets the correct value. It gets a "stale" value.

Ordinarily in a situation like this I would apply a "volatile" keyword to the variable in question to see if that fixes it. But in this case the variables are inside the MemoryStream object.

Does anyone have any other ideas? Or does this mean we have to implement our own MemoryStream object?

Sincerely, Wayne

EDIT: Just ran a test which counts the total number of calls to this method and the number of times the ManagedThreadId differs from the one on the previous call. It switches threads almost exactly 50% of the time, alternating between them. So my theory above is almost certainly wrong, or the error would occur far more often.

EDIT: This bug occurs so rarely that it would take nearly a week of running without the bug before I'd feel any confidence that it's really gone. Instead, it's better to run experiments to confirm precisely the nature of the problem.

EDIT: Locking currently is handled via lock() statements in each of 5 methods that use the MemoryStream.
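
To illustrate the locking arrangement described in this edit, here is a minimal sketch, assuming every access to the stream goes through one private lock object. The class name `GuardedBuffer` and its members are hypothetical, not the actual five methods. Note that lock() also acts as a memory barrier, so values read inside the lock are not stale; the protection only holds if every code path that touches the stream takes the same lock.

```csharp
using System.IO;

// Hypothetical wrapper: the stream is only ever touched while holding _sync.
sealed class GuardedBuffer
{
    private readonly object _sync = new object();
    private readonly MemoryStream _data = new MemoryStream();

    public void Write(byte[] bytes)
    {
        lock (_sync)
        {
            _data.Write(bytes, 0, bytes.Length);
        }
    }

    public long Position
    {
        get { lock (_sync) { return _data.Position; } }
    }

    public void Reset()
    {
        lock (_sync)
        {
            // Truncating to zero also pulls Position back to 0.
            _data.SetLength(0);
        }
    }
}
```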

+1  A: 
Richard
Thanks so much for trying to help! See my edits above. Also, I currently have an experiment which uses Interlocked.Increment and Interlocked.Decrement on a "countThreads" variable and throws an exception if countThreads > 1. This logic wraps the property named Data, which is how other objects access the MemoryStream. That never gets tripped, which is why it appears that only one thread is involved. However, methods within the class use the "data" class field directly. I'm modifying them all to use the Data property as well, to see if 2 threads ever access it simultaneously.
Wayne
The experiment in the above comment proved that you were correct. The Data property was getting accessed from two different threads. I suggest this technique for discovery: actually write code to catch when 2 threads call in, and get the stack trace from each one. That's what I did. It turned out to be due to a separate thread kicked off to clean out remaining bytes from a TCP/IP socket. That's why this was so rare: it only had the chance of occurring during a socket closing procedure.
Wayne
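The detection technique described in these comments can be sketched roughly as follows. This is an illustration under assumptions, not the code Wayne actually used: the class name `SingleThreadGuard` and its members are hypothetical. It counts threads inside a guarded region with Interlocked and throws, with both stack traces, as soon as a second thread enters.

```csharp
using System;
using System.Threading;

// Hypothetical diagnostic guard: throws if two threads are inside at once.
sealed class SingleThreadGuard
{
    private int _inside;               // threads currently inside the guarded region
    private string _firstStack = "";   // stack trace recorded by the thread that got in first

    public void Enter()
    {
        if (Interlocked.Increment(ref _inside) > 1)
        {
            // Best effort: the first thread may not have stored its trace yet.
            string first = Volatile.Read(ref _firstStack);
            Interlocked.Decrement(ref _inside);
            throw new InvalidOperationException(
                "Concurrent access detected." + Environment.NewLine +
                "--- first thread ---" + Environment.NewLine + first + Environment.NewLine +
                "--- second thread ---" + Environment.NewLine + Environment.StackTrace);
        }
        // Capturing a stack trace on every call is slow; this is for debugging only.
        Volatile.Write(ref _firstStack, Environment.StackTrace);
    }

    public void Exit()
    {
        Interlocked.Decrement(ref _inside);
    }

    // Convenience wrapper so Exit always runs after a successful Enter.
    public void Run(Action body)
    {
        Enter();
        try { body(); }
        finally { Exit(); }
    }
}
```

For the check to mean anything, every path to the stream has to go through the guard, including the class's own internal methods, which is the gap the first experiment missed.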
Sorry. I thought it was fixed, but it recurred. That issue was only part of the problem. The MemoryStream still gets a zero Position sometimes, even though it's now impossible for any thread conflict to occur. I have overridden all the methods and added volatile position and length variables, which addresses the only remaining issue I can think of: since the MSDN docs say that MemoryStream isn't thread safe, they probably mean that it also has some thread affinity due to compiler optimizations when compiling in Release mode. I'll post after a few days if the problem is solved, for the benefit of others.
Wayne
In the end it was definitely a threading problem as you describe. So you get credit for the right answer.
Wayne