I am trying to solve a persistent IO problem when we try to read or write to a Windows 2003 Clustered Fileshare. It is happening regularly and seem to be triggered by traffic. We are writing via .NET's FileStream object.
Basically we are writing from a Windows 2003 Server running IIS to a Windows 2003 file share cluster. When writing to the file share, the IIS server often gets two errors. One is an Application Popup from Windows, the other is a warning from MRxSmb. Both say the same thing:
[Delayed Write Failed] Windows was unable to save all the data for the file \Device\LanmanRedirector. The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elswhere.
On reads, we are also getting errors, which are System.IO.IOException errors: "The specified network name is no longer available."
We have other servers writing more and larger files to this File Share Cluster without an issue. It's only coming from the one group of servers that the issue comes up. So it doesn't seem related to writing large files. We've applied all the hotfixes referenced in articles online dealing with this issue, and yet it continues.
Our network team ran Network Monitor and didn't see any packet loss, from what I understand, but as I wasn't present for that test I can't say that for certain.
Any ideas of where to check? I'm out of avenues to explore or tests to run. I'm guessing the issue is some kind of network problem, but as it's only happening when these servers connect to that File Share cluster, I'm not sure what kind of problem it might be.
This issue is awfully specific, and potentially hardware related, but any help you can give would be of assistance.
Eric Sipple