I am trying to solve a persistent I/O problem that occurs when we read from or write to a Windows 2003 clustered file share. It happens regularly and seems to be triggered by traffic. We are writing via .NET's FileStream object.

Basically we are writing from a Windows 2003 Server running IIS to a Windows 2003 file share cluster. When writing to the file share, the IIS server often gets two errors. One is an Application Popup from Windows, the other is a warning from MRxSmb. Both say the same thing:

[Delayed Write Failed] Windows was unable to save all the data for the file \Device\LanmanRedirector. The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere.

On reads, we are also getting errors, which are System.IO.IOException errors: "The specified network name is no longer available."

We have other servers writing more and larger files to this file share cluster without an issue. The problem only comes up on this one group of servers, so it doesn't seem related to writing large files. We've applied all the hotfixes referenced in online articles dealing with this issue, and yet it continues.

Our network team ran Network Monitor and didn't see any packet loss, from what I understand, but as I wasn't present for that test I can't say that for certain.

Any ideas of where to check? I'm out of avenues to explore or tests to run. I'm guessing the issue is some kind of network problem, but as it's only happening when these servers connect to that File Share cluster, I'm not sure what kind of problem it might be.

This issue is awfully specific, and potentially hardware related, but any help you can give would be appreciated.

Eric Sipple

+1  A: 

I've heard of AutoDisconnect causing similar issues (even if the device isn't idle). You may want to try disabling that on the server.

Mark Brackett
A: 

I've seen other people reporting the "delayed write failed" error. One recommendation was to adjust the size of the cache; there's a utility from Sysinternals (http://technet.microsoft.com/en-us/sysinternals/bb897561.aspx) that will allow you to do that.

chris
A: 

I am having similar problems:

  • writing to a machine that is also part of a Windows 2003 R2 NLB cluster sometimes results in "Delayed Write Failed" or "the semaphore has timed out" or "the specified network name is no longer available"
  • this is reproducible for the same files, even after rebooting all machines involved
  • if I rename the problem-files (some of which are quite small), the problem remains
  • if I write the files to another location (physical disk) on the same machine, the problem remains
  • I uninstalled all anti-virus software; the problem remains
  • I have reset the TCP/IP stack; the problem temporarily disappears, but after some time it returns for the same files
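
To catch the moment the failures return, a small probe can write a test file, force it out of the cache, and read it back at intervals. This is only a minimal sketch in Python, not the .NET code discussed in the question; `probe_share` and its parameters are hypothetical names chosen for illustration, and the flush call is there on the assumption that forcing the write makes an error more likely to surface in the call itself rather than later as a delayed write.

```python
import os
import time


def probe_share(directory, attempts=5, delay=0.0, payload=b"probe"):
    """Write and re-read a small file `attempts` times.

    Returns a list of (attempt number, timestamp, error text) tuples,
    one per failed attempt. An empty list means every attempt succeeded.
    """
    failures = []
    path = os.path.join(directory, "probe.tmp")  # hypothetical file name
    for attempt in range(1, attempts + 1):
        try:
            with open(path, "wb") as handle:
                handle.write(payload)
                handle.flush()
                # Ask the OS to commit the data now instead of caching it.
                os.fsync(handle.fileno())
            with open(path, "rb") as handle:
                if handle.read() != payload:
                    raise OSError("read-back mismatch")
        except OSError as exc:
            # Record when the failure happened so it can be compared
            # with traffic levels and event log entries.
            failures.append((attempt, time.time(), str(exc)))
        time.sleep(delay)
    return failures
```

Run against the share from an affected server and from an unaffected one, for example `probe_share(r"\\fileserver\share", attempts=100, delay=5.0)` with a placeholder path, the recorded timestamps can then be lined up against the MRxSmb warnings.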

PARTLY SOLVED the problem: I deleted (not stopped) the host from the NLB cluster, and the errors went away.

It seems to have something to do with writing to a share on a server that is also part of a Network Load Balancing cluster.

I have not yet found other people posting NLB cluster related file write problems. However, I did find many posts complaining about similar problems, none of which seem to have been solved.

Anne