Core of the problem: I receive "(0x80070002) The system cannot find the file specified" for roughly 8 to 9 seconds before the file can be opened successfully.

In a nutshell, we have two COM components. Component A calls into Component B and asks for a UNC filename to write to - the filename returned doesn't exist yet, but the path does. Component A then does its work, creates and populates the file, and tells Component B it's done by making another COM call. At that point Component B calls MoveFile to rename the file to its "official" name.

This code has worked for (literally) years. It works fine on win2k3. It works fine when it's running on win2k8 and pointing to a share on a win2k3 server. But if you run it on win2k8 and point it to a share on a (different) win2k8 server, it fails. It also runs fine if the share is actually located on the same win2k8 machine that the code is running on.

Component A and Component B each run in their own Windows service, under a domain admin account. The shares are configured for "Everyone/Full control" in all test environments, as are the underlying folders that the shares point to. All machines are in the same domain.

During debugging I realized the file actually does exist by the time I get around to checking for it manually. After several iterations it occurred to me that the file doesn't "show up" until some delay has passed, so I put the loop below in Component B:

int nCounter = 0;

while (true)
{
    CFileStatus fs;
    if (CFile::GetStatus(tempname, fs))
        break;

    SleepEx(100, FALSE);
    nCounter++;
}

This loop does, in fact, exit, and nCounter is generally between 80 and 90 when it does, indicating the file "appears" approximately 8 to 9 seconds later. Once the loop exits, the code can successfully rename the file and all further processing appears to work.

I put a CFile::GetStatus call in Component A immediately before it calls into Component B, and that call succeeds - Component A can see the file and get its true size, yet the call into Component B made immediately afterwards cannot see the file until the delay described above has passed. I have verified that the pathnames are precisely the same, although they would clearly have to be for the calls to eventually succeed after a pause of 8 to 9 seconds...

When something like this occurs I always assume there is a bug in my code until proven otherwise. But given that this code has executed properly for a very long time and (other than the diagnostic loop I added) has not changed, and that it works in all environments except the win2k8 -> win2k8 share, I'm guessing there is some OS issue here that I do not understand.

Any insight would be helpful - thanks

A: 

The problem turns out to be the caching changes in the SMB2 protocol. SMB2 is only used when both sides agree to it, which is why it is only a problem for win2k8 -> win2k8 (Vista and Windows 7 also implement SMB2).

In SMB2 the client has a local cache that is only updated (by default) every 10 seconds. This means other machines can update the server share and the change will not be seen until the cache is invalidated. I think, but have not confirmed with Microsoft at this point, that there is also some issue in the Windows-on-Windows (WOW64) layer in win2k8 when running 32-bit applications - it would appear that they may get their own cache as well, which might explain why my two interacting applications on the same machine didn't see the same view of the server's share.

There are (at least) a couple of workarounds for this problem, as provided by Microsoft:

  • Rewrite your code to use unbuffered file access
  • Call FlushFileBuffers when you are done writing (but before you close the file)
  • Disable SMB2 on one or both machines

Run "regedit" on the Windows Server 2008 based computer. Expand and locate the following subtree:

HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters

Add a new REG_DWORD value with the name "Smb2" (without the quotation marks):

Value name: Smb2
Value type: REG_DWORD
0 = disabled
1 = enabled

Set the value to 0 to disable SMB 2.0, or set it to 1 to re-enable SMB 2.0. Reboot the server.

  • Disable the SMB2 caching on the client machine

Configure the below keys with the desired timeout value for refreshing the local cache. Setting these keys to zero will disable the respective cache.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters:

FileInfoCacheLifetime
FileNotFoundCacheLifetime
DirectoryCacheLifetime

All values are DWORD, specified in seconds.

SMB2 Registry settings

Ruddy
A: 

Is this problem with SMB2.0 likely to cause other effects?

I have well established code which creates sequential files, then immediately writes to them.

On a Windows 7 PC this is not working properly: the last few characters are not written to disk. The code calls fflush() before fclose() as belt and braces, and opens files in commit mode.

The problem only occurs where there is a Windows 7 PC and a 2008 server, and only when the file is written to a local Windows 7 drive - whether fixed or removable - hence the SMB2.0 connection?

John Gold