I'm trying to find a proper way to handle stale data on an NFS client. Consider the following scenario:

  • Two servers mount the same NFS shared storage, which contains a number of files
  • A client application on server 1 deletes some files
  • A client application on server 2 tries to access the deleted files and fails with: Stale NFS file handle (nothing strange, the error is expected)

(It may also be useful to know that the cache mount options are set fairly high on both servers for performance reasons.)

What I'm trying to understand is:

  • Is there a reliable method to check that a file is present? In the scenario above, lstat on the file returns success, and the application fails only when it tries to move the file.
  • How can I manually sync the contents of a directory on the client with the server?
  • Any general advice on how to write reliable file-management code on top of NFS?

Thanks.

+1  A: 

You could try the "noac" mount option.

From man nfs:

In addition to preventing the client from caching file attributes, the noac option forces application writes to become synchronous so that local changes to a file become visible on the server immediately. That way, other clients can quickly detect recent writes when they check the file's attributes.

Using the noac option provides greater cache coherence among NFS clients accessing the same files, but it extracts a significant performance penalty. As such, judicious use of file locking is encouraged instead.

You could have two mounts: one for critical, fast-changing data that you need synchronized, and another for the rest of the data.

Also, look into NFS locking and its limitations.
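As a rough illustration of the locking approach, here is a sketch in Python using POSIX byte-range locks, which NFS can propagate between clients through its lock manager (the helper name and parameters are made up for this example, and whether the locks actually work across hosts depends on your NFS version and lock daemon setup):

```python
import fcntl
import os

def update_with_lock(path, data):
    # A sketch: take an exclusive POSIX lock, write, flush, unlock.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # lockf() issues a POSIX byte-range lock; over NFS this goes
        # through the lock manager (NLM, or built-in locking in NFSv4),
        # so other clients can see it -- unlike flock(), which may stay
        # local on some setups.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        os.write(fd, data)
        os.fsync(fd)  # push the data to the server before releasing the lock
        fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Readers on other hosts would take a shared lock (`fcntl.LOCK_SH`) on the same file before reading, which is what gives you the coherence the man page excerpt hints at.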

As for general advice:

One way to replace a file that is concurrently read from multiple hosts is to write the new content into a temporary file and then rename that file to the final location.

On the same filesystem this rename should be atomic.
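A minimal sketch of that write-temp-then-rename pattern in Python (the function name is illustrative, not from the answer):

```python
import os
import tempfile

def atomic_write(path, data):
    # Create the temp file in the target directory so the final rename
    # stays within one filesystem -- rename(2) is only atomic there.
    dirpath = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirpath)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to the server first
        # Readers see either the old content or the new one, never a
        # half-written file.
        os.rename(tmp, path)
    except BaseException:
        os.unlink(tmp)  # don't leave the temp file behind on failure
        raise
```

Note that on NFS a reader that already holds the old file open may still get ESTALE afterwards; the rename only guarantees that each open sees a complete version.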

miedwar
Well, that brings up a good idea: post the question to serverfault. Developers have no way to influence NFS programmatically (workarounds at best), while admins deal with NFS and the applications running on top of it more often. Consequently, the latter have more experience and might have more suggestions.
Dummy00001
+1  A: 
  • Is there a reliable method to check that a file is present? In the scenario above, lstat on the file returns success, and the application fails only when it tries to move the file.

That is normal NFS behavior: the client answers lstat from its attribute cache, so it doesn't notice the deletion until an operation actually reaches the server.

  • How can I manually sync the contents of a directory on the client with the server?

That is impossible to do manually, since NFS pretends to be a normal POSIX-compliant file system.

I once tried coding a close()/open() sequence in an attempt to somehow mitigate the effects of client-side NFS caching. In my case I needed to read information written to the file on another server. But even the reopen trick had close to zero effect, and I couldn't add fdatasync() on the writing side, since that slowed the whole application down.

My experience with NFS to date is that there is nothing you can do. In critical code paths I simply retried the file operations that returned ESTALE.
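The retry-on-ESTALE approach might look like this sketch in Python (the helper name and its retry parameters are illustrative):

```python
import errno
import time

def retry_estale(op, attempts=3, delay=0.1):
    # Retry a file operation that may fail with ESTALE because the
    # NFS client cached a handle to a file another host has replaced
    # or deleted.
    for i in range(attempts):
        try:
            return op()
        except OSError as e:
            if e.errno != errno.ESTALE or i == attempts - 1:
                raise
            time.sleep(delay)  # give the client a chance to revalidate

# usage sketch:
#   size = retry_estale(lambda: os.stat("/mnt/nfs/shared/file").st_size)
```

The key detail is to redo the whole path-based operation (stat, open, rename) rather than reuse the stale descriptor, so the client looks the file up again.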

  • Any general advice on how to write reliable file-management code on top of NFS?

Mod me down all you want, but if your customers want reliability then they shouldn't use NFS.

My company, for example, recommends a proper distributed file system (I intentionally omit the brand) if a customer wants reliability. Our core software is not guaranteed to run on NFS, and we do not support such configurations. But in our case we really need the guarantee that as soon as data is written to the FS, it becomes accessible on all other nodes.

Coherency in NFS can be achieved, but at the cost of performance, which makes NFS barely usable. (Check its mount options.) NFS caches aggressively to hide the fact that it is a network file system. To make all operations coherent, the NFS client would have to go to the NFS server synchronously for every little operation, bypassing the local cache, and that would never be fast.

But since we are talking about Linux here, one can advise users of the software to evaluate the available cluster file systems. For example, Red Hat now officially supports GFS. I have heard of people using CodaFS, but have no hard information on it.

Dummy00001
Thanks. You've confirmed most of my own NFS research results. I guess I'll code a bunch of ESTALE checks, since we have no plans to migrate to other storage. I won't accept this answer for now; I hope somebody will come up with more info on the topic.
begray