What are the pros and cons of using a file for interprocess communication? Let me give some background on the context in which I am asking this question.

The problem is the classic producer-consumer problem with some constraints. The producers are a set of cooperating processes running on a cluster of machines that communicate with each other using broadcasts. Each process has local users which it knows about, and it also lets the other processes know about them through the broadcast mechanism above. Until now the state information being broadcast/shared was not persisted, but now it needs to be.

This system has been running in production for years, supporting thousands of users, and folks are understandably very apprehensive about adding any extra dependency to it just to support persistence. The path we chose was to spawn a new thread in the existing process that writes the local traffic to a file on the filesystem, which is then read by a new process (let's call it the consumer) and persisted. The advantages we see with this approach are:

  1. We get persistence for free. If the new process has issues, we are not losing any of the local traffic, because we are writing it to the filesystem. As long as the consumer knows where it left off, it can resume processing data whenever it comes back up (sketched after this list).
  2. There is no learning curve for a queuing library; it's plain old Unix file I/O.
  3. The biggest pro is that we don't affect the current producer process at all, except for the new thread that does the file writes.
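
To make point 1 concrete, here is a minimal sketch of the consumer's "where it left off" bookkeeping, assuming newline-delimited records; the file names "events.log" and "consumer.offset" are placeholders, not part of the existing system.

    /* Sketch: consumer that checkpoints its read offset so it can resume
       after a restart. "events.log" and "consumer.offset" are placeholders. */
    #include <stdio.h>

    int main(void)
    {
        long offset = 0;
        FILE *off = fopen("consumer.offset", "r");
        if (off) {                            /* resume where we left off */
            if (fscanf(off, "%ld", &offset) != 1)
                offset = 0;
            fclose(off);
        }

        FILE *log = fopen("events.log", "r");
        if (!log)
            return 1;
        fseek(log, offset, SEEK_SET);

        char line[4096];
        while (fgets(line, sizeof line, log)) {
            /* hand the record to the real persistence layer here */
            offset = ftell(log);
            off = fopen("consumer.offset", "w");
            if (off) {                        /* checkpoint after each record */
                fprintf(off, "%ld\n", offset);
                fclose(off);
            }
        }
        fclose(log);
        return 0;
    }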

Some of the concerns with this approach are:

  1. File locking, contention, and their effects on performance.
  2. Making sure the write buffers are flushed and that the producer only releases the file lock once a full event has been written to the file. The consumer should never read incomplete records (see the sketch after this list).
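
To make concern 2 concrete, here is a sketch of the write path we have in mind: take an advisory lock, write the whole event, flush it, and only then release the lock. It assumes newline-delimited records and flock(); "events.log" is again a placeholder.

    /* Sketch: append one complete, newline-terminated record under an
       advisory lock so the consumer never sees a partial event.
       "events.log" is a hypothetical file name. */
    #include <fcntl.h>
    #include <string.h>
    #include <sys/file.h>
    #include <unistd.h>

    int append_event(const char *record)
    {
        int fd = open("events.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0)
            return -1;

        if (flock(fd, LOCK_EX) != 0) {        /* block until we own the file */
            close(fd);
            return -1;
        }

        size_t len = strlen(record);
        int ok = (write(fd, record, len) == (ssize_t)len) ? 0 : -1;
        if (ok == 0 && (len == 0 || record[len - 1] != '\n'))
            ok = (write(fd, "\n", 1) == 1) ? 0 : -1;

        fsync(fd);                            /* flush the full event to disk */
        flock(fd, LOCK_UN);                   /* only now let the consumer read */
        close(fd);
        return ok;
    }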

Thoughts? Is this approach too naive, and should we just pay the initial ramp-up cost of using an off-the-shelf persistent queue library? The main point is that we want to have the minimum possible impact on the current process and add no dependencies to it.

+1  A: 

I was faced with this choice recently and considered learning enough about Berkeley DB to use its queue mechanism. But ultimately I decided instead to use the Unix filesystem and write my own atomic queue primitives using POSIX semaphores. If all the processes are on one machine, this is pretty easy. The atomic put function is about a dozen lines of code; the atomic get, because it has to wait if the queue is empty, is about three times the size.
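
A rough sketch of what an atomic put along these lines might look like (the semaphore name "/eventq" and the file "queue.dat" are placeholders, and error handling is minimal):

    /* Sketch: atomic put guarded by a POSIX named semaphore acting as a
       cross-process mutex. "/eventq" and "queue.dat" are placeholder names. */
    #include <fcntl.h>
    #include <semaphore.h>
    #include <string.h>
    #include <unistd.h>

    int atomic_put(const char *record, size_t len)
    {
        sem_t *lock = sem_open("/eventq", O_CREAT, 0644, 1);
        if (lock == SEM_FAILED)
            return -1;

        sem_wait(lock);                      /* enter the critical section */
        int fd = open("queue.dat", O_WRONLY | O_CREAT | O_APPEND, 0644);
        int ok = -1;
        if (fd >= 0) {
            ok = (write(fd, record, len) == (ssize_t)len) ? 0 : -1;
            fsync(fd);                       /* make the record durable */
            close(fd);
        }
        sem_post(lock);                      /* leave the critical section */
        sem_close(lock);
        return ok;
    }

(On Linux this typically needs -pthread at link time.)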

My advice is that you design an atomic-queue API that will hide these details. (Classic example of following Parnas's advice of using an interface to hide design details that are likely to change.) You can do the first version of the API using plain Unix file I/O. Then you can try variations like locking, Berkeley DB, or semaphores---all with the "minimum impact on the current process".
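
A sketch of what such an interface might look like (the header name and signatures are invented for illustration): callers see only these prototypes, so the implementation behind them can move from plain file I/O to flock, Berkeley DB, or semaphores without touching the producer.

    /* queue.h -- hypothetical atomic-queue interface (illustration only).
       Callers depend on these prototypes; the implementation behind them
       (plain file I/O, flock, Berkeley DB, semaphores, ...) stays hidden. */
    #ifndef QUEUE_H
    #define QUEUE_H

    #include <sys/types.h>   /* size_t, ssize_t */

    typedef struct queue queue;                  /* opaque handle */

    queue  *queue_open(const char *path);        /* create or attach to a queue */
    int     queue_put(queue *q, const void *rec, size_t len);   /* atomic append */
    ssize_t queue_get(queue *q, void *buf, size_t maxlen);      /* blocks if empty */
    void    queue_close(queue *q);

    #endif /* QUEUE_H */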

You won't know performance impacts until you try something. File locking on real filesystems is pretty good; file locking on NFS is a bear.

Norman Ramsey