A web service is configured to expose some of its data when it receives a USR1 signal. The signal is sent by an xinetd server when it receives a request from a remote client, e.g. nc myserver 50666. When the web server receives the USR1 signal, it opens a dedicated FIFO pipe, writes its data to the pipe, and then closes the pipe. In the meantime, the xinetd server reads the pipe and feeds the data to the remote client.

Most of the time this works nicely, but occasionally, for some reason, the client receives duplicate records. From the log, it looks like the pipe did not get closed properly and leftover data remained, so the next time it serves a request, both the previous and the current data are sent to the client. The problem is not consistently reproducible; unluckily, I wasn't able to reproduce it even once.

The following are the simple snippets to demonstrate the process:

The web server: (webserver.py)

def SendStream(data, pipe):
  try:
    for i in data:
      pipe.write(i + '\n')
      pipe.flush()
  finally:
    pipe.close()

def Serve():
  # On SIGUSR1: stream the data to the FIFO from a worker thread.
  threading.Thread(target=SendStream, args=(data, pipe)).start()

The xinetd.d server: (spitter.py)

def Serve():
  if not os.path.exists(PIPE_FILE):
    os.mkfifo(PIPE_FILE)
  os.kill(server_pid, signal.SIGUSR1)
  for i in open(PIPE_FILE):
    print i,

So what exactly happened to cause the duplicates? How can it be triggered? As a current fix, I unlink the pipe file and recreate it every time to avoid any leftovers, but I don't know whether that's a proper solution.
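For reference, the unlink-and-recreate workaround can be sketched like this (recreate_fifo and the path are illustrative names, not from the real code; note that unlinking only detaches the filesystem name and cannot flush data buffered in a FIFO that some process still holds open):

```python
import os

PIPE_FILE = '/tmp/webserver.fifo'  # illustrative path

def recreate_fifo(path=PIPE_FILE):
    # Detach any stale FIFO node and create a fresh one, so the next
    # reader/writer pair rendezvous on a clean pipe.  This does NOT discard
    # data held by a process that still has the old FIFO open.
    if os.path.exists(path):
        os.unlink(path)
    os.mkfifo(path)
```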

A: 

If you get two copies of spitter.py running at the same time, there will be trouble, and almost anything that happens to you is legal. Try adding a process id value to webserver.py, i.e.:

pipe.write(str(os.getpid()) + i + '\n')

That might be illuminating.
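A minimal sketch of that tagging idea (tag_record is a hypothetical helper, not part of the original code):

```python
import os
import time

def tag_record(record):
    # Prefix each record with the writer's pid and a timestamp, so that
    # duplicate lines arriving at the client can be traced back to their
    # source process and write time.
    return '%d %.3f %s' % (os.getpid(), time.time(), record)

# In SendStream one would then write tag_record(i) + '\n' instead of i + '\n'.
```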

There are a few problems with this suggestion, as far as I can tell: 1. It alters the output. 2. There is no race condition under the current design; the client is a time-sliced polling machine (it polls once every few minutes, and there are no other clients yet).
jimx
Both of those problems are irrelevant for debugging. In fact, I'd add timestamps to that, then add another timestamp on the reader end. Sanity checks are important.
Rhamphoryncus
A: 

There isn't enough here to debug. You don't show how the server handles signals or how it opens the pipe.

If at all possible, I would recommend not using signals. They're hairy enough in C, never mind with Python's own peculiarities added on top.

Rhamphoryncus
What would be a better solution here, per your recommendation? Basically, I need an efficient IPC mechanism to share data between two processes. Thanks!
jimx
Unix domain sockets are the most flexible for purely local use. You can even pass file descriptors if you have to. Otherwise, use TCP on localhost.
Rhamphoryncus
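A minimal sketch of the Unix-domain-socket approach, replacing the FIFO; the socket path and function names here are illustrative, not from the original code:

```python
import os
import socket
import tempfile
import threading

SOCK_PATH = os.path.join(tempfile.mkdtemp(), 'webserver.sock')  # illustrative

# Bind and listen before any client connects, so there is no startup race.
srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
srv.bind(SOCK_PATH)
srv.listen(1)

def serve_once(data):
    # Web-server side: accept one connection and stream the records to it.
    conn, _ = srv.accept()
    try:
        for record in data:
            conn.sendall((record + '\n').encode())
    finally:
        conn.close()

def fetch():
    # spitter.py side: connect, read until EOF, return the lines received.
    cli = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    cli.connect(SOCK_PATH)
    chunks = []
    while True:
        buf = cli.recv(4096)
        if not buf:
            break
        chunks.append(buf)
    cli.close()
    return b''.join(chunks).decode().splitlines()

t = threading.Thread(target=serve_once, args=(['record1', 'record2'],))
t.start()
lines = fetch()
t.join()
srv.close()
```

Unlike a FIFO, each accepted connection is a private stream with an unambiguous end-of-data (EOF on close), so leftover data from one request cannot leak into the next.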
A: 

So the real problem is that multiple clients exist. The server has been queried/abused by other, unknown clients that weren't part of the original agreement with customers, and of course it breaks under the current design. A fix has been deployed to address the issue. So Andy's suspicion was right. Thanks, guys!

jimx