Hello!

Consider:

pipe_read, pipe_write = os.pipe()

Now, I would like to know two things:

(1) I have two threads. If I guarantee that one thread only reads with os.read(pipe_read, n) and the other only writes with os.write(pipe_write, data), will I run into any problems, even if the two threads do so simultaneously? Will I get all the data that was written, in the correct order? Is it possible that a single write is read in pieces, like this:

Thread 1: os.write(pipe_write, '1234567')
Thread 2: os.read(pipe_read,big_number) --> '123'
Thread 2: os.read(pipe_read,big_number) --> '4567'

Or -- again, consider simultaneity -- will the data from a single os.write(pipe_write, some_string) always be returned in full by a single os.read(pipe_read, very_big_number)?

(2) Consider more than one thread writing to the pipe_write end of the pipe using logging.FileHandler() -- I've read that the logging module is threadsafe. Does this mean that I can do this without losing data? I don't think I'll be able to control the order of the data in the pipe, but that is not a requirement. Requirements:

  • all data written by some threads on the write end must come out at the read end
  • a string written by a single logger.info(), logger.error(), ... has to stay in one piece.

Are these reqs fulfilled?

Thank you in advance,

Jan-Philip Gehrcke

+5  A: 

os.read and os.write on the two fds returned from os.pipe are threadsafe, but you appear to demand more than that. Sub (1), yes, there is no "atomicity" guarantee for single reads or writes -- the scenario you depict (a single short write ends up producing two reads) is entirely possible. (In general, os.whatever is a thin wrapper on operating system functionality, and it's up to the OS to ensure, or fail to ensure, the kind of functionality you require; in this case, the POSIX standard doesn't require the OS to ensure this kind of "atomicity".) You're guaranteed to get all data that was written, and in the correct order, but that's it. A single write of a large piece of data might stall once it has filled the OS-supplied buffer and only proceed once some other thread has read some of the initial data (beware deadlocks, of course!), etc, etc.
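To make the ordering guarantee concrete, here is a minimal Python 3 sketch (so the payloads are bytes rather than the str values in the question). The helper names `write_messages` and `read_all` are hypothetical, and the tiny `chunk_size` is chosen only to force the "read in pieces" case; the point is that joining the pieces gives back everything that was written, in order.

```python
import os
import threading


def write_messages(fd, messages):
    """Write each message to the pipe, then close the write end."""
    try:
        for msg in messages:
            data = msg
            # os.write may accept fewer bytes than offered, so loop until done.
            while data:
                written = os.write(fd, data)
                data = data[written:]
    finally:
        os.close(fd)  # closing the write end signals EOF to the reader


def read_all(fd, chunk_size=3):
    """Read until EOF and return the chunks in the order they arrived."""
    chunks = []
    while True:
        chunk = os.read(fd, chunk_size)
        if not chunk:  # empty bytes: every write end has been closed
            break
        chunks.append(chunk)
    os.close(fd)
    return chunks


pipe_read, pipe_write = os.pipe()
messages = [b"1234567", b"abc", b"XYZ!"]

writer = threading.Thread(target=write_messages, args=(pipe_write, messages))
writer.start()
chunks = read_all(pipe_read)  # read in the main thread while the writer runs
writer.join()

received = b"".join(chunks)
print(chunks)
print(received)
```

How the bytes are split across the individual chunks is not something to rely on; only the concatenation is.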

Sub (2), yes, the logging module is threadsafe AND "atomic" in that data produced by a single call to logging.info, logging.warn, logging.error, etc, "stays in one piece" in terms of calls to the underlying handler (however if that handler in turn uses non-atomic means such as os.write, it may still e.g. stall in the kernel until the underlying buffer gets unclogged, etc, etc, as above).
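As a rough illustration of the "stays in one piece" point at the handler level, the following Python 3 sketch has several threads log through one `StreamHandler`. It writes to an in-memory `io.StringIO` instead of a pipe so that only the logging behaviour is exercised; the logger name `pipe_demo` and the message format are made up for the example.

```python
import io
import logging
import threading

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(message)s"))

logger = logging.getLogger("pipe_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the records out of the root logger
logger.addHandler(handler)


def worker(worker_id, count):
    """Emit `count` records; each one should arrive as a whole line."""
    for i in range(count):
        logger.info("worker=%d record=%d", worker_id, i)


threads = [threading.Thread(target=worker, args=(n, 50)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

logger.removeHandler(handler)
lines = stream.getvalue().splitlines()
print(len(lines))
```

The order of lines from different threads varies between runs, but no line is a mixture of two records, because the handler serializes each emit behind its own lock.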

Alex Martelli
Thank you very much for this answer. "You're guaranteed to get all data that was written, and in the correct order, but that's it" -> That's all I need, because this means that I can re-assemble chopped data with a small routine that runs after os.read(). Now I can start coding with a clear conscience. Thank you, again :-)
Jan-Philip Gehrcke
You're welcome! You could use a distinguishable marker at the start of each 'write' to make the reassembling post-processing easier.
Alex Martelli
An update: I actually have to use logging's `StreamHandler()` instead of `FileHandler()`. From the Python docs: the `StreamHandler()` "sends logging output to [...] any file-like object (or, more precisely, any object which supports `write()` and `flush()` methods)". Hence, I have to wrap `pipe_write` with `os.fdopen()` to get the write and flush methods: `logging.StreamHandler(os.fdopen(pipe_write,'a',0))`. I hope I chose good options with no buffering and append mode.
Jan-Philip Gehrcke
No buffering is definitely as close as you can get to a bare os.write; I believe 'a' vs 'w' makes no difference in this context.
Alex Martelli
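For readers on Python 3, a sketch of the same wiring looks slightly different: text-mode files cannot be opened unbuffered there, so this version assumes line buffering (`buffering=1`) as the nearest equivalent. The logger name `pipe_logger` and the format string are illustrative, and `newline="\n"` is passed only to keep the output identical across platforms.

```python
import logging
import os

pipe_read, pipe_write = os.pipe()

# Assumption for Python 3: unbuffered text I/O is not allowed, so use
# line buffering; StreamHandler also calls flush() after every record.
write_file = os.fdopen(pipe_write, "w", buffering=1, newline="\n")

handler = logging.StreamHandler(write_file)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

logger = logging.getLogger("pipe_logger")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

# Two short records fit easily in the pipe buffer, so writing before
# reading from the same thread does not block here.
logger.info("first")
logger.error("second")

logger.removeHandler(handler)
write_file.close()  # also closes pipe_write, so the reader sees EOF

with os.fdopen(pipe_read, "rb") as read_file:
    data = read_file.read()

print(data)
```

In a real program the reading side would run in its own thread, since a writer that fills the pipe buffer blocks until someone reads.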
It works perfectly for me, now. I wrote a blog post about the need and the realization of the post-processing step: http://gehrcke.de/2009/08/counting-stuff-in-python-histogram-thread-pipe-communication/ I rely on the trailing newline of written strings. This is convenient, because the `StreamHandler` automatically adds newlines to each log string!
Jan-Philip Gehrcke
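The newline-based post-processing step described in these comments could look roughly like the sketch below. It is not the code from the blog post; `split_records` is a hypothetical helper, and it assumes that every record ends in exactly one `\n` and contains no newline of its own.

```python
def split_records(chunks):
    """Yield complete newline-terminated records from arbitrary byte chunks."""
    pending = b""
    for chunk in chunks:
        pending += chunk
        # Emit every complete record currently in the buffer.
        while b"\n" in pending:
            record, pending = pending.split(b"\n", 1)
            yield record
    # Anything left in `pending` has no trailing newline yet, so it is an
    # incomplete record and is deliberately not yielded.


# Chunks as os.read() might return them: record boundaries and chunk
# boundaries do not line up.
chunks = [b"12", b"34567\nab", b"c\n", b"XYZ!\n"]
records = list(split_records(chunks))
print(records)
```

Because it only needs an iterable of byte chunks, the same generator can be fed directly from a loop over `os.read(pipe_read, n)`.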