
I am interested in writing separate program modules that run as independent threads that I could hook together with pipes. The motivation would be that I could write and test each module completely independently, perhaps even write them in different languages, or run the different modules on different machines. There are a wide variety of possibilities here. I have used piping for a while, but I am unfamiliar with the nuances of its behaviour.

  • It seems like the receiving end will block waiting for input, which I would expect, but will the sending end block sometimes waiting for someone to read from the stream?
  • If I write an EOF to the stream, can I continue writing to that stream until I close it?
  • Are there differences in the behaviour of named and unnamed pipes?
  • Does it matter which end of the pipe I open first with named pipes?
  • Is the behaviour of pipes consistent between different Linux systems?
  • Does the behaviour of the pipes depend on the shell I'm using or the way I've configured it?
  • Are there any other questions I should be asking or issues I should be aware of if I want to use pipes in this way?
+5  A: 

Wow, that's a lot of questions. Let's see if I can cover everything...

It seems like the receiving end will block waiting for input, which I would expect

You expect correctly: an actual read() call will block until something is there. However, I believe there are some C functions that will allow you to 'peek' at what (and how much) is waiting in the pipe. Unfortunately, I don't remember whether they block as well.

will the sending end block sometimes waiting for someone to read from the stream

No, sending should never block. Think of the ramifications if this were a pipe across the network to another computer. Would you want to wait (through possibly high latency) for the other computer to respond that it received it? It is a different case if the read end of the pipe has been closed: then the writer receives a SIGPIPE signal (or, if that signal is ignored, write() fails with EPIPE), so you should have some error checking to handle that.

If I write an EOF to the stream, can I continue writing to that stream until I close it

I would think this depends on what language you're using and its implementation of pipes. In C, I'd say no. In a Linux shell, I'd say yes. Someone else with more experience would have to answer that.

Are there differences in the behaviour of named and unnamed pipes?

As far as I know, yes. However, I don't have much experience with named vs unnamed pipes. I believe the differences are:

  • Single direction vs Bidirectional communication
  • Reading AND writing to the "in" and "out" streams of a thread

Does it matter which end of the pipe I open first with named pipes?

Generally no, but you could run into problems on initialization trying to create and link the threads with each other. You'd need to have one main thread that creates all the sub-threads and syncs their respective pipes with each other.

Is the behaviour of pipes consistent between different Linux systems?

Again, this depends on what language you use, but generally yes. Ever heard of POSIX? That's the standard (at least for Linux; Windows does its own thing).

Does the behaviour of the pipes depend on the shell I'm using or the way I've configured it?

This is getting into a little more of a gray area. The answer should be no since the shell should essentially be making system calls. However, everything up until that point is up for grabs.

Are there any other questions I should be asking

The questions you've asked show that you have a decent understanding of the system. Keep researching and focus on what level you're going to be working on (shell, C, and so on). You'll learn a lot more by just trying it, though.

Dashogun
Peeking at the contents of a pipe with stat() is not reliable across all platforms.
Jonathan Leffler
The writing end can block if the pipe buffer fills - it is not very large.
Jonathan Leffler
Cross-machine pipes are ... non-existent? The nearest approach is probably a socket, but that isn't the same as a pipe.
Jonathan Leffler
Both named and unnamed pipes are unidirectional. I'm not sure what you mean by the 'Reading AND writing to the "in" and "out" streams of a thread', but 'thread' is weird.
Jonathan Leffler
+4  A: 

This is all based on a UNIX-like system; I'm not familiar with the specific behavior of recent versions of Windows.

It seems like the receiving end will block waiting for input, which I would expect, but will the sending end block sometimes waiting for someone to read from the stream?

Yes, although on a modern machine it may not happen often. The pipe has an intermediate buffer that can potentially fill up. If it does, the write side of the pipe will indeed block. But if you think about it, there aren't a lot of files that are big enough to risk this.

If I write an EOF to the stream, can I continue writing to that stream until I close it?

Um, you mean like a CTRL-D, 0x04? Sure, as long as the stream is set up that way. Viz.

506 # cat | od -c
abc
^D
efg
0000000    a   b   c  \n 004  \n   e   f   g  \n                        
0000012

Are there differences in the behaviour of named and unnamed pipes?

Yes, but they're subtle and implementation dependent. The biggest one is that you can write to a named pipe before the other end is running; with unnamed pipes, the file descriptors get shared during the fork/exec process, so there's no way to access the transient buffer without the processes being up.

Does it matter which end of the pipe I open first with named pipes?

Nope.

Is the behaviour of pipes consistent between different Linux systems?

Within reason, yes. Buffer sizes, etc., may vary.

Does the behaviour of the pipes depend on the shell I'm using or the way I've configured it?

No. When you create a pipeline, under the covers your parent process (the shell) creates a pipe, which has a pair of file descriptors, and then does a fork/exec like this pseudocode:

Parent:

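create pipe, returning two file descriptors, call them fd[0] and fd[1]
fork write-side process
fork read-side process
close fd[0] and fd[1] (otherwise the reader never sees EOF)
wait for both children

Write-side:

close fd[0]
connect fd[1] to stdout (dup2), then close fd[1]
exec writer program

Read-side:

close fd[1]
connect fd[0] to stdin (dup2), then close fd[0]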
exec reader program

Are there any other questions I should be asking or issues I should be aware of if I want to use pipes in this way?

Is everything you want to do really going to lay out in a line like this? If not, you might want to think about a more general architecture. But the insight that having lots of separate processes interacting through the "narrow" interface of a pipe is desirable is a good one.

[Updated: I had the file descriptor indices reversed at first. They're correct now, see man 2 pipe.]

Charlie Martin
Problem 1: You can't write EOF to a file or pipe; you close the file or pipe to indicate EOF (or, with a file, you might truncate it). Control-D is not EOF except in the context of a terminal.
Jonathan Leffler
Problem 2: You cannot write to a named pipe before there is a receiver, even if you open in with O_NONBLOCK.
Jonathan Leffler
Problem 3: On most systems, the pipe buffer size is quite small, like 4096 or 5120 bytes. It doesn't take an awful lot of output to fill that up.
Jonathan Leffler
On Unix, write blocks when a pipe is full.
titaniumdecoy
Jonathan, re: 1, okay, you tell me what he meant by "writing EOF to a pipe" then. re: 3, some Unices dynamically allocate the pipe buffer size, or parameterize it; it *can* be quite large.
Charlie Martin
@Charlie Martin: see my answer. I don't think writing EOF to a pipe makes sense as a question - and trying to treat a pipe as a terminal does not help much. I agree the question leaves much to be answered at that point.
Jonathan Leffler
@Charlie Martin: see also my answer for a discussion of pipe buffer size. I measure sizes up to 64 KB; does that count as "quite large"? Larger than I expected - yes. "Large" in any absolute sense - not really in my book, but this could just be semantics. I'd probably start large at about 1MB.
Jonathan Leffler
+3  A: 

As Dashogun and Charlie Martin noted, this is a big question. Some parts of their answers are inaccurate, so I'm going to answer too.

I am interested in writing separate program modules that run as independent threads that I could hook together with pipes.

Be wary of trying to use pipes as a communication mechanism between threads of a single process. Because you would have both read and write ends of the pipe open in a single process, you would never get the EOF (zero bytes) indication.

If you were really referring to processes, then this is the basis of the classic Unix approach to building tools. Many of the standard Unix programs are filters that read from standard input, transform it somehow, and write the result to standard output. For example, tr, sort, grep, and cat are all filters, to name but a few. This is an excellent paradigm to follow when the data you are manipulating permits it. Not all data manipulations are conducive to this approach, but there are many that are.

The motivation would be that I could write and test each module completely independently, perhaps even write them in different languages, or run the different modules on different machines.

Good points. Be aware that there isn't really a pipe mechanism between machines, though you can get close to it with programs such as rsh or (better) ssh. However, internally, such programs may read local data from pipes and send that data to remote machines, but they communicate between machines over sockets, not using pipes.

There are a wide variety of possibilities here. I have used piping for a while, but I am unfamiliar with the nuances of its behaviour.

OK; asking questions is one (good) way to learn. Experimenting is another, of course.

It seems like the receiving end will block waiting for input, which I would expect, but will the sending end block sometimes waiting for someone to read from the stream?

Yes. There is a limit to the size of a pipe buffer. Classically, this was quite small - 4096 or 5120 were common values. You may find that modern Linux uses a larger value. Note that fpathconf() with _PC_PIPE_BUF reports PIPE_BUF, which is the largest write that is guaranteed to be atomic, not the total capacity of the buffer; POSIX only requires that to be 512 (that is, _POSIX_PIPE_BUF is 512). The actual capacity can be, and often is, larger.

If I write an EOF to the stream, can I continue writing to that stream until I close it?

Technically, there is no way to write EOF to a stream; you close the pipe descriptor to indicate EOF. If you are thinking of control-D or control-Z as an EOF character, then those are just regular characters as far as pipes are concerned - they only have an effect like EOF when typed at a terminal that is running in canonical mode (cooked, or normal).

Are there differences in the behaviour of named and unnamed pipes?

Yes, and no. The biggest differences are that unnamed pipes must be set up by one process and can only be used by that process and its descendants (processes that share it as a common ancestor). By contrast, named pipes can be used by previously unassociated processes. The next big difference is a consequence of the first; with an unnamed pipe, you get back two file descriptors from a single function (system) call to pipe(), but you open a FIFO or named pipe using the regular open() function. (Someone must create a FIFO with the mkfifo() call before you can open it; unnamed pipes do not need any such prior setup.) However, once you have a file descriptor open, there is precious little difference between a named pipe and an unnamed pipe.

Does it matter which end of the pipe I open first with named pipes?

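No. The first process to open the FIFO will (normally) block until there's a process with the other end open. If you open it for reading and writing (unconventional, and left undefined by POSIX, but possible on Linux) then you won't be blocked. If you open it for reading with the O_NONBLOCK flag, you won't be blocked either; opening it for writing with O_NONBLOCK while there is no reader fails with ENXIO instead of blocking.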

Is the behaviour of pipes consistent between different Linux systems?

Yes. I've not heard of or experienced any problems with pipes on any of the systems where I've used them.

Does the behaviour of the pipes depend on the shell I'm using or the way I've configured it?

No: pipes and FIFOs are independent of the shell you use.

Are there any other questions I should be asking or issues I should be aware of if I want to use pipes in this way?

Just remember that you must close the reading end of a pipe in the process that will be writing, and the writing end of the pipe in the process that will be reading. If you want bidirectional communication over pipes, use two separate pipes. If you create complicated plumbing arrangements, beware of deadlock - it is possible. A linear pipeline does not deadlock, however (though if the first process never closes its output, the downstream processes may wait indefinitely).


I observed both above and in comments to other answers that pipe buffers are classically limited to quite small sizes. @Charlie Martin counter-commented that some versions of Unix have dynamic pipe buffers and these can be quite large.

I'm not sure which ones he has in mind. I used the test program that follows on Solaris, AIX, HP-UX, MacOS X, Linux and Cygwin / Windows XP (results below):

#include <unistd.h>
#include <signal.h>
#include <stdio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>

static const char *arg0;

static void err_syserr(char *str)
{
    int errnum = errno;
    fprintf(stderr, "%s: %s - (%d) %s\n", arg0, str, errnum, strerror(errnum));
    exit(1);
}

int main(int argc, char **argv)
{
    int pd[2];
    pid_t kid;
    size_t i = 0;
    char buffer[2] = "a";
    int flags;

    arg0 = argv[0];

    if (pipe(pd) != 0)
        err_syserr("pipe() failed");
    if ((kid = fork()) < 0)
        err_syserr("fork() failed");
    else if (kid == 0)
    {
        close(pd[1]);
        pause();
    }
    /* else */
    close(pd[0]);
    /* F_GETFL returns the flags; F_SETFL takes them by value, not by pointer */
    if ((flags = fcntl(pd[1], F_GETFL)) == -1)
        err_syserr("fcntl(F_GETFL) failed");
    flags |= O_NONBLOCK;
    if (fcntl(pd[1], F_SETFL, flags) == -1)
        err_syserr("fcntl(F_SETFL) failed");
    while (write(pd[1], buffer, sizeof(buffer)-1) == sizeof(buffer)-1)
    {
        putchar('.');
        if (++i % 50 ==  0)
            printf("%u\n", (unsigned)i);
    }
    if (i % 50 !=  0)
        printf("%u\n", (unsigned)i);
    kill(kid, SIGINT);
    return 0;
}

I'd be curious to get extra results from other platforms. Here are the sizes I found. All the results are larger than I expected, I must confess, but Charlie and I may be debating the meaning of 'quite large' when it comes to buffer sizes.

  •   8196 - HP-UX 11.23 for IA-64 (fcntl(F_SETFL) failed)
  • 16384 - Solaris 10
  • 16384 - MacOS X 10.5 (O_NONBLOCK did not work, though fcntl(F_SETFL) did not fail)
  • 32768 - AIX 5.3
  • 65536 - Cygwin / Windows XP (O_NONBLOCK did not work, though fcntl(F_SETFL) did not fail)
  • 65536 - SuSE Linux 10 (and CentOS) (fcntl(F_SETFL) failed)

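One point that seems clear from these tests is that O_NONBLOCK works with pipes on some platforms and not on others. That conclusion deserves caution, though: F_GETFL returns the flags as the value of fcntl() rather than through a pointer, and F_SETFL takes the flags by value, so a version of the program that passes &flags to either call sets the wrong flags, which could account for the failures noted above.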

The program creates a pipe, and forks. The child closes the write end of the pipe, and then goes to sleep until it gets a signal - that's what pause() does. The parent then closes the read end of the pipe, and sets the flags on the write descriptor so that it won't block on an attempt to write on a full pipe. It then loops, writing one character at a time, and printing a dot for each character written, and a count and newline every 50 characters. When it detects a write problem (buffer full, since the child is not reading a thing), it stops the loop, writes the final count, and kills the child.

Jonathan Leffler
It is an excellent answer.
J.F. Sebastian
CentOS - fcntl(F_SETFL) failed; Ubuntu - just blocks after ...65500.
J.F. Sebastian
Thanks, J F Sebastian. That's what I see with SuSE too; the limit of 65536 is inferred.
Jonathan Leffler