As Dashogun and Charlie Martin noted, this is a big question. Some parts of their answers are inaccurate, so I'm going to answer too.
I am interested in writing separate program modules that run as independent threads that I could hook together with pipes.
Be wary of trying to use pipes as a communication mechanism between threads of a single process. Because you would have both read and write ends of the pipe open in a single process, you would never get the EOF (zero bytes) indication.
If you were really referring to processes, then this is the basis of the classic Unix approach to building tools. Many of the standard Unix programs are filters that read from standard input, transform it somehow, and write the result to standard output. For example, tr, sort, grep, and cat are all filters, to name but a few. This is an excellent paradigm to follow when the data you are manipulating permits it. Not all data manipulations are conducive to this approach, but there are many that are.
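To make the filter shape concrete, here is a minimal sketch of such a filter; the upper-casing transformation is an arbitrary example of mine, not one of the standard tools:

#include <stdio.h>
#include <ctype.h>

/* A minimal filter: read standard input, transform it, write standard output.
   The transformation (upper-casing) is just a placeholder for real work. */
int main(void)
{
    int c;
    while ((c = getchar()) != EOF)
        putchar(toupper(c));
    return 0;
}

Once written, it slots into a pipeline like any other filter, e.g. grep error log.txt | ./upcase | sort (where ./upcase is whatever you call the compiled program).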
The motivation would be that I could write and test each module completely independently, perhaps even write them in different languages, or run the different modules on different machines.
Good points. Be aware that there isn't really a pipe mechanism between machines, though you can get close to it with programs such as rsh or (better) ssh. However, while such programs may internally read local data from pipes and send that data to remote machines, they communicate between machines over sockets, not pipes.
There are a wide variety of possibilities here. I have used piping for a while, but I am unfamiliar with the nuances of its behaviour.
OK; asking questions is one (good) way to learn. Experimenting is another, of course.
It seems like the receiving end will block waiting for input, which I would expect, but will the sending end block sometimes waiting for someone to read from the stream?
Yes. There is a limit to the size of a pipe buffer. Classically, this was quite small - 4096 or 5120 bytes were common values. You may find that modern Linux uses a larger value. You can use fpathconf() and _PC_PIPE_BUF to find out the size of a pipe buffer. POSIX only requires the buffer to be 512 bytes (that is, _POSIX_PIPE_BUF is 512).
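For example, a minimal sketch of that query (error handling kept to a bare minimum):

#include <stdio.h>
#include <unistd.h>

/* Query _PC_PIPE_BUF for a descriptor obtained from pipe() */
int main(void)
{
    int pd[2];
    if (pipe(pd) == 0)
        printf("_PC_PIPE_BUF = %ld\n", fpathconf(pd[0], _PC_PIPE_BUF));
    return 0;
}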
If I write an EOF to the stream, can I continue writing to that stream until I close it?
Technically, there is no way to write EOF to a stream; you close the pipe descriptor to indicate EOF. If you are thinking of control-D or control-Z as an EOF character, then those are just regular characters as far as pipes are concerned - they only have an effect like EOF when typed at a terminal that is running in canonical mode (cooked, or normal).
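A small sketch of mine illustrates the point: a control-D byte (0x04) travels through a pipe as ordinary data, and the reader only sees EOF (a zero return from read()) once the write end has been closed:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int pd[2];
    char buf[16];
    ssize_t n;

    if (pipe(pd) != 0)
        return 1;
    write(pd[1], "\x04", 1);    /* control-D is just another byte in a pipe */
    close(pd[1]);               /* closing the write end is what produces EOF */
    while ((n = read(pd[0], buf, sizeof(buf))) > 0)
        printf("read %d byte(s); first is 0x%02X\n", (int)n, (unsigned)(unsigned char)buf[0]);
    printf("read() returned %d - that is EOF\n", (int)n);
    return 0;
}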
Are there differences in the behaviour of named and unnamed pipes?
Yes, and no. The biggest difference is that unnamed pipes must be set up by one process and can only be used by that process and its descendants (processes that share it as a common ancestor). By contrast, named pipes can be used by previously unassociated processes. The next big difference is a consequence of the first: with an unnamed pipe, you get back two file descriptors from a single function (system) call to pipe(), but you open a FIFO or named pipe using the regular open() function. (Someone must create the FIFO with the mkfifo() call before you can open it; unnamed pipes do not need any such prior setup.) However, once you have a file descriptor open, there is precious little difference between a named pipe and an unnamed pipe.
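A sketch of the two setups side by side (the FIFO name /tmp/demo_fifo is just an example path I chose):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
    int pd[2];
    int fd;

    /* Unnamed pipe: one call yields both file descriptors */
    if (pipe(pd) != 0)
        perror("pipe");

    /* Named pipe (FIFO): create it once, then open it like a regular file.
       O_NONBLOCK stops this demo from waiting for a writer to appear. */
    if (mkfifo("/tmp/demo_fifo", 0644) != 0)
        perror("mkfifo");       /* it may already exist */
    fd = open("/tmp/demo_fifo", O_RDONLY | O_NONBLOCK);
    if (fd == -1)
        perror("open");
    return 0;
}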
Does it matter which end of the pipe I open first with named pipes?
No. The first process to open the FIFO will (normally) block until there's a process with the other end open. If you open it for reading and writing (unconventional, but possible), then you won't be blocked; if you use the O_NONBLOCK flag, you won't be blocked either.
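The non-blocking case is worth a small sketch of its own (it assumes the FIFO /tmp/demo_fifo from the previous sketch already exists): a non-blocking open for writing fails with ENXIO when there is no reader yet, whereas a non-blocking open for reading succeeds even with no writer.

#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int wr, rd;

    /* No reader yet, so a non-blocking open for writing fails (ENXIO)... */
    wr = open("/tmp/demo_fifo", O_WRONLY | O_NONBLOCK);
    if (wr == -1)
        printf("write-side open failed: %s\n", strerror(errno));

    /* ...but a non-blocking open for reading succeeds with no writer present */
    rd = open("/tmp/demo_fifo", O_RDONLY | O_NONBLOCK);
    printf("read-side open returned descriptor %d\n", rd);
    if (rd != -1)
        close(rd);
    return 0;
}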
Is the behaviour of pipes consistent between different Linux systems?
Yes. I've not heard of or experienced any problems with pipes on any of the systems where I've used them.
Does the behaviour of the pipes depend on the shell I'm using or the way I've configured it?
No: pipes and FIFOs are independent of the shell you use.
Are there any other questions I should be asking or issues I should be aware of if I want to use pipes in this way?
Just remember that you must close the reading end of a pipe in the process that will be writing, and the writing end of the pipe in the process that will be reading. If you want bidirectional communication over pipes, use two separate pipes. If you create complicated plumbing arrangements, beware of deadlock - it is possible. A linear pipeline does not deadlock, however (though if the first process never closes its output, the downstream processes may wait indefinitely).
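For instance, a minimal sketch of that discipline with one pipe, the parent writing and the child reading (the layout and message are chosen just for illustration, and error handling is kept minimal):

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int pd[2];
    char buf[32];
    ssize_t n;

    if (pipe(pd) != 0)
        return 1;
    if (fork() == 0)
    {
        close(pd[1]);           /* child reads, so close the write end */
        n = read(pd[0], buf, sizeof(buf));
        if (n > 0)
            printf("child read: %.*s\n", (int)n, buf);
        close(pd[0]);
        return 0;
    }
    close(pd[0]);               /* parent writes, so close the read end */
    write(pd[1], "hello", 5);
    close(pd[1]);               /* lets the child see EOF after "hello" */
    wait(0);
    return 0;
}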
I observed both above and in comments to other answers that pipe buffers are classically limited to quite small sizes. @Charlie Martin counter-commented that some versions of Unix have dynamic pipe buffers and these can be quite large.
I'm not sure which ones he has in mind. I used the test program that follows on Solaris, AIX, HP-UX, MacOS X, Linux and Cygwin / Windows XP (results below):
#include <unistd.h>
#include <signal.h>
#include <stdio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>

static const char *arg0;

static void err_syserr(char *str)
{
    int errnum = errno;
    fprintf(stderr, "%s: %s - (%d) %s\n", arg0, str, errnum, strerror(errnum));
    exit(1);
}

int main(int argc, char **argv)
{
    int pd[2];
    pid_t kid;
    size_t i = 0;
    char buffer[2] = "a";
    int flags;

    arg0 = argv[0];
    if (pipe(pd) != 0)
        err_syserr("pipe() failed");
    if ((kid = fork()) < 0)
        err_syserr("fork() failed");
    else if (kid == 0)
    {
        /* Child: keep the read end open but never read; wait for a signal */
        close(pd[1]);
        pause();
    }
    /* else */
    close(pd[0]);
    /* F_GETFL returns the flags; F_SETFL takes the new flags as an int */
    if ((flags = fcntl(pd[1], F_GETFL, 0)) == -1)
        err_syserr("fcntl(F_GETFL) failed");
    flags |= O_NONBLOCK;
    if (fcntl(pd[1], F_SETFL, flags) == -1)
        err_syserr("fcntl(F_SETFL) failed");
    /* Write one byte at a time until the (full) pipe refuses more data */
    while (write(pd[1], buffer, sizeof(buffer)-1) == sizeof(buffer)-1)
    {
        putchar('.');
        if (++i % 50 == 0)
            printf("%u\n", (unsigned)i);
    }
    if (i % 50 != 0)
        printf("%u\n", (unsigned)i);
    kill(kid, SIGINT);
    return 0;
}
I'd be curious to get extra results from other platforms. Here are the sizes I found. All the results are larger than I expected, I must confess, but Charlie and I may be debating the meaning of 'quite large' when it comes to buffer sizes.
- 8196 - HP-UX 11.23 for IA-64 (fcntl(F_SETFL) failed)
- 16384 - Solaris 10
- 16384 - MacOS X 10.5 (O_NONBLOCK did not work, though fcntl(F_SETFL) did not fail)
- 32768 - AIX 5.3
- 65536 - Cygwin / Windows XP (O_NONBLOCK did not work, though fcntl(F_SETFL) did not fail)
- 65536 - SuSE Linux 10 (and CentOS) (fcntl(F_SETFL) failed)
One point that is clear from these tests is that O_NONBLOCK works with pipes on some platforms and not on others.
The program creates a pipe, and forks. The child closes the write end of the pipe, and then goes to sleep until it gets a signal - that's what pause() does. The parent then closes the read end of the pipe, and sets the flags on the write descriptor so that it won't block on an attempt to write on a full pipe. It then loops, writing one character at a time, and printing a dot for each character written, and a count and newline every 50 characters. When it detects a write problem (buffer full, since the child is not reading a thing), it stops the loop, writes the final count, and kills the child.