
There are two really simple ways to let one program send a stream of data to another:

  • A Unix pipe, TCP socket, or something like that. This requires constant attention from the consumer program, or the producer program will block. Even after increasing the buffers from their typically tiny defaults, it's still a huge problem.
  • Plain files - the producer appends with O_APPEND, and the consumer just reads whatever new data has become available, at its convenience. This doesn't require any synchronization (as long as disk space is available), but Unix files only support truncation at the end, not at the beginning, so the file will fill up the disk until both programs quit.

Is there a simple way to have it both ways, with data stored on disk until it gets read, and then freed? Obviously the programs could communicate via a database server or something like that and avoid this problem, but I'm looking for something that integrates well with normal Unix piping.
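The second approach can be sketched in a few lines of Python (the path and the newline-delimited records here are arbitrary choices for illustration):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "stream")

# Producer: open with O_APPEND so every write lands atomically at EOF.
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
os.write(fd, b"record 1\n")
os.write(fd, b"record 2\n")
os.close(fd)

# Consumer: remember the last offset and pick up whatever is new,
# whenever convenient -- no coordination with the producer needed.
offset = 0
with open(path, "rb") as f:
    f.seek(offset)
    new_data = f.read()
    offset = f.tell()
```

The catch, as noted above, is that nothing in this scheme ever shrinks the file.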

A: 

You should read some documentation on socat. You can use it to bridge the gap between TCP sockets, FIFO files, pipes, stdio and others.

If you're feeling lazy, there are some nice examples of useful commands.

amccausl
I looked at the documentation, and it's not obvious how to use socat for what I want. For simplicity, say I have exactly one producer and exactly one consumer, and want an infinite pipe between them. How do I use socat for that?
taw
+1  A: 

A relatively simple hand-rolled solution.

You could have the producer create files and keep writing until it reaches a certain size or number of records, whatever suits your application. The producer then closes the file and starts a new one following an agreed naming scheme.

The consumer reads new records from a file; when that file reaches the agreed maximum size, the consumer closes and unlinks it and then opens the next one.
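A minimal sketch of this rotation scheme in Python. The `chunk.N` naming convention and the records-per-file threshold are assumptions for illustration, not part of the answer:

```python
import os

def chunk_name(seq):
    # Assumed naming scheme: chunk.0, chunk.1, ...
    return "chunk.%d" % seq

class RotatingProducer:
    def __init__(self, max_records=1000):
        self.seq = 0
        self.count = 0
        self.max_records = max_records
        self.f = open(chunk_name(self.seq), "ab")

    def write(self, record):
        self.f.write(record + b"\n")
        self.count += 1
        if self.count >= self.max_records:
            # Agreed size reached: close this chunk, start the next one.
            self.f.close()
            self.seq += 1
            self.count = 0
            self.f = open(chunk_name(self.seq), "ab")

class RotatingConsumer:
    def __init__(self):
        self.seq = 0

    def read_chunk(self):
        # Assumes the producer has already closed this chunk.
        name = chunk_name(self.seq)
        with open(name, "rb") as f:
            data = f.read()
        os.unlink(name)  # disk space reclaimed as soon as a chunk is consumed
        self.seq += 1
        return data
```

The unlink after reading is what bounds disk usage to roughly one chunk's worth of unread data.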

Steve Weet
+1  A: 

If your data can be split into blocks or transactions of some sort, you can use the file method for this with a serial number. The data producer would store the first megabyte of data in outfile.1, the next in outfile.2 etc. The consumer can read the files in order and delete them when read. Thus you get something like your second method, with cleanup along the way.

You should probably wrap all this in a library, so that from the application's point of view it looks like a pipe of some sort.

calmh
A: 

I'm not aware of anything, but it shouldn't be too hard to write a small utility that takes a directory as an argument (or uses $TMPDIR) and uses select/poll to multiplex between reading from stdin, paging to a series of temporary files, and writing to stdout.
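A sketch of such a utility in Python, using select() and an unlinked temporary file as the on-disk buffer. The function name and the 64 KiB chunk size are made up; a real tool would also want poll(), error handling, and perhaps file rotation as in the other answers:

```python
import os
import select
import tempfile

def relay(in_fd, out_fd):
    # Spool bytes from in_fd to out_fd through an anonymous temp file, so a
    # slow consumer never blocks the producer; the spool's disk blocks are
    # reclaimed automatically when the relay finishes.
    spool = tempfile.TemporaryFile()
    read_pos = write_pos = 0  # consumer / producer offsets within the spool
    eof = False
    os.set_blocking(in_fd, False)
    os.set_blocking(out_fd, False)
    while not (eof and read_pos == write_pos):
        rlist = [] if eof else [in_fd]
        wlist = [out_fd] if read_pos < write_pos else []
        readable, writable, _ = select.select(rlist, wlist, [])
        if in_fd in readable:
            data = os.read(in_fd, 65536)
            if not data:  # producer closed its end
                eof = True
            else:
                spool.seek(write_pos)
                spool.write(data)
                write_pos += len(data)
        if out_fd in writable:
            spool.seek(read_pos)
            chunk = spool.read(65536)
            try:
                read_pos += os.write(out_fd, chunk)
            except BlockingIOError:
                pass
    spool.close()
```

Wired up as `producer | relay-tool | consumer`, this behaves like the "infinite pipe" the question asks for, with unread data parked on disk.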

Recurse