I want to send 'packets' of data (i.e. discrete messages) between two programs through named pipes. Given that I have to supply a buffer and a buffer size to read, and given that the read command is blocking (I believe), I either have to have a buffer size that guarantees I never get an under-run, or to know the size of the message up-front. I don't want the sending program to have to know the size of the buffer and pad it out.

As I see it, there are three ways to do this.

  1. Prefix each packet with the size of the message being sent so the listening program can read that many bytes.
  2. Read from the pipe a byte at a time and listen for a special end-of-stream value.
  3. A better way

In the first case I would be able to create a buffer of known size and read into it at once. In the second case I would have to read with a one-byte buffer. This might either be perfectly OK or a massively inefficient travesty.

The only reason I would go for the second approach would be for more flexible input (for example, manual interaction if I wanted it).

Which is the best way to go?

A: 

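One simple way would be to have a discrete packet that contains a key from ftok (based on the named pipe). The sender copies a null-terminated string into a shared memory segment created with that key, and the receiver attaches the same segment using the key from the packet. All other discrete information can be passed within the packet struct.

sender:

packet.ident = ftok("./mynamedpipe", 'M');   /* ftok() needs a path and a project id; 'M' is arbitrary */
shmid = shmget(packet.ident, strlen(message) + 1, IPC_CREAT | IPC_EXCL | 0600);
shared = shmat(shmid, NULL, 0);              /* shmget() returns an id; shmat() returns the pointer */
strcpy(shared, message);

receiver:

shmid = shmget(packet.ident, 0, 0);          /* look up the existing segment by its key */
message = shmat(shmid, NULL, 0);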

Note that the address in shmat isn't explicitly provided in order to prevent remapping existing memory within the receiver process.

WarrenB
Thanks, that's probably a more efficient way of doing things. However, I'm trying to make the IPC as portable, language-agnostic and general as possible, so I'd like to keep it to a vanilla byte stream.
Joe
A: 

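With named pipes, writes are (or can be) atomic. Within limits (POSIX guarantees this for writes of at most PIPE_BUF bytes, which is at least 512 and is 4096 on Linux), the bytes from one write call are never interleaved with bytes from another writer. If you write, say, 1024 bytes to the pipe, a read call on the other end that asks for exactly 1024 bytes will receive those 1024 bytes, even if there is more data in the pipe at the time of the read. Bear in mind that a pipe is a byte stream and does not preserve message boundaries: a read that asks for more than 1024 bytes can also return data from later writes. Further, and always, if there are just 1024 bytes in the named pipe and a read requests 4096 bytes, it will get the 1024 bytes on the first attempt, and only block on a subsequent attempt.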
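
The 'short read' behaviour can be checked with a small sketch. It uses an anonymous pipe() instead of a named pipe so that it is self-contained; once a FIFO is open, read() behaves the same way. The helper name short_read_demo and the buffer sizes are made up for illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: puts `have` bytes into a fresh pipe, then asks
 * read() for `want` bytes. Returns what read() reported, or -1 on error.
 * `have` must be non-zero, otherwise the read would block forever. */
ssize_t short_read_demo(size_t have, size_t want)
{
    char out[4096], in[8192];
    int fds[2];
    ssize_t n = -1;

    assert(have > 0 && have <= sizeof out && want <= sizeof in);
    if (pipe(fds) == -1)
        return -1;
    memset(out, 'x', have);
    if (write(fds[1], out, have) == (ssize_t)have)
        n = read(fds[0], in, want);   /* returns at once with what is available */
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Calling short_read_demo(1024, 4096) should return 1024 without blocking; a second read on the now-empty pipe is the one that would block.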

You say:

Given that I have to supply a buffer and a buffer size to read,

You do...

and given that the read command is blocking (I believe),

It is, unless you set O_NONBLOCK on the file descriptor...
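
As a rough sketch of the non-blocking option, fcntl() can add O_NONBLOCK to a descriptor that is already open. The names set_nonblocking and read_if_ready, and the -2 return value for 'nothing available yet', are conventions invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Turns on O_NONBLOCK while preserving the other status flags.
 * Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Reads up to len bytes without waiting. Returns the byte count, 0 at
 * end of file, -1 on error, or -2 if no data is available right now. */
ssize_t read_if_ready(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);

    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return n;
}
```

A reader that uses this would normally wait with poll() or select() between attempts instead of spinning.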

I either have to have a buffer size that guarantees I never get an under-run,

What sort of messages are you sending? What size are you dealing with? Kilobytes, megabytes, bigger?

or to know the size of the message up-front.

There is no particular problem with having, say, a 4KB buffer in the reader, and reading the message in chunks. The issue is knowing when you reach the end of the message. By far the majority of protocols require the length up front, because it makes it easy to write the reader code reliably.
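
A minimal sketch of the length-up-front approach, assuming a 4-byte big-endian length header and a single writer per pipe. The function names (read_full, write_full, send_msg, recv_msg) and the header layout are choices made for this example, not an established protocol.

```c
#include <assert.h>
#include <arpa/inet.h>   /* htonl, ntohl */
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Loops until exactly len bytes are read; copes with short reads.
 * Returns 0 on success, -1 on error or premature end of file. */
static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Loops until exactly len bytes are written. Returns 0 or -1. */
static int write_full(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Sends the payload length as 4 big-endian bytes, then the payload. */
int send_msg(int fd, const void *data, uint32_t len)
{
    uint32_t hdr = htonl(len);
    assert(data != NULL || len == 0);
    if (write_full(fd, &hdr, sizeof hdr) == -1) return -1;
    return write_full(fd, data, len);
}

/* Receives one message into buf. Returns the payload length, or -1 on
 * error, end of file, or a message larger than cap. */
long recv_msg(int fd, void *buf, uint32_t cap)
{
    uint32_t hdr, len;
    if (read_full(fd, &hdr, sizeof hdr) == -1) return -1;
    len = ntohl(hdr);
    if (len > cap) return -1;
    if (read_full(fd, buf, len) == -1) return -1;
    return (long)len;
}
```

With several writers on the same pipe, the header and payload would need to go out in one write() of at most PIPE_BUF bytes so that messages from different senders cannot interleave.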

If you are going to do an 'end of stream' (EOS) marker, you are doing 'in-band signalling'. And that causes trouble. What character are you going to use? What happens when that character appears in the data? You need an escape mechanism, such as a character that means 'the next character is not the EOS marker'. For example, in text related to programming, the backslash is used for this. At a terminal, control-V often serves the purpose.
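
To make the escape mechanism concrete, here is one possible sketch, assuming newline as the EOS marker and backslash as the escape byte. Both choices, and the names frame_encode and frame_decode, are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { EOS = '\n', ESC = '\\' };   /* assumed marker and escape bytes */

/* Copies src to dst, putting ESC in front of every EOS or ESC byte and
 * appending one unescaped EOS. dst must have room for 2 * len + 1 bytes.
 * Returns the encoded length. */
size_t frame_encode(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t i, o = 0;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < len; i++) {
        if (src[i] == EOS || src[i] == ESC)
            dst[o++] = ESC;
        dst[o++] = src[i];
    }
    dst[o++] = EOS;
    return o;
}

/* Decodes one frame from src into dst (dst needs room for enc_len bytes).
 * Stores the number of input bytes consumed in *used and returns the
 * payload length, or (size_t)-1 if no unescaped EOS was found. */
size_t frame_decode(const unsigned char *src, size_t enc_len,
                    unsigned char *dst, size_t *used)
{
    size_t i, o = 0;

    for (i = 0; i < enc_len; i++) {
        if (src[i] == ESC) {
            if (++i == enc_len)
                break;               /* input ended right after an escape */
            dst[o++] = src[i];       /* escaped byte is taken literally */
        } else if (src[i] == EOS) {
            *used = i + 1;
            return o;
        } else {
            dst[o++] = src[i];
        }
    }
    return (size_t)-1;
}
```

The cost is visible in the decoder: every byte has to be inspected, and the encoded message can be up to twice the size of the payload.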

I don't want the sending program to have to know the size of the buffer and pad it out.

Why is it hard for the sender to know the size of the buffer? And why would it need to 'pad it out'?

If you are dealing with large amounts of data (from say kilobytes upwards), the byte-at-a-time solution is unlikely to yield acceptable performance. I think you would be best off having the sender determine the size of each packet and tell the reader, or designing the protocol so that there are limits on the size of a packet. If you need to convey arbitrary amounts of data, have a protocol which says:

  • Large quantity of data of unknown total size coming.
  • For each sub-packet, the message says 'this is a sub-packet of size NN KB'.
  • For the last sub-packet, the size might be shorter - that's OK and could indicate 'end of large quantity of data'.
  • If the last sub-packet is 'full size', you might send an empty last packet to indicate the EOS.
  • Alternatively, if the sub-packets can be of variable size, you can always send an explicit EOS packet.
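
The sub-packet scheme above might look roughly like this, using the explicit-EOS variant: each sub-packet carries a 4-byte big-endian size, and an empty sub-packet ends the stream. The names xfer_full, send_chunk, send_stream and recv_stream, and the header format, are assumptions for this sketch.

```c
#include <assert.h>
#include <arpa/inet.h>   /* htonl, ntohl */
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Moves exactly len bytes with write() (do_write != 0) or read().
 * Returns 0 on success, -1 on error or premature end of file. */
static int xfer_full(int fd, void *buf, size_t len, int do_write)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = do_write ? write(fd, p, len) : read(fd, p, len);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Sends one sub-packet: its size, then its payload. */
static int send_chunk(int fd, const char *data, uint32_t len)
{
    uint32_t hdr = htonl(len);
    if (xfer_full(fd, &hdr, sizeof hdr, 1) == -1) return -1;
    return xfer_full(fd, (char *)data, len, 1);   /* cast: data is only read */
}

/* Sends total bytes as sub-packets of at most chunk bytes each, followed
 * by an empty sub-packet that marks the end of the stream. */
int send_stream(int fd, const char *data, size_t total, uint32_t chunk)
{
    assert(chunk > 0);
    while (total > 0) {
        uint32_t n = total < chunk ? (uint32_t)total : chunk;
        if (send_chunk(fd, data, n) == -1) return -1;
        data += n;
        total -= n;
    }
    return send_chunk(fd, data, 0);
}

/* Collects sub-packets into buf until the empty one arrives. Returns the
 * total payload size, or -1 on error or if buf (cap bytes) is too small. */
long recv_stream(int fd, char *buf, size_t cap)
{
    size_t got = 0;
    for (;;) {
        uint32_t hdr, n;
        if (xfer_full(fd, &hdr, sizeof hdr, 0) == -1) return -1;
        n = ntohl(hdr);
        if (n == 0) return (long)got;
        if (n > cap - got) return -1;
        if (xfer_full(fd, buf + got, n, 0) == -1) return -1;
        got += n;
    }
}
```

In practice the sender and receiver run in separate processes; a single process that writes a large stream before reading it would block once the pipe fills up.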

Also consider what will happen in future if, instead of using named pipes, you want to upgrade your system to work over a socket connection to another machine.

I think you should design your system with packets where the packet headers include the size of the data (the way most networking protocols, such as TCP/IP, do things). And if there's a higher-level flow of data of unknown size, handle it along the lines outlined above. But even there, it is better if you can tell the overall size ahead of time.

Jonathan Leffler
Thanks for that. All of the above is what I expected (but it's best not to assume when you're doing things for the first time). FWIW: I expect the messages to be less than 500 bytes each. Re "And why would it need to 'pad it out'?": that would be necessary if there was a fixed buffer size and the message length was not a multiple of that size (I believe the reader would block on the last partially filled buffer?).
Joe
No need for padding; as I tried to indicate, if there are, say, 500 bytes in the buffer, the read() will return those 500 bytes, even if it requests 4096 or some other bigger size. You'll get a 'short read'; it is not an error.
Jonathan Leffler