I'm trying to have a Java server and C++ clients communicate over TCP in two modes: text mode and binary/encrypted mode. My problem is over the eof indicator for end of stream that DataInputStream's read(byte []) uses to return with -1. If I send binary data, what's to prevent a random byte sequence happening to represent an eof and falsely indicating to read() that the stream is ending? It seems I'm limited to text mode. I can live with that until I need to scale, but then I have the problem that I am going to encrypt the text and add message authentication. Even if I were sending from another Java program rather than C++, encrypting a string with AES+MAC would produce binary output, not a normal string. What's to prevent some encrypted sequence from containing a part identical to an eof? So, what are the solutions here?

A: 

"end of stream" in TCP is normally signaled by closing the socket -- that is what makes the stream actually end. If you don't really want the stream to end, but just to signal the end of a "packet" (to be followed, quite possibly, by other packets on the same connection), you can start each packet with an unencrypted length indicator (say, 2 or 4 bytes depending on your need). DataInputStream, according to its docs, is suitable only to receive streams sent by a DataOutputStream, which appears to have nothing to do with your use case as you describe it.
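A minimal sketch of the length-prefix idea, assuming a 4-byte big-endian length header (the class name `LengthPrefixFraming`, its methods and the `MAX_FRAME` cap are hypothetical, chosen for illustration). The payload bytes are never inspected, so encrypted output containing 0xFF or any other value passes through untouched; only a -1 from the very first header read is treated as the peer having closed the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Length-prefixed framing sketch: each message is a 4-byte big-endian length
// followed by exactly that many payload bytes (which may be anything, e.g. AES+MAC output).
class LengthPrefixFraming {

    // Hypothetical upper bound so a corrupt or hostile length can't force a huge allocation.
    static final int MAX_FRAME = 1 << 20;

    static void writeFrame(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(payload.length);   // unencrypted length header
        out.write(payload);             // payload bytes, never inspected for "special" values
        out.flush();
    }

    // Returns the next payload, or null if the peer closed the stream between frames.
    static byte[] readFrame(DataInputStream in) throws IOException {
        int first = in.read();          // -1 here means end of stream, not a data byte
        if (first == -1) {
            return null;
        }
        // Remaining 3 header bytes; readUnsignedByte throws EOFException on a truncated header.
        int length = (first << 24) | (in.readUnsignedByte() << 16)
                   | (in.readUnsignedByte() << 8) | in.readUnsignedByte();
        if (length < 0 || length > MAX_FRAME) {
            throw new IOException("bad frame length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload);          // loops until all bytes arrive; EOFException if cut short
        return payload;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        writeFrame(out, new byte[] {(byte) 0xFF, 0x00, 0x1A, (byte) 0xFF});
        writeFrame(out, "hello".getBytes(StandardCharsets.UTF_8));

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        byte[] frame;
        while ((frame = readFrame(in)) != null) {
            System.out.println(Arrays.toString(frame));
        }
        // prints [-1, 0, 26, -1] then [104, 101, 108, 108, 111]
    }
}
```

Note that the first frame prints -1 as a data value: a signed byte of 0xFF is just data once it is in the array, and it does not end the loop. On a real connection the same `DataInputStream`/`DataOutputStream` would wrap the socket's streams, and a C++ peer would write the header with `htonl()`.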

Alex Martelli
`DataInputStream` and `DataOutputStream` are just streams with convenience methods for some basic data types. It is common to use a `DataInputStream` to decode formats which use big-endian integers, even if the data was not encoded with `DataOutputStream` or even with a Java-based application.
Thomas Pornin
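To illustrate Thomas Pornin's point, here is a small sketch of `DataInputStream` decoding big-endian fields that were never written by `DataOutputStream`, e.g. bytes a C++ sender produced with `htonl()`/`htons()`. The header layout (a 4-byte length followed by a 2-byte type) and the class name `BigEndianDecode` are assumptions made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Sketch: DataInputStream decoding big-endian fields that were NOT written by
// DataOutputStream, e.g. bytes a C++ client produced with htonl()/htons().
class BigEndianDecode {

    // Decodes a hypothetical header: 4-byte big-endian int, then 2-byte big-endian unsigned short.
    static int[] decodeHeader(byte[] wire) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
        int length = in.readInt();              // bytes 0..3, most significant byte first
        int type = in.readUnsignedShort();      // bytes 4..5
        return new int[] {length, type};
    }

    public static void main(String[] args) throws IOException {
        // What a C++ sender would put on the wire for htonl(258) followed by htons(7).
        byte[] wire = {0x00, 0x00, 0x01, 0x02, 0x00, 0x07};
        int[] header = decodeHeader(wire);
        System.out.println(header[0] + " " + header[1]); // prints "258 7"
    }
}
```

Feeding it six 0xFF bytes yields a length of -1 and a type of 65535: the decoded value -1 is ordinary data here and has nothing to do with the -1 that `read()` returns at end of stream.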
+2  A: 

If I send binary data, what's to prevent a random byte sequence happening to represent an eof and falsely indicating to read() that the stream is ending?

In most cases (including TCP/IP and similar network protocols) there is no specific data representation for an EOF. Rather, EOF is a logical abstraction that means that you have reached the end of the data stream. For example, with a Socket it means that the input side of the socket has been closed and you have read all outstanding bytes. (And for a file, it means that you have read the last bytes of the file.)
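A quick way to convince yourself that there is no "EOF byte" is to push every byte value 0x00 to 0xFF through a stream and count what comes out. This sketch uses a `ByteArrayInputStream` as a stand-in for the socket's input stream (an assumption for easy testing; a socket stream follows the same `InputStream` contract). The single-byte `read()` returns an `int` in the range 0 to 255, which is exactly why -1 can never collide with a data value.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: there is no "EOF byte". Every byte value 0x00..0xFF is delivered as
// data; read() reports -1 only once the underlying source is exhausted.
class NoEofByte {

    // Counts data bytes delivered by read(byte[]) before it returns -1.
    static int countUntilEnd(InputStream in) throws IOException {
        byte[] chunk = new byte[64];
        int total = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {  // -1 means end of stream, never a data value
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] everyValue = new byte[256];
        for (int i = 0; i < 256; i++) {
            everyValue[i] = (byte) i;          // includes 0x04 (Ctrl-D), 0x1A (Ctrl-Z), 0xFF
        }
        System.out.println(countUntilEnd(new ByteArrayInputStream(everyValue))); // prints 256

        // The single-byte read() widens to an unsigned int, so 0xFF comes back as 255.
        System.out.println(new ByteArrayInputStream(new byte[] {(byte) 0xFF}).read()); // prints 255
    }
}
```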

Since there is no data representation for the (logical) EOF, you don't need to worry about getting false EOFs. In short, there is no problem to be solved here.

Stephen C
+1, the question seems to come from a misunderstanding on how this form of communication works.
PSpeed
In other words, the `EOF` indication is sent out-of-band.
caf
@caf - not necessarily. In some cases, the EOF may denote that the socket has been closed due to a timeout; e.g. that nothing was sent! In addition, the *position* of the EOF is always in-band.
Stephen C
If the other end is closed while the read is blocking then you generally get an exception and not an EOF.
PSpeed
A: 

Usually when using TCP streams you define a header format which, at a minimum, has a field holding the length of the data that follows, so that the receiver knows exactly how many bytes to expect. A simple example is the TLV (type-length-value) format.
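A minimal TLV sketch, assuming a 1-byte type and a 4-byte big-endian length (these field sizes, the class name `Tlv` and the type codes are hypothetical; real protocols pick their own). The type field could, for instance, tell the receiver whether a value is plain text or encrypted, covering both modes from the question on one connection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Minimal TLV (type-length-value) sketch. Field sizes are assumptions for
// illustration: 1-byte type, 4-byte big-endian length, then the value bytes.
class Tlv {
    // Hypothetical type codes distinguishing the question's two modes.
    static final int TYPE_TEXT = 1;
    static final int TYPE_ENCRYPTED = 2;

    final int type;
    final byte[] value;

    Tlv(int type, byte[] value) {
        this.type = type;
        this.value = value;
    }

    void writeTo(DataOutputStream out) throws IOException {
        out.writeByte(type);           // T: 1 byte
        out.writeInt(value.length);    // L: 4 bytes, big-endian
        out.write(value);              // V: raw bytes, any values allowed
        out.flush();
    }

    static Tlv readFrom(DataInputStream in) throws IOException {
        int type = in.readUnsignedByte();
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("bad length " + length);
        }
        byte[] value = new byte[length];
        in.readFully(value);           // read exactly `length` bytes, however TCP splits them
        return new Tlv(type, value);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        new Tlv(TYPE_TEXT, "hi".getBytes(StandardCharsets.UTF_8)).writeTo(out);
        new Tlv(TYPE_ENCRYPTED, new byte[] {(byte) 0xFF, (byte) 0xFE}).writeTo(out);

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        Tlv first = readFrom(in);
        Tlv second = readFrom(in);
        System.out.println(first.type + " " + new String(first.value, StandardCharsets.UTF_8)); // prints "1 hi"
        System.out.println(second.type + " " + second.value.length);                          // prints "2 2"
    }
}
```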

Harley Green
A: 

As Thomas Pornin replied to Alex Martelli, DataInputStream is used even on data not sent by DataOutputStream or by Java. My question is about the consequences of DataInputStream's read() returning when the stream ends, as the documentation says. Is there some sequence of bytes that read() interprets as a stream end, such that I cannot use read() if there's any possibility of that sequence occurring in the data I'm sending, as there is if I send generic binary data?

Prune
First, this probably should have just been added as a clarification to your question. Second, I think this has been answered a few times over now. The data in your stream is only your data, no special funky characters. And even if there were, in the decades that sockets have existed, someone surely would have wrapped it in an abstraction that kept that encoding/decoding hidden away. In fact, it's the layer that handles TCP in the first place. :)
PSpeed
A: 

My problem is over the eof indicator for end of stream that DataInputStream's read(byte []) uses to return with -1.

No, it isn't. This problem is imaginary. -1 is the return code of InputStream.read() that indicates that the peer has closed the connection. It has nothing whatsoever to do with the data being sent over the connection.
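A sketch over a real loopback TCP connection makes the same point end to end: the receiver gets every byte that was sent, 0xFF included, and `read()` returns -1 only after the sender calls `shutdownOutput()` (or closes the socket). The class name `PeerCloseDemo`, the 5-second accept timeout and the use of a helper thread are assumptions chosen so the example runs in a single process.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Arrays;

// Sketch over a real loopback TCP connection: the -1 from read() shows up only
// after the sender shuts down its output, regardless of which bytes were sent.
class PeerCloseDemo {

    // Sends `data` over a fresh loopback connection and returns everything the
    // receiving side read before read() returned -1.
    static byte[] sendAndReceive(byte[] data) throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            server.setSoTimeout(5000);                   // don't hang forever if the sender fails
            int port = server.getLocalPort();

            Thread sender = new Thread(() -> {
                try (Socket s = new Socket(loopback, port)) {
                    OutputStream out = s.getOutputStream();
                    out.write(data);                     // arbitrary bytes, 0xFF included
                    out.flush();
                    s.shutdownOutput();                  // TCP FIN: the receiver's read() will return -1
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            sender.start();

            try (Socket conn = server.accept()) {
                InputStream in = conn.getInputStream();
                ByteArrayOutputStream received = new ByteArrayOutputStream();
                byte[] chunk = new byte[16];
                int n;
                while ((n = in.read(chunk)) != -1) {     // -1 only after the peer's FIN
                    received.write(chunk, 0, n);
                }
                sender.join();
                return received.toByteArray();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = {(byte) 0xFF, (byte) 0xFF, 0x00, 0x1A, 0x04};
        System.out.println(Arrays.equals(data, sendAndReceive(data))); // prints true
    }
}
```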

EJP