We are streaming data in batches from a server (written in .NET, running on Windows) to a client (written in Java, running on Ubuntu). The data is in XML format. Occasionally the Java client throws an unexpected EOF while trying to decompress the stream. The message content always varies and is user-driven. The client's responses are also compressed using GZip; that direction never fails and seems rock solid. The content of the client's responses is controlled by the system.
Is there a chance that some arrangement of characters or some special characters are creating false EOF markers? Could it be white-space related? Is GZip suitable for compressing XML?
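One way to test the special-character theory is a local round trip. The following is a minimal sketch (the class name, method names and sample payload are hypothetical) that compresses XML containing '@', whitespace and non-ASCII characters, plus every possible byte value, with Java's GZIPOutputStream and checks that the bytes come back unchanged. GZip is lossless, and the end of the compressed data is signalled by the deflate encoding itself (followed by a CRC-32 and length trailer), not by any byte value that the input could mimic.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipRoundTripCheck {

    // Compress bytes into a single GZip member.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } // closing the stream writes the GZip trailer (CRC-32 and length)
        return bos.toByteArray();
    }

    // Decompress a complete GZip payload held in memory.
    static byte[] gunzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        }
        return bos.toByteArray();
    }

    // True if the UTF-8 bytes of the text survive compress + decompress unchanged.
    static boolean roundTrips(String xml) throws IOException {
        byte[] original = xml.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(original, gunzip(gzip(original)));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical payload with '@', tabs, newlines and non-ASCII text.
        String xml = "<msg user=\"a@b.com\">\n\t  caf\u00e9 \u20ac 100 &amp; more</msg>";
        System.out.println(roundTrips(xml)); // prints true
    }
}
```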
I am assuming that the code that reads from and writes to the input/output streams works, because we only occasionally get this exception, and when we inspect the user data at the time there seem to be special characters, such as the '@' sign (which is why I asked the question).
Any ideas?
UPDATE: Here is the actual code, as requested. I had ruled it out because I had been to a couple of sites for help on this issue and they all had more or less the same code. Some sites mentioned appended GZip streams. Does this have something to do with GZip creating multiple segments?
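On the "appended GZip" point: a GZip stream may contain several members back to back. A minimal sketch, assuming a current JDK, suggests that GZIPInputStream simply continues into a following member and returns the joined output, so a member boundary on its own should not produce an EOF. The sample strings and names are hypothetical; note also that the .NET Compress function in Update 2 produces one member per call, because the GZipStream is closed before MS.ToArray().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ConcatenatedGzipDemo {

    // Compress UTF-8 text into one complete GZip member.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    // Decompress everything in the buffer, across member boundaries.
    static String gunzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        }
        return new String(bos.toByteArray(), StandardCharsets.UTF_8);
    }

    // Append member b directly after member a.
    static byte[] concat(byte[] a, byte[] b) {
        byte[] both = new byte[a.length + b.length];
        System.arraycopy(a, 0, both, 0, a.length);
        System.arraycopy(b, 0, both, a.length, b.length);
        return both;
    }

    public static void main(String[] args) throws IOException {
        byte[] both = concat(gzip("<batch id=\"1\"/>"), gzip("<batch id=\"2\"/>"));
        System.out.println(gunzip(both)); // prints <batch id="1"/><batch id="2"/>
    }
}
```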
public String receive() throws IOException {
    byte[] buffer = new byte[8192];
    ByteArrayOutputStream baos = new ByteArrayOutputStream(8192);
    do {
        int nrBytes = in.read(buffer);
        if (nrBytes > 0) {
            baos.write(buffer, 0, nrBytes);
        }
    } while (in.available() > 0);
    return compressor.decompress(baos.toByteArray());
}
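receive() stops as soon as in.available() returns 0, but available() only estimates how many bytes can be read without blocking. For a socket that means whatever has already arrived, so it can be 0 partway through a batch while the rest is still in transit. A common alternative is to frame each message explicitly. The following is a minimal sketch assuming a hypothetical protocol change in which the server writes a 4-byte big-endian length before each compressed batch (the current .NET code sends no length, and .NET's BinaryWriter.Write(Int32) writes little-endian, so one side would need adjusting). LengthPrefixedFraming, MAX_FRAME and the method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class LengthPrefixedFraming {

    // Hypothetical upper bound so a corrupt length cannot trigger a huge allocation.
    static final int MAX_FRAME = 64 * 1024 * 1024;

    // Sender side: 4-byte big-endian length, then the compressed payload.
    static void writeFrame(OutputStream out, byte[] payload) throws IOException {
        DataOutputStream dos = new DataOutputStream(out);
        dos.writeInt(payload.length);
        dos.write(payload);
        dos.flush();
    }

    // Receiver side: blocks until the whole payload has arrived,
    // however the network splits it up. DataInputStream does not buffer,
    // so wrapping the same stream on every call is safe.
    static byte[] readFrame(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(in);
        int length = dis.readInt(); // EOFException if the peer closed before sending a length
        if (length < 0 || length > MAX_FRAME) {
            throw new IOException("Invalid frame length: " + length);
        }
        byte[] payload = new byte[length];
        dis.readFully(payload); // keeps reading until 'length' bytes are in
        return payload;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeFrame(wire, new byte[] {1, 2, 3});
        writeFrame(wire, new byte[] {4, 5});
        InputStream in = new ByteArrayInputStream(wire.toByteArray());
        System.out.println(readFrame(in).length); // prints 3
        System.out.println(readFrame(in).length); // prints 2
    }
}
```

Under that assumption, the body of receive() could shrink to return compressor.decompress(LengthPrefixedFraming.readFrame(in)); and the available() check would no longer be needed.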
public String decompress(byte[] data) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    ByteArrayInputStream in = new ByteArrayInputStream(data);
    try {
        GZIPInputStream inflater = new GZIPInputStream(in);
        byte[] byteBuffer = new byte[8192];
        int r;
        while ((r = inflater.read(byteBuffer)) > 0) {
            buffer.write(byteBuffer, 0, r);
        }
    } catch (IOException e) {
        log.error("Could not decompress stream", e);
        throw e;
    }
    return new String(buffer.toByteArray());
}
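A side note on decompress(): new String(byte[]) uses the JVM's default charset, while the .NET side encodes with Encoding.UTF8. The default is often UTF-8 on Ubuntu (and always UTF-8 since Java 18), and a charset mismatch would garble non-ASCII characters rather than cut the stream short, so this is unlikely to explain the EOF; still, stating the charset explicitly removes one variable. A minimal sketch (Utf8Decompressor is a hypothetical name), which also closes the GZIPInputStream via try-with-resources:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class Utf8Decompressor {

    // Same job as decompress() above, but decodes with UTF-8 (matching
    // Encoding.UTF8 on the .NET side) instead of the JVM default charset.
    static String decompress(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // try-with-resources closes the GZIPInputStream and releases its Inflater.
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Build a sample payload the way the server does: UTF-8 bytes, then GZip.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write("<name>Zo\u00eb @ caf\u00e9</name>".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(decompress(bos.toByteArray()));
    }
}
```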
At first I thought there must be something wrong with the way I am reading the stream, perhaps that I was not looping properly. I then generated a large amount of data to stream and checked that it was looping. Also, the fact that it happens so seldom, and so far has not been reproducible, led me to believe that it was the content rather than the scenario. But at this point I am totally baffled, and for all I know it is the code.
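To separate "content" from "scenario", one option is to check whether plain ASCII XML can produce the same failure. This minimal sketch (names and payload are hypothetical) decompresses a complete payload and a copy cut in half. Only the truncated copy fails, with an EOFException, which is consistent with an incomplete read producing this error regardless of which characters the message contains.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class TruncationRepro {

    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // True if decompressing the payload fails with EOFException
    // (thrown either while reading the header or while inflating).
    static boolean failsWithEof(byte[] gz) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // discard the output; only the failure mode matters here
            }
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<batch><row>plain ASCII, nothing special</row></batch>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] full = gzip(xml);
        System.out.println(failsWithEof(full)); // prints false
        // Keep only the first half, as if the reader stopped before the rest arrived.
        byte[] cut = Arrays.copyOf(full, full.length / 2);
        System.out.println(failsWithEof(cut));  // prints true
    }
}
```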
Thanks again everyone.
Update 2:
As requested, the .NET code:
Dim DataToCompress = Encoding.UTF8.GetBytes(Data)
Dim CompressedData = Compress(DataToCompress)
The first line gets the raw data into bytes, and the second compresses them using:
Private Function Compress(ByVal Data As Byte()) As Byte()
    Try
        Using MS = New MemoryStream()
            Using Compression = New GZipStream(MS, CompressionMode.Compress)
                Compression.Write(Data, 0, Data.Length)
                Compression.Flush()
                Compression.Close()
                Return MS.ToArray()
            End Using
        End Using
    Catch ex As Exception
        Log.Error("Error trying to compress data", ex)
        Throw
    End Try
End Function
Update 3: I also added more Java code. The in variable is the InputStream returned from socket.getInputStream().
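Since in comes from socket.getInputStream(), the do/while in receive() ends whenever available() returns 0, even if more of the batch is still on its way. The sketch below simulates that with a hypothetical ChunkedStream that hands out data in two pieces and reports only the current piece as available. This is a simplification of real socket behaviour, and the chunk sizes are arbitrary; the loop itself has the same shape as receive().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

final class AvailablePitfallDemo {

    // Hypothetical stand-in for a socket stream: data "arrives" in chunks,
    // and available() only counts what is left in the current chunk.
    static final class ChunkedStream extends InputStream {
        private final Deque<byte[]> chunks = new ArrayDeque<>();
        private byte[] current = new byte[0];
        private int pos;

        ChunkedStream(byte[]... parts) {
            chunks.addAll(Arrays.asList(parts));
        }

        @Override
        public int read() throws IOException {
            byte[] one = new byte[1];
            return read(one, 0, 1) == -1 ? -1 : one[0] & 0xFF;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (len == 0) return 0;
            while (pos == current.length) { // current chunk used up
                if (chunks.isEmpty()) return -1;
                current = chunks.poll();    // next chunk "arrives"
                pos = 0;
            }
            int n = Math.min(len, current.length - pos);
            System.arraycopy(current, pos, b, off, n);
            pos += n;
            return n;
        }

        @Override
        public int available() {
            return current.length - pos; // 0 between chunks
        }
    }

    // Same loop shape as receive() in the question.
    static byte[] readWhileAvailable(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        ByteArrayOutputStream baos = new ByteArrayOutputStream(8192);
        do {
            int nrBytes = in.read(buffer);
            if (nrBytes > 0) {
                baos.write(buffer, 0, nrBytes);
            }
        } while (in.available() > 0);
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ChunkedStream(new byte[1460], new byte[600]);
        System.out.println(readWhileAvailable(in).length); // prints 1460, not 2060
    }
}
```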