views: 65

answers: 3

We are streaming data in batches from a server (written in .NET, running on Windows) to a client (written in Java, running on Ubuntu). The data is in XML format. Occasionally the Java client throws an unexpected EOF while trying to decompress the stream. The message content always varies and is user-driven. The response from the client is also compressed using GZip; this never fails and seems to be rock solid. The response from the client is controlled by the system.

Is there a chance that some arrangement of characters, or some special characters, is creating false EOF markers? Could it be whitespace-related? Is GZip suitable for compressing XML?

I am assuming that the code to read and write from the input/output streams works, because we only occasionally get this exception, and when we inspect the user data at the time there seem to be special characters (which is why I asked the question), such as the '@' sign.

Any ideas?
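To the question of whether particular characters can corrupt a GZip stream: GZip operates on raw bytes and is agnostic to their content, which a simple round trip can demonstrate. This is a minimal, self-contained sketch (the class and method names are illustrative, not from the original code) that compresses and decompresses a string full of "special" characters:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {

    // Compress a string to gzip bytes, always encoding as UTF-8.
    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(baos)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return baos.toByteArray();
    }

    // Decompress gzip bytes back to a string, decoding as UTF-8.
    static String decompress(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[8192];
            int r;
            while ((r = gzip.read(buf)) > 0) {
                out.write(buf, 0, r);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String xml = "<msg user=\"a@b.com\">special chars: @ # &amp;</msg>";
        System.out.println(xml.equals(decompress(compress(xml)))); // prints "true"
    }
}
```

No character sequence inside the payload can produce a false EOF; the gzip format carries its own header, trailer, and CRC, so a premature EOF points at truncated delivery rather than the data itself.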

UPDATE: The actual code, as requested. I thought it wasn't this, because I had been to a couple of sites to get help on this issue and they all had more or less the same code. Some sites mentioned appended GZip streams. Something to do with GZip creating multiple segments?

public String receive() throws IOException {

    byte[] buffer = new byte[8192];
    ByteArrayOutputStream baos = new ByteArrayOutputStream(8192);

    do {
        int nrBytes = in.read(buffer);
        if (nrBytes > 0) {
            baos.write(buffer, 0, nrBytes);
        }
    } while (in.available() > 0);
    return compressor.decompress(baos.toByteArray());
}
public String decompress(byte[] data) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    ByteArrayInputStream in = new ByteArrayInputStream(data);

    try {
        GZIPInputStream inflater = new GZIPInputStream(in); 
        byte[] byteBuffer = new byte[8192];
        int r;
        while((r = inflater.read(byteBuffer)) > 0 ) {
            buffer.write(byteBuffer, 0, r); 
        }
    } catch (IOException e) {
        log.error("Could not decompress stream", e);
        throw e;
    }
    return new String(buffer.toByteArray());
}

At first I thought there must be something wrong with the way that I am reading in the stream, and that perhaps I was not looping properly. I then generated a ton of data to be streamed and checked that it was looping. Also, the fact that it happens so seldom and so far has not been reproducible led me to believe that it was the content rather than the scenario. But at this point I am totally baffled and for all I know it is the code.

Thanks again everyone.

Update 2:

As requested the .Net code:

Dim DataToCompress = Encoding.UTF8.GetBytes(Data)
Dim CompressedData = Compress(DataToCompress)

That gets the raw data into bytes. Then it gets compressed:

Private Function Compress(ByVal Data As Byte()) As Byte()
    Try
        Using MS = New MemoryStream()
            Using Compression = New GZipStream(MS, CompressionMode.Compress)
                Compression.Write(Data, 0, Data.Length)
                Compression.Flush()
                Compression.Close()
                Return MS.ToArray()
            End Using
        End Using
    Catch ex As Exception
        Log.Error("Error trying to compress data", ex)
        Throw
    End Try
End Function

Update 3: Also added more Java code. The `in` variable is the InputStream returned from socket.getInputStream().

+1  A: 

It certainly shouldn't be due to the data involved - the streams deal with binary data, so that shouldn't make any odds at all.

However, without seeing your code, it's hard to say for sure. My first port of call would be to check anywhere that you're using InputStream.read() - check that you're using the return value correctly, rather than assuming a single call to read() will fill the buffer.

If you could provide some code, that would help a lot...

Jon Skeet
Hi Jon, sure. Will do. I did double check this but perhaps it is incorrect after all. Thanks
uriDium
Hi Jon. The code is up. I added both the .Net compression part and the Java decompression.
uriDium
I think I have found my problem. Upon closer inspection of the available() method, it says it returns the number of bytes that can be read without blocking. I need to be sending a size indicator to the client and continue reading until all bytes have been read.
uriDium
@uriDium: Aargh, definitely don't use `available()` - I can't remember *ever* finding that useful!
Jon Skeet
Sounds like the most likely culprit.
Thorbjørn Ravn Andersen
...cont: you also have to send the length of the data so that the client knows how many bytes to receive. This leaves me with a burning question. Is it guaranteed that the client will always receive enough data in the first read to be able to get the first couple of bytes for the length indicator? It seems to me that nothing is really guaranteed with sockets.
uriDium
@uriDium: Yes - if you're exchanging messages over a persistent connection, you *either* need delimiters to mark "end of message" *or* you need to length-prefix each message. And no, you can't assume you'll get those bytes all in one call to `read()`. But that's relatively easy to work around, because you'll know when you're done.
Jon Skeet
@Jon: No one really gave me an answer, but the answer seems to have come from the guidance in these comments. That is why I will choose your answer, to give you some credit for the advice. Lastly, if we assume that the first 4 bytes (an Int32) contain the length of the rest of the payload, then we can call read() until we have those four bytes, reconstruct the int value, and call read() with the correct buffer size, etc.
uriDium
@uriDium: Exactly. Just check the results of calling `read()` each time to make sure the stream hasn't been closed abruptly for some reason :)
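The length-prefix scheme discussed in these comments can be sketched in a few lines of Java. This is an illustrative example, not the original project's code; it assumes a 4-byte big-endian length header. `DataInputStream.readFully` handles the "what if one read() doesn't return everything" concern by looping internally over short reads:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class LengthPrefixedFraming {

    // Writes a 4-byte big-endian length, then the payload.
    static void sendMessage(OutputStream out, byte[] payload) throws IOException {
        DataOutputStream dos = new DataOutputStream(out);
        dos.writeInt(payload.length);
        dos.write(payload);
        dos.flush();
    }

    // Reads the 4-byte length, then blocks until exactly that many
    // bytes have arrived, however many read() calls that takes.
    static byte[] receiveMessage(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(in);
        int length = dis.readInt();     // throws EOFException if the peer closed early
        byte[] payload = new byte[length];
        dis.readFully(payload);         // loops internally over short reads
        return payload;
    }

    public static void main(String[] args) throws IOException {
        // Simulate the wire with in-memory streams.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        sendMessage(wire, "hello".getBytes("UTF-8"));
        byte[] got = receiveMessage(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(new String(got, "UTF-8")); // prints "hello"
    }
}
```

One caveat for the .NET side of this particular setup: DataInputStream expects big-endian, while .NET's BitConverter produces little-endian on x86, so the server would need to write the length in network byte order (e.g. via IPAddress.HostToNetworkOrder) for the two ends to agree.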
Jon Skeet
A: 

I would suspect that for some reason the data is altered along the way by being treated as text rather than binary, so it may be either \n conversions or a codepage alteration.

How is the gzipped stream transferred between the two systems?

Thorbjørn Ravn Andersen
I also thought it might have something to do with those types of conversions. When I compress, I convert the raw string data to bytes in UTF-8. I then compress the byte array and send it via a socket. It is picked up on the other side via a socket and decompressed using the above code snippet.
uriDium
"string raw data to bytes, in UTF8" - this sounds highly suspicious to me. Show the code. So the bytes are sent directly to a socket and back up? No web server or anything?
Thorbjørn Ravn Andersen
The GZipStream.Write method expects an array of bytes, correct? So I did byte[] bytes = Encoding.UTF8.GetBytes(rawString); I am not in front of the code right now; I will be able to post the code later. Yes, everything is sent directly. No web server or anything.
uriDium
Okay, posted the code that gets the bytes.
uriDium
A: 

It is not possible. EOF in TCP is delivered as an out-of-band FIN segment, not via the data.

EJP
That went totally over my head :) Do you have a source that I could read.
uriDium
It is the gzip stream that is corrupted, resulting in the decoder getting the EOF wrong.
Thorbjørn Ravn Andersen
@uriDium, he is talking about TCP/IP, not gzip.
Thorbjørn Ravn Andersen