I need to implement a special ZLib decompression routine that runs under both .NET and Mono. The string messages are received via a socket, and the sender leaves out the checksum. This is raw string data, not files.

    unsigned char zlib_header[] = {
        // custom additional Zlib ID
        'Z',        // Our own ID
        // The normal GZIP header
        0x1f,
        0x8b,       // GZIP ID
        0x08,       // Deflated
        0x00,       // Flags
        0, 0, 0, 0, // Timestamp
        0x00,       // Extra flags
        0x00,       // OS identifier
        // afterwards compressed data without a checksum
    };

I have tried to decompress the data with GZipStream and DeflateStream, but I think that GZipStream fails because of the missing checksum. I have also tried various offsets, but had no luck. The checksum is not used because the data is received via a socket anyway, so the ZLib checksum would only be additional overhead.

Have I missed something? Could you explain to me how to add the checksum and then call the right library, or should I look at a third-party library which supports both Mono and .NET?

Edit: Performance is very critical, as this is done at least once a second. Would you recommend that I end up using the C library via interop?

I always receive an InvalidDataException at the moment, and I assume that it is related to the wrong checksum. This is the actual code which I tried to use without success:

    const int HeaderSize = 1;
    // remove the additional Z from the header
    System.IO.MemoryStream ms = new System.IO.MemoryStream(compressedBuffer, HeaderSize, compressedBuffer.Length - HeaderSize);
    GZipStream zipStream = new GZipStream(ms, CompressionMode.Decompress, true);
    byte[] deCompressedBytes = new byte[actualBufferLength * 10];
    int resultSize = zipStream.Read(deCompressedBytes, 0, actualBufferLength); // get rid of the header
    UTF8Encoding enc = new UTF8Encoding();
    string result = enc.GetString(deCompressedBytes, 0, resultSize);
+1  A: 

Just use DeflateStream instead of GZipStream.
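
A minimal sketch of that suggestion, assuming the layout shown in the question: one custom 'Z' byte, then the fixed 10-byte GZIP header with the flags byte set to 0x00, then raw DEFLATE data with no trailer. The names `RawGzipPayload` and `Decompress` are hypothetical, and `Stream.CopyTo` needs .NET 4 or later.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class RawGzipPayload
{
    // Assumption: 1 custom 'Z' byte + 10-byte fixed GZIP header (flags = 0x00).
    const int HeaderSize = 1 + 10;

    // Hypothetical helper: inflates the raw DEFLATE data that follows the
    // headers. DeflateStream expects neither a GZIP header nor a CRC32 trailer.
    public static string Decompress(byte[] compressedBuffer)
    {
        using (var input = new MemoryStream(compressedBuffer, HeaderSize,
                                            compressedBuffer.Length - HeaderSize))
        using (var inflater = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            // The output grows as needed, so no size guess is required.
            inflater.CopyTo(output);
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }
}
```

Calling `RawGzipPayload.Decompress(compressedBuffer)` on one complete message should then return the decoded string. If the sender ever sets header flags, the fixed offset of 11 no longer holds.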

lupus
+1  A: 

Are you sure it has to do with the checksum?

The 32-bit checksum is not optional in the GZIP format. I don't understand what you mean by "the data is received via socket thus the checksum is missing". It doesn't matter if you get the data via carrier pigeon; if it is a valid GZIP stream, it must have a 32-bit CRC. Who or what produced the source data?

There is an optional part in the GZIP spec: the 16-bit header checksum (CRC16), signalled by the FHCRC flag bit. Its inclusion also does not depend on how the GZIP byte stream was created. The System.IO.Compression.GZipStream class will gladly accept a GZIP stream that lacks this CRC16, as well as one that includes it.
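
To make the optional header parts concrete, here is a rough sketch that walks a GZIP header as laid out in RFC 1952 and returns the offset of the first compressed byte. `GzipHeader` and `DataOffset` are hypothetical names, and the `start` parameter is only there so the custom 'Z' byte from the question can be skipped. It assumes the whole header is already in the buffer; a truncated header would raise an IndexOutOfRangeException.

```csharp
using System;

static class GzipHeader
{
    // Flag bits of the FLG byte as defined in RFC 1952.
    const byte FHCRC = 0x02, FEXTRA = 0x04, FNAME = 0x08, FCOMMENT = 0x10;

    // Returns the index of the first DEFLATE byte for a GZIP header
    // that begins at buf[start].
    public static int DataOffset(byte[] buf, int start)
    {
        if (buf.Length < start + 10 || buf[start] != 0x1f || buf[start + 1] != 0x8b)
            throw new FormatException("Not a GZIP header");
        if (buf[start + 2] != 0x08)
            throw new FormatException("Unsupported compression method");

        byte flags = buf[start + 3];
        int pos = start + 10; // fixed part: ID1 ID2 CM FLG MTIME(4) XFL OS

        if ((flags & FEXTRA) != 0)
            pos += 2 + (buf[pos] | (buf[pos + 1] << 8)); // XLEN is little-endian
        if ((flags & FNAME) != 0)
        {
            while (buf[pos] != 0) pos++; // zero-terminated file name
            pos++;
        }
        if ((flags & FCOMMENT) != 0)
        {
            while (buf[pos] != 0) pos++; // zero-terminated comment
            pos++;
        }
        if ((flags & FHCRC) != 0)
            pos += 2; // the optional 16-bit header checksum

        return pos;
    }
}
```

For the header in the question (flags 0x00, one extra leading byte), `GzipHeader.DataOffset(compressedBuffer, 1)` gives 11. This only locates the data; it does not verify the CRC16 or the mandatory CRC32 trailer.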

You have some other problems in the code. The actualBufferLength in your code - what is that? It is certainly not the length of the buffer to hold the decompressed data. That is 10x. But 10x seems pretty arbitrary. For very compressible data, you may exceed 10x. I suggest you use a streaming approach in decompression.
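
By a streaming approach I mean something like the sketch below: read fixed-size chunks until the stream reports the end, instead of guessing the output size up front. `StreamingRead` and `ReadAllUtf8` are hypothetical names, and the 4096-byte chunk size is an arbitrary choice.

```csharp
using System.IO;
using System.Text;

static class StreamingRead
{
    // Hypothetical helper: drains any readable stream (for example a
    // GZipStream or DeflateStream opened in Decompress mode) and decodes
    // the collected bytes as UTF-8.
    public static string ReadAllUtf8(Stream decompressor)
    {
        using (var output = new MemoryStream())
        {
            var chunk = new byte[4096];
            int read;
            // Read returns 0 only at end of stream; a single call may return
            // fewer bytes than requested, so keep looping until it does.
            while ((read = decompressor.Read(chunk, 0, chunk.Length)) > 0)
            {
                output.Write(chunk, 0, read);
            }
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }
}
```

Decoding only after all bytes are collected avoids splitting a multi-byte UTF-8 character across two chunks.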

As for whether you will be able to handle one decompression per second: yes, System.IO.Compression.GZipStream will be fast enough for small enough chunks of data. There is likely no need to go to a native C/C++ library.

PS: The DotNetZip library is open source and includes a GZipStream; you can use it out of the box or, if you like, just grab the GZip code if that is all you need.

Cheeso
I am integrating against a third party from a remote location, and they obviously have a bug in their software: they are not correctly flushing the network stream. Thanks for your help anyway.
weismat