Hi,

Can I rely on the first few bytes of data compressed using the System.IO.Compression.DeflateStream in .NET always being the same?

The first bytes always seem to be: 237, 189, 7, 96, 28, 73, 150, 37, 38, 47, ...

I'm assuming this is some kind of header, and I'd like to assume that it is fixed and isn't going to change.

Has anyone got any extra info about this?

Background info (The reason I want to know this info is...)

I have a load of data in a database table that could do with being made smaller. I've decided to start compressing new data, but I'm not going to bother compressing the existing data. By the time the data gets into my .NET code, it is a String.

I'd like to be able to look at the first few bytes of the string and see whether it has been compressed. If it has, I need to decompress it.

I was originally thinking I could convert the string to bytes and just try de-compressing the data. Then if an exception happens, I could just assume it wasn't compressed. But I think checking the header bytes would give me much better performance.

Many thanks, Mike G

+1  A: 

To be safe (unless this is documented somewhere), stick your own magic header at the front. A GUID is a good choice for this.
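A minimal sketch of this idea, assuming the column holds text, so the compressed bytes are Base64-encoded and the marker is a string prefix. The class name `MarkedCompression`, the methods `Pack` and `Unpack`, and the GUID value are all hypothetical, chosen only for illustration. It avoids `Stream.CopyTo`, which only arrived in .NET 4.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class MarkedCompression
{
    // Hypothetical marker: any fixed value that real data will never start with.
    private const string Marker = "{3F2504E0-4F89-11D3-9A0C-0305E82C3301}";

    // Compress the text and prefix the result with the marker.
    public static string Pack(string plain)
    {
        byte[] raw = Encoding.UTF8.GetBytes(plain);
        using (MemoryStream buffer = new MemoryStream())
        {
            // Disposing the DeflateStream flushes the final compressed block.
            using (DeflateStream deflate = new DeflateStream(buffer, CompressionMode.Compress))
            {
                deflate.Write(raw, 0, raw.Length);
            }
            // ToArray still works after the MemoryStream has been closed.
            return Marker + Convert.ToBase64String(buffer.ToArray());
        }
    }

    // Return the original text, whether or not the stored value was compressed.
    public static string Unpack(string stored)
    {
        // No marker: treat the value as an existing, uncompressed row.
        if (stored == null || !stored.StartsWith(Marker, StringComparison.Ordinal))
        {
            return stored;
        }

        byte[] packed = Convert.FromBase64String(stored.Substring(Marker.Length));
        using (MemoryStream input = new MemoryStream(packed))
        using (DeflateStream inflate = new DeflateStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            // Manual copy loop, since Stream.CopyTo is not available before .NET 4.
            byte[] chunk = new byte[4096];
            int read;
            while ((read = inflate.Read(chunk, 0, chunk.Length)) > 0)
            {
                output.Write(chunk, 0, read);
            }
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }
}
```

With this in place, `MarkedCompression.Unpack(value)` can be called on every row: old rows come back unchanged and new rows are decompressed. Note that Base64 adds roughly a third to the compressed size, so a binary column with a byte-level marker would be more compact if the schema allows it.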

Marcelo Cantos
Thanks, I think I'll go for this one. Good lateral thinking! (I'm not using .NET 4 yet, so the other answer isn't usable for me.)
MikeG
A: 

Some improvements made to GZipStream in .NET 4.0 prevent this. Perhaps migrating to .NET 4 is an option:

The compression algorithms in System.IO.Compression have been improved in .NET 4. DeflateStream and GZipStream no longer inflate already compressed data. source

Steven