How does something like System.IO.Compression.GZipStream have any chance of working? Doesn't something like GZip need to look at the contents of the entire uncompressed file before it can start writing the compressed file?

Does GZipStream buffer everything before it writes anything? If so, what is the point of implementing the stream?

+4  A: 

Most (if not all) common compression algorithms are stream algorithms, i.e. they take streamable input and produce output as they go. They usually process data in blocks of a certain size, sometimes called the window size, but they still work sequentially. They don't need to know anything about the complete stream (neither its full size nor its contents).
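To illustrate the sequential processing described above, here is a minimal sketch that pushes data through GZipStream one chunk at a time. The class name StreamingGzipDemo, the method names, and the 4096-byte chunk size are illustrative assumptions, not anything from the question; only the GZipStream, Stream, and CompressionMode types come from .NET itself.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper class, for illustration only.
public static class StreamingGzipDemo
{
    // Compresses input into output chunk by chunk. At no point is the
    // whole input held in memory; only one buffer of chunkSize bytes is.
    public static void Compress(Stream input, Stream output, int chunkSize = 4096)
    {
        // leaveOpen: true keeps the caller's output stream usable afterwards.
        using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
        {
            var buffer = new byte[chunkSize];
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                gzip.Write(buffer, 0, read);
            }
        } // Disposing the GZipStream flushes the remaining compressed data.
    }

    // Decompresses input into output, again one chunk at a time.
    public static void Decompress(Stream input, Stream output, int chunkSize = 4096)
    {
        using (var gzip = new GZipStream(input, CompressionMode.Decompress, leaveOpen: true))
        {
            var buffer = new byte[chunkSize];
            int read;
            while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
        }
    }
}
```

The same two methods should work with a FileStream or NetworkStream in place of a MemoryStream, since they only rely on the Read and Write methods of the Stream base class.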

Some packers do scan the input files to determine which compression algorithm is most applicable (when multiple algorithms are supported). For example, they might choose a text compression algorithm if they find that the input is text or contains large text blocks. But this is not how the algorithm itself works; it's just how the packer works.

Eugene Mayevski 'EldoS Corp
+2  A: 

Well let me try to answer some of this for you.

Zipping is a way of compressing a bunch of data, making it smaller and easier to transfer. Check out: http://forums.pcworld.co.nz/archive/index.php/t-22243.html

Streams: Streams are just a way of abstracting a sequence of bytes so that you can read from and write to (and usually seek within) them. To turn an object into a stream or byte array you can use the BinaryFormatter (or SoapFormatter) together with the Serializable and NonSerialized attributes applied to the objects and fields that you serialize. Serializing an object just writes its field data to any stream of your choice (since System.IO.Stream is the base class, you can write to a MemoryStream, FileStream, NetworkStream, etc.).

Dealing only with portions of a file is quite easy as well. All you need to do is use the Seek method of a Stream (or the Position property) to read certain chunks of data. For example:

byte[] buffer = new byte[4000];

myStream.Position = 1000;
myStream.Read(buffer, 0, buffer.Length);

This reads at most 4000 bytes starting at offset 1000 (bytes 1000 through 4999) into the buffer, without even looking at the rest of the data in the file, I believe.
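A self-contained version of the snippet above might look like the sketch below. The class name PartialReadDemo and the method ReadRange are hypothetical names chosen for illustration. The loop is there because Stream.Read is allowed to return fewer bytes than requested.

```csharp
using System;
using System.IO;

// Hypothetical helper class, for illustration only.
public static class PartialReadDemo
{
    // Reads up to count bytes starting at offset, without touching
    // the rest of the stream. Requires a seekable stream.
    public static byte[] ReadRange(Stream stream, long offset, int count)
    {
        stream.Position = offset;
        var buffer = new byte[count];
        int total = 0;
        while (total < count)
        {
            int read = stream.Read(buffer, total, count - total);
            if (read == 0) break; // End of stream reached early.
            total += read;
        }
        // Trim the result if the stream ended before count bytes were read.
        if (total < count) Array.Resize(ref buffer, total);
        return buffer;
    }
}
```

Calling PartialReadDemo.ReadRange(myStream, 1000, 4000) corresponds to the three lines above. Note that this relies on a seekable stream such as a FileStream or MemoryStream; GZipStream itself reports CanSeek as false, so it can only be read forward.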

.NET lets you read a whole file, part of a file, or nothing at all. Knowing that, GZipStream works similarly.

Some links for you:

http://www.geekpedia.com/tutorial190_Zipping-files-using-GZipStream.html

http://www.csharphacker.com/technicalblog/index.php/2009/07/27/gzipstream-helper-gzip/

http://dotnetperls.com/gzipstream

Ryan Ternier
+1  A: 

I would recommend listening to episode #205 of Security Now. They talk about LZW and about stream and block compression. It may answer a lot of your questions and help explain how you can compress a file without knowing the entire file.

Scott Chamberlain