ansaurus

Question

Confusion about streamwriters and how compression works

Answer 1

+1 A:

A Word .doc file has a very specific binary format; I'm not sure that StreamWriter is going to make it easy to write one...

Re compression: you can compress streams of data with things like GZipStream, but this is usually done for transport purposes, and the recipient also needs to know to decompress it (for example, a TCP client/server might agree to use compression). With a file, assuming you want it "as original" at the other end, you'd need a service at the other end to decompress it.
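To illustrate the "both ends must agree" point, here is a minimal sketch of a GZipStream round trip held entirely in memory. The class and method names (`GZipRoundTrip`, `Compress`, `Decompress`) are made up for this example, and `Stream.CopyTo` assumes .NET 4 or later; on older frameworks you would copy with a read/write loop instead.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper for illustration only; not part of the framework.
public static class GZipRoundTrip
{
    // Sender side: turn raw bytes into gzip-compressed bytes.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            } // Disposing the GZipStream flushes the last block and the trailer.

            // ToArray still works after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    // Recipient side: restore the original bytes from the compressed ones.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output); // Assumes .NET 4+.
            return output.ToArray();
        }
    }

    public static void Main()
    {
        byte[] original = Encoding.UTF8.GetBytes("Woah .NET rocks. Woah .NET rocks. Woah .NET rocks.");
        byte[] wire = Compress(original);     // what would travel over the network
        byte[] restored = Decompress(wire);   // what the recipient ends up with

        Console.WriteLine("Original: {0} bytes, on the wire: {1} bytes", original.Length, wire.Length);
        Console.WriteLine(Encoding.UTF8.GetString(restored));
    }
}
```

The bytes in `wire` are meaningless to anything that expects the original format; only after `Decompress` do they become the original data again.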

Personally, for local network usage, I'm not sure it is worth it unless you are shifting serious volumes of data - just use robocopy and use a fast network.

For internet usage, most protocols have compression support built in, HTTP with gzip/deflate being the most obvious.

Of course, if you are talking about archiving, then storing files in something like .zip archives makes a lot of sense... when doing this "en masse", I tend to run the archive tool on the server closest to the physical disks, to maximise IO performance.

Marc Gravell 2009-02-22 19:54:36

I guess when dealing with word documents, if you are going to read one, you would need to provide a condition (not in code), that the data is pure text. Is there a recommended practise to read a word document with text and pictures etc?

dotnetdev 2009-02-22 21:30:29

No; you'd use an existing library - for example, Word automation. Or for docx, XML writing + zip.

Marc Gravell 2009-02-22 22:31:18

Answer 2

+3 A:

I think you are confused about compression in general.

You "compress" any data to reduce its size. But by reducing its size, the structure of the data must also change.

So if you save an MS Word document as a .doc file, you will get the MS Word document structure in the .doc file.

But if you then compress the .doc file, the file will get smaller by the magic of the compression algorithm... but it will no longer contain the MS Word document structure.

So how can MS Word read the alien structure it gets? It can't!

That's why you have to "decompress": to restore the structure the data had before being compacted, so it becomes useful again.

For example, suppose you have the sentence "Woah .NET rocks". A certain compression algorithm might replace each word with its page number in an English dictionary and produce the string "77 69 84" instead.

Woah -> 77
.NET -> 69
rocks -> 84

So how do you make sense of the string "77 69 84"?

It doesn't make sense of course! Because it has been compressed.

To make sense of it again, you'll have to decompress it, which goes like this:

77 -> Woah
69 -> .NET
84 -> rocks

So basically, you are taking "other people"'s data structure and compressing it. After compression, the data no longer has a sensible meaning to them because it is in compacted form. Thus you must "decompress" it so that "other people" can read it again.
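The word-to-number example above can be sketched as a toy codec. This is only an illustration of the idea, not how Deflate or any real algorithm works; the class name `ToyDictionaryCodec` and the three-entry code book are invented for the example, and `string.Join` over an `IEnumerable<string>` assumes .NET 4 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy illustration only: a fixed code book that both sides must share.
public static class ToyDictionaryCodec
{
    // Word -> code, using the numbers from the example in the text.
    private static readonly Dictionary<string, string> EncodeTable = new Dictionary<string, string>
    {
        { "Woah", "77" },
        { ".NET", "69" },
        { "rocks", "84" }
    };

    // Code -> word, built by inverting the table above.
    private static readonly Dictionary<string, string> DecodeTable =
        EncodeTable.ToDictionary(pair => pair.Value, pair => pair.Key);

    // Replace every word with its code. Throws if a word is not in the code book.
    public static string Compress(string sentence)
    {
        return string.Join(" ", sentence.Split(' ').Select(word => EncodeTable[word]));
    }

    // Replace every code with its word, restoring the original sentence.
    public static string Decompress(string codes)
    {
        return string.Join(" ", codes.Split(' ').Select(code => DecodeTable[code]));
    }

    public static void Main()
    {
        string compressed = Compress("Woah .NET rocks");
        Console.WriteLine(compressed);             // prints "77 69 84"
        Console.WriteLine(Decompress(compressed)); // prints "Woah .NET rocks"
    }
}
```

Anyone who reads "77 69 84" without the code book sees only numbers, which is the position MS Word is in when handed a compressed .doc file.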

Am I understanding your question correctly?

chakrit 2009-02-22 20:02:39

Answer 3

A:

Hi,

Firstly, I made the above post but not when I was at home, so I used an unregistered account.

You have answered my confusion. I actually knew that when you compress data, to understand it again you decompress it (like with .zips).

On the question of compression in .NET: when I compress data, the resulting size is greater than the original size. I have the code below:

        // buffer holds the bytes of the source file (a .jpg in this case).
        MemoryStream ms = new MemoryStream();
        // Write the compressed data into the memory stream. Passing true
        // (leaveOpen) keeps ms usable after the DeflateStream is closed.
        using (DeflateStream compressedzipStream = new DeflateStream(ms, CompressionMode.Compress, true))
        {
            Console.WriteLine("Compression");
            compressedzipStream.Write(buffer, 0, buffer.Length);
        } // Disposing the stream flushes the remaining compressed bytes into ms.
        Console.WriteLine("Original size: {0}, Compressed size: {1}", buffer.Length, ms.Length);

On the last line (Console.WriteLine), I get the following output:

Original size: 9708, Compressed size: 13943. Shouldn't the compressed size be less? I am working with a .jpg file.

Thanks

dotnetdev 2009-02-22 21:25:25

No; only data that is inherently inefficient (that is, contains redundancy) can be compressed. Text compresses well. Images (**especially** jpg which is *already* compressed in different ways) simply don't compress very well. Such data can often get *bigger* when "compressed".
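A quick way to see this for yourself is to push two buffers through DeflateStream and compare sizes. In the sketch below, random bytes stand in for already-compressed data such as a JPEG; that substitution is an assumption made for the demo, and `CompressibilityDemo` and `DeflatedSize` are names invented here. The exact numbers depend on the framework version, so none are quoted.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical demo class; compares how well two kinds of data deflate.
public static class CompressibilityDemo
{
    // Returns the number of bytes produced by deflating the given data.
    public static long DeflatedSize(byte[] data)
    {
        using (var ms = new MemoryStream())
        {
            // leaveOpen: true so ms.Length can be read after the DeflateStream is disposed.
            using (var deflate = new DeflateStream(ms, CompressionMode.Compress, true))
            {
                deflate.Write(data, 0, data.Length);
            }
            return ms.Length;
        }
    }

    public static void Main()
    {
        // Highly repetitive text: lots of redundancy for the algorithm to remove.
        var sb = new StringBuilder();
        for (int i = 0; i < 500; i++)
        {
            sb.Append("Woah .NET rocks. ");
        }
        byte[] text = Encoding.ASCII.GetBytes(sb.ToString());

        // Random bytes: a stand-in for data with no redundancy left (assumption for the demo).
        byte[] noise = new byte[10000];
        new Random(42).NextBytes(noise);

        Console.WriteLine("Text:  original {0}, deflated {1}", text.Length, DeflatedSize(text));
        Console.WriteLine("Noise: original {0}, deflated {1}", noise.Length, DeflatedSize(noise));
    }
}
```

The text buffer should shrink considerably, while the random buffer should stay about the same size or grow slightly, which is the same effect as the 9708 to 13943 result reported for the .jpg.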

Marc Gravell 2009-02-22 22:32:58

One last question. When compressing a jpeg, the code runs fine, but the image can't be opened. I assume this has something to do with what is mentioned here?

dotnetdev 2009-02-23 12:10:59

@dotnetdev There is more than one way to compress something. JPEG is itself a compression standard: a JPEG file is image data that is already compressed. If you re-compress the JPEG file with *another* algorithm (that isn't JPEG) then you've replaced the original (compressed) JPEG file structure with another, non-image data structure... thus no image viewer can read it. Other than ZIP, there are also GZIP, BZIP2 and lots of other formats, including domain-specific ones such as JPEG for images and MP3 for music files. They are *not* interchangeable.

chakrit 2009-12-30 19:50:49

@dotnetdev You cannot decompress a JPEG file using a ZIP algorithm, and likewise you cannot decompress an MP3 file using Paint. They differ in structure and in the compression and decompression algorithms themselves. The `DeflateStream` you mentioned is yet *another* kind of compression algorithm. You can't use the Deflate algorithm to decompress a JPEG file, for example.

chakrit 2009-12-30 19:53:56

Answer 4

A:

Not all data is compressible. For example, if you try to compress an already compressed file (such as a JPEG), it will most likely grow in size.

2009-02-23 00:10:01

One last question. When compressing a jpeg, the code runs fine, but the image can't be opened. I assume this has something to do with what is mentioned here?

dotnetdev 2009-02-23 12:10:28
