Before I begin: is it possible to write a .doc file with mixed content using a StreamWriter? For example, I may have a .doc file containing images and text. Would a StreamWriter be suitable for this? I assume a TextWriter is only for writing text-only documents.

What I am trying to do is compress a file (format not known), which is easy enough. What confuses me is why I would ever call decompress. That restores the file to its original, larger size, so what is the point? If I want to compress a file and send it to a network drive, should I compress it, copy it to the network location, and then decompress it into a new file there? This app will be a Windows service, so I will need to use Windows impersonation, right?

Thanks

+1  A: 

A Word .doc file has a very specific binary format; I'm not sure that StreamWriter is going to make it easy to write one...

Re compression: you can compress streams of data with things like GZipStream, but this is usually done for transport purposes, and the recipient also needs to know to decompress it (for example, a TCP client/server might agree to use compression). With a file, assuming you want it "as original" at the other end, you'd need a service at the other end to decompress it.
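
To make the transport idea concrete, here is a minimal sketch of a GZipStream round trip. It assumes a recent .NET with C# top-level statements; the helper names `Compress` and `Decompress` and the sample payload are made up for illustration, and the "receiver" is simply the second helper running in the same process.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Sender side: compress a byte array with GZipStream (hypothetical helper).
static byte[] Compress(byte[] data)
{
    using var output = new MemoryStream();
    // Dispose the GZipStream before reading the result so the final block is flushed.
    using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
    {
        gzip.Write(data, 0, data.Length);
    }
    return output.ToArray();
}

// Receiver side: restore the original bytes (hypothetical helper).
static byte[] Decompress(byte[] compressed)
{
    using var input = new MemoryStream(compressed);
    using var gzip = new GZipStream(input, CompressionMode.Decompress);
    using var output = new MemoryStream();
    gzip.CopyTo(output);
    return output.ToArray();
}

// Sample payload: very repetitive text, so it compresses well.
byte[] original = Encoding.UTF8.GetBytes(new string('a', 1000) + " some file content");
byte[] sent = Compress(original);
byte[] restored = Decompress(sent);

Console.WriteLine($"Original: {original.Length}, sent: {sent.Length}, restored: {restored.Length}");
```

The bytes in `sent` are only useful to something that knows to run them through the decompressing side, which is the point made above about needing a service at the other end.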

Personally, for local network usage, I'm not sure it is worth it unless you are shifting serious volumes of data - just use robocopy and use a fast network.

For internet usage, most protocols have compression support built in, HTTP with gzip/deflate being the most obvious.

Of course, if you are talking about archiving, then storing files in something like .zip archives makes a lot of sense... when doing this "en masse", I tend to run the archive tool on the server closest to the physical disks, to maximise IO performance.
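
For the archiving case, a rough sketch using `ZipArchive` from `System.IO.Compression` might look like the following. It builds the archive in memory to stay self-contained; the entry name `report.txt` and its content are invented for the example, and a real archiving job would write to a file on the server instead.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Build a .zip archive in memory with a single entry (entry name is hypothetical).
byte[] zipBytes;
using (var zipStream = new MemoryStream())
{
    using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Create, leaveOpen: true))
    {
        ZipArchiveEntry entry = archive.CreateEntry("report.txt");
        using (var writer = new StreamWriter(entry.Open()))
        {
            writer.Write("archived text content");
        }
    }
    // The archive must be disposed before the bytes are complete.
    zipBytes = zipStream.ToArray();
}

// Read the entry back out of the archive.
string restoredText;
using (var archive = new ZipArchive(new MemoryStream(zipBytes), ZipArchiveMode.Read))
using (var reader = new StreamReader(archive.Entries[0].Open()))
{
    restoredText = reader.ReadToEnd();
}

Console.WriteLine(restoredText);
```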

Marc Gravell
I guess when dealing with word documents, if you are going to read one, you would need to provide a condition (not in code), that the data is pure text. Is there a recommended practise to read a word document with text and pictures etc?
dotnetdev
No; you'd use an existing library - for example, word automation. Or for docx, xml writing + zip.
Marc Gravell
+3  A: 

I think you are confused about compression in general.

You "compress" data to reduce its size. But reducing its size also means changing the structure of the data.

So if you save an MS Word document as a .doc file, the .doc file contains the MS Word document structure.

But if you then compress the .doc file, the file gets smaller by the magic of the compression algorithm... but it no longer contains the MS Word document structure.

So how can MS Word read the alien structure it gets? It can't!

That's why you have to "decompress": to restore the structure the data had before it was compacted, so it becomes useful again.

For example, take the sentence "Woah .NET rocks". A certain compression algorithm might replace each word with its page number in an English dictionary and produce the string "77 69 84" instead.

Woah -> 77
.NET -> 69
rocks -> 84

So how do you make sense of the string "77 69 84"?

It doesn't make sense of course! Because it has been compressed.

To make sense of it again, you'll have to decompress it, which goes like this:

77 -> Woah
69 -> .NET
84 -> rocks
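
The two mappings above can be sketched as a toy program. This is only an illustration of the analogy, not how real compressors such as Deflate work; the table `pageOf` just hard-codes the made-up page numbers from the example.

```csharp
using System;
using System.Collections.Generic;

// Made-up "dictionary page" table from the example.
var pageOf = new Dictionary<string, int>
{
    ["Woah"] = 77,
    [".NET"] = 69,
    ["rocks"] = 84,
};

// Reverse table, needed to decompress.
var wordAt = new Dictionary<int, string>();
foreach (var pair in pageOf)
{
    wordAt[pair.Value] = pair.Key;
}

string sentence = "Woah .NET rocks";

// "Compress": replace each word with its page number.
var codes = new List<string>();
foreach (string word in sentence.Split(' '))
{
    codes.Add(pageOf[word].ToString());
}
string compressedText = string.Join(" ", codes);

// "Decompress": look each page number up again.
var words = new List<string>();
foreach (string code in compressedText.Split(' '))
{
    words.Add(wordAt[int.Parse(code)]);
}
string restoredSentence = string.Join(" ", words);

Console.WriteLine(compressedText);   // 77 69 84
Console.WriteLine(restoredSentence); // Woah .NET rocks
```

Without `wordAt` (the decompression side), the string "77 69 84" is meaningless to a reader expecting English.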

So basically, you are taking "other people"'s data structures and compressing them. After compression, the data no longer has a sensible meaning to them because it is in compacted form. Thus you must "decompress" it so that those "other people" can read it again.

Am I understanding your question correctly?

chakrit
A: 

Hi,

Firstly, I made the above post, but I wasn't at home at the time, so I used an unregistered account.

You have cleared up my confusion. I did actually know that once you compress data, you have to decompress it to make sense of it again (like with .zips).

On the question of compression in .NET: when I compress data, the resulting size is greater than the original size. I have the code below:

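        // Requires: using System; using System.IO; using System.IO.Compression;
        // `buffer` is a byte[] holding the contents of the input file.
        MemoryStream ms = new MemoryStream();
        // Write the compressed data into the memory stream; the final `true` leaves ms open afterwards.
        DeflateStream compressedzipStream = new DeflateStream(ms, CompressionMode.Compress, true);
        Console.WriteLine("Compression");
        compressedzipStream.Write(buffer, 0, buffer.Length);
        // Close the DeflateStream so the remaining compressed bytes are flushed into ms.
        compressedzipStream.Close();
        Console.WriteLine("Original size: {0}, Compressed size: {1}", buffer.Length, ms.Length);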

On the last line (Console.WriteLine), I get the following output:

Original size: 9708, Compressed size: 13943. Shouldn't the compressed size be smaller? I am working with a .jpg file.

Thanks

dotnetdev
No; only data that is inherently inefficient (redundant) can be compressed. Text compresses well. Images (**especially** jpg, which is *already* compressed in different ways) simply don't compress very well. Such data can often get *bigger* when "compressed".
Marc Gravell
One last question. When compressing a jpeg, the code runs fine, but the image can't be opened. I assume this has something to do with what is mentioned here?
dotnetdev
@dotnetdev There is more than one way to compress something. JPEG is itself a compression standard: a JPEG file is image data that is already compressed. If you re-compress the JPEG file with *another* algorithm (one that isn't JPEG), you replace the original (compressed) JPEG file structure with a different, non-image data structure, so no image viewer can read it until it is decompressed again. Besides ZIP and JPEG, there are also GZIP, BZIP2, and lots of other formats, including domain-specific ones such as MP3 for music files. They are *not* interchangeable.
chakrit
@dotnetdev You cannot decompress a JPEG file using a ZIP algorithm, and likewise you cannot decompress an MP3 file using Paint. They differ in structure and in the compression and decompression algorithms themselves. The `DeflateStream` you mentioned is yet *another* kind of compression algorithm. You can't use the Deflate algorithm to decompress a JPEG file, for example.
chakrit
A: 

Not all data is compressible. For example, if you try to compress an already compressed file (such as a JPEG), it will most likely grow in size.
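
A quick way to see this for yourself is to run the same `DeflateStream` over two inputs of equal length. The sketch below is an assumption-laden stand-in: the helper `DeflatedLength` is made up, repetitive text plays the role of compressible data, and seeded random bytes play the role of an already compressed file such as a JPEG.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: returns how many bytes DeflateStream produces for the input.
static long DeflatedLength(byte[] data)
{
    using var output = new MemoryStream();
    // Dispose the DeflateStream first so all compressed bytes reach the MemoryStream.
    using (var deflate = new DeflateStream(output, CompressionMode.Compress, leaveOpen: true))
    {
        deflate.Write(data, 0, data.Length);
    }
    return output.Length;
}

// Highly repetitive text: lots of redundancy for the algorithm to remove.
var text = new StringBuilder();
for (int i = 0; i < 600; i++)
{
    text.Append("Woah .NET rocks. ");
}
byte[] repetitive = Encoding.UTF8.GetBytes(text.ToString());

// Random bytes of the same length: no redundancy, much like already compressed data.
byte[] random = new byte[repetitive.Length];
new Random(12345).NextBytes(random);

Console.WriteLine($"Repetitive text: {repetitive.Length} -> {DeflatedLength(repetitive)}");
Console.WriteLine($"Random bytes:    {random.Length} -> {DeflatedLength(random)}");
```

The repetitive text should shrink to a small fraction of its size, while the random bytes typically come out about the same size or slightly larger. The exact numbers depend on the .NET version.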
