tags:

views:

608

answers:

5

I used the code below to compress files and they keep growing instead of shrinking. I comressed a 4 kb file and it became 6. That is understandable for a small file because of the compression overhead. I tried a 400 mb file and it became 628 mb after compressing. What is wrong? See the code. (.net 2.0)

Public Sub Compress(ByVal infile As String, ByVal outfile As String)
    Dim sourceFile As FileStream = File.OpenRead(inFile)
    Dim destFile As FileStream = File.Create(outfile)

    Dim compStream As New GZipStream(destFile, CompressionMode.Compress)

    Dim myByte As Integer = sourceFile.ReadByte()
    While myByte <> -1
        compStream.WriteByte(CType(myByte, Byte))
        myByte = sourceFile.ReadByte()
    End While

    sourceFile.Close()
    destFile.Close()
End Sub
+2  A: 

Are you sure that writing byte by byte to the stream is a really good idea? It will certainly not have ideal performance characteristics and maybe that's what confuses the gzip compressing algorithm too.

Also, it might happen that the data you are trying to compress is just not really well-compressable. If I were you I would try your code with a text document of the same size as text documents tend to compress much better than random binary.

Also, you could try using a pure DeflateStream as opposed to a GZipStream as they both use the same compression algorithm (deflate), the only difference is that gzip adds some additional data (like error checking) so a DeflateStream might yield smaller results.

My VB.NET is a bit rusty so I'll rather not try to write a code example in VB.NET. Instead, here's how you should do it in C#, it should be relatively straightforward to translate it to VB.NET for someone with a bit of experience: (or maybe someone who is good at VB.NET could edit my post and translate it to VB.NET)

FileStream sourceFile;
GZipStream compStream;

byte[] buffer = new byte[65536];
int bytesRead = 0;
while (bytesRead = sourceFile.Read(buffer, 0, 65536) > 0)
{
     compStream.Write(buffer, 0, bytesRead);
}
DrJokepu
I would guess that the gzipper buffers and zips every so often not one byte a time. That would make no sense as all compression algorithms require streams of patterns and doing things 1 byte at a time would cancel any such ability.
mP
+4  A: 

If the underlying file is itself highly unpredictable (already compressed or largely random) then attempting to compress it will cause the file to become bigger.

Going from 400 to 628Mb sounds highly improbable as an expansion factor since the deflate algorithm (used for GZip) tends towards a maximum expansion factor of 0.03% The overhead of the GZip header should be negligible.

Edit: The 4.0 c# release indicates that the compression libraries have been improved to not cause significant expansion of uncompressable data. This suggests that they were not implementing the "fallback to the raw stream blocks" mode. Try using SharpZipLib's library as a quick test. That should provide you with close to identical performance when the stream is incompressible by deflate. If it does consider moving to that or waiting for the 4.0 release for a more performant BCL implementation. Note that the lack of compression you are getting strongly suggests that there is no point you attempting to compress further anyway

ShuggyCoUk
A: 

Thank you all for good answers. Earlier on I tried to compress .wmv files and one text file. I changed the code to DeflateStream and it seems to work now. Cheers.

Nick Masao
Compressing a wmv file is almost always going to be pointless. The video/audio streams within the file are almost always compressed anyway. As such you are just wasting cpu cycles. placing multiple files into a zip file is different, this is organizational and might be a reasonable thing to do.
ShuggyCoUk
+1  A: 

This is a known anomaly with the built-in GZipStream (And DeflateStream).
I can think of two workarounds:

  • use an alternative compressor.
  • build some logic that examines the size of the "compressed" output and compares it to the size of the input. If larger, chuck the output and just store the data.

DotNetZip includes a "fixed" GZipStream based on a managed port of zlib. (It takes approach #1 from above). The Ionic.Zlib.GZipStream can replace the built-in GZipStream in your apps with a simple namespace swap.

Cheeso
A: 

Example to decompress stream using gzipstream in vb.net http://w3vb.net/vb-net-inputoutput/streams/gzipstream/decompress-data-with-gzipstream-in-vb-net