Have some data in a Sybase image type column that I want to use in a C# app. The data has been compressed by Java using the java.util.zip package. I wanted to test that I could decompress the data in C#, so I wrote a test app that pulls it out of the database:

byte[] bytes = (byte[])reader.GetValue(0);  

This gives me a compressed byte[] of length 2479.
Then I pass this to a seemingly standard C# decompression method:

public static byte[] Decompress(byte[] gzBuffer)
{
    MemoryStream ms = new MemoryStream();
    int msgLength = BitConverter.ToInt32(gzBuffer, 0);
    ms.Write(gzBuffer, 4, gzBuffer.Length - 4);
    byte[] buffer = new byte[msgLength];

    ms.Position = 0;
    GZipStream zip = new GZipStream(ms, CompressionMode.Decompress);
    zip.Read(buffer, 0, buffer.Length);

    return buffer;
}  

The value for msgLength is 1503501432, which seems way out of range; the original document should be in the range of 5K-50K. Anyway, when I use that value to create "buffer", not surprisingly I get an OutOfMemoryException. What is happening? Jim

The Java compress method is as follows:

public byte[] compress(byte[] bytes) throws Exception {
    byte[] results = new byte[bytes.length];
    Deflater deflater = new Deflater();
    deflater.setInput(bytes);
    deflater.finish();
    int len = deflater.deflate(results);
    byte[] out = new byte[len];
    for(int i=0; i<len; i++) {
        out[i] = results[i];
    }
    return(out);
}  
+8  A: 

As I can't see your Java code, I can only guess that you are compressing your data to a zip file stream. It will obviously fail if you try to decompress that stream with GZip decompression in C#. Either change your Java code to GZip compression (example here at the bottom of the page), or decompress the stream in C# with an appropriate library (e.g. SharpZipLib); a sketch of that follows.
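
If the stored bytes turn out to be a plain deflate/zlib stream rather than an actual zip archive, SharpZipLib's InflaterInputStream can inflate them. A minimal sketch under that assumption (the method name and the 4096-byte chunk size are my choices, not from the question):

using ICSharpCode.SharpZipLib.Zip.Compression.Streams;

public static byte[] InflateWithSharpZipLib(byte[] compressed)
{
    using (MemoryStream input = new MemoryStream(compressed))
    using (InflaterInputStream inflater = new InflaterInputStream(input))
    using (MemoryStream output = new MemoryStream())
    {
        byte[] chunk = new byte[4096];
        int read;
        // Inflate in chunks; output ends up holding the uncompressed bytes.
        while ((read = inflater.Read(chunk, 0, chunk.Length)) > 0)
        {
            output.Write(chunk, 0, read);
        }
        return output.ToArray();
    }
}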

Update

OK, now I see you are using Deflate for the compression in Java. So, obviously, you have to use the same algorithm in C#: System.IO.Compression.DeflateStream

public static byte[] Decompress(byte[] compressed)
{
    using (MemoryStream ms = new MemoryStream(compressed))
    using (Stream zipStream = new DeflateStream(ms, 
                          CompressionMode.Decompress, true))
    {
        // Initial guess: twice the size of the compressed input.
        int initialBufferLength = compressed.Length * 2;

        byte[] buffer = new byte[initialBufferLength];
        bool finishedExactly = false;
        int read = 0;
        int chunk;

        while (!finishedExactly && 
              (chunk = zipStream.Read(buffer, read, buffer.Length - read)) > 0)
        {
            read += chunk;

            if (read == buffer.Length)
            {
                int nextByte = zipStream.ReadByte();

                // End of stream?
                if (nextByte == -1)
                {
                    finishedExactly = true;
                }
                else
                {
                    // Buffer was too small: double it and keep the byte just read.
                    byte[] newBuffer = new byte[buffer.Length * 2];
                    Array.Copy(buffer, newBuffer, buffer.Length);
                    newBuffer[read] = (byte)nextByte;
                    buffer = newBuffer;
                    read++;
                }
            }
        }
        if (!finishedExactly)
        {
            // Trim the buffer down to the number of bytes actually decompressed.
            byte[] final = new byte[read];
            Array.Copy(buffer, final, read);
            buffer = final;
        }

        return buffer;
    }
}  
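
For reference, a minimal usage sketch with the bytes pulled from the image column (reader here is the data reader from your question):

byte[] compressed = (byte[])reader.GetValue(0);  // the 2479-byte value from the image column
byte[] original = Decompress(compressed);        // should come out in the 5K-50K range
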
Philip Daubmeier
Ok, and so the incompatibility of the compression caused BitConverter to get the wrong size?
Jim Jones
Yes, in your original function you were trying to get the length of the uncompressed buffer by reading the first 4 bytes of the buffer (`int msgLength = BitConverter.ToInt32(gzBuffer, 0);`). Deflate does not store any info about the length of the uncompressed buffer, so you effectively took an arbitrary number (the first 4 bytes have a different meaning in the deflate stream) and tried to allocate a `byte[]` of that length. Of course that failed, because the number was way too large. See the sketch below.
Philip Daubmeier
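
For illustration, here is a minimal sketch (my assumption, not code from the question) of the kind of Compress counterpart that the original Decompress expects: it writes the uncompressed length as a 4-byte prefix in front of the GZip data. A Java Deflater stream contains no such prefix, which is why those first 4 bytes came back as garbage.

public static byte[] CompressWithLengthPrefix(byte[] raw)
{
    using (MemoryStream ms = new MemoryStream())
    {
        using (GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true))
        {
            zip.Write(raw, 0, raw.Length);
        }
        byte[] compressed = ms.ToArray();

        // First 4 bytes: uncompressed length, read back later via BitConverter.ToInt32.
        byte[] gzBuffer = new byte[compressed.Length + 4];
        Buffer.BlockCopy(BitConverter.GetBytes(raw.Length), 0, gzBuffer, 0, 4);
        Buffer.BlockCopy(compressed, 0, gzBuffer, 4, compressed.Length);
        return gzBuffer;
    }
}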
That is also the reason my code snippet is much longer than yours. With GZip you know the length of the uncompressed data and can allocate the `byte[]` accordingly. My Deflate code has to 'guess' the size (it initially takes double the size of the compressed buffer) and enlarges it on demand. At the end, the buffer is copied one more time into a buffer of the exact size.
Philip Daubmeier
So maybe you are better off using GZip for both your Java compression and C# decompression functions. That may perform much better, if that part is critical in your application; see the sketch below.
Philip Daubmeier
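
A minimal sketch of what the GZip route could look like on the C# side (my assumption; note that it expects no length prefix, it simply reads the stream to the end):

public static byte[] DecompressGZip(byte[] gzData)
{
    using (MemoryStream input = new MemoryStream(gzData))
    using (GZipStream zip = new GZipStream(input, CompressionMode.Decompress))
    using (MemoryStream output = new MemoryStream())
    {
        byte[] chunk = new byte[4096];
        int read;
        // Keep reading until the gzip stream is exhausted; output grows as needed.
        while ((read = zip.Read(chunk, 0, chunk.Length)) > 0)
        {
            output.Write(chunk, 0, read);
        }
        return output.ToArray();
    }
}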
Getting a "block size" error. Due to header?
Jim Jones
Yeah, I think I'm going to go with your recommendation of using GZip; it's easier, and performance is an issue.
Jim Jones
Once I switched to using GZip on the Java side, I attempted to use the C# Decompress I first posted to decompress the bytes I pulled out of the database, but got an "Invalid GZip header" error. So I used the Ionic package's Ionic.Zlib.GZipStream.UncompressBuffer(byte[]) and it worked perfectly. What is the explanation for that?
Jim Jones
Well, it seems the .NET Framework expects a different header than Java. I don't know why, but if you want to know exactly, you could look at the Java source code that does the compression and compare its output to the data you get from a C# GZip compression.
Philip Daubmeier
I just took a look at the Java source code for the GZip compression and at the header of some C# GZip-compressed bytes: it is exactly the same header, and it matches the spec (http://www.gzip.org/zlib/rfc-gzip.html). I think your problem is somewhere else. You don't have to use the Ionic package (you can if you want to, of course); this should work with the framework features alone. Perhaps you could show some code that produces this "Invalid GZip header" error; you could open a new question for that. A quick header check is sketched below.
Philip Daubmeier
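
A quick way to do such a header check with the framework alone (a sketch; the "test" payload is just an example): according to the gzip spec the stream should start with the magic bytes 1F 8B, followed by 08 for the deflate method.

// Requires System.Text, System.IO and System.IO.Compression.
byte[] data = Encoding.UTF8.GetBytes("test");
using (MemoryStream ms = new MemoryStream())
{
    using (GZipStream zip = new GZipStream(ms, CompressionMode.Compress))
    {
        zip.Write(data, 0, data.Length);
    }
    byte[] sample = ms.ToArray();
    // Dump the first bytes of the C#-produced stream, e.g. "1F-8B-08-..."
    Console.WriteLine(BitConverter.ToString(sample, 0, Math.Min(10, sample.Length)));
}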
OK, will post on "GZIP Java vs .NET"
Jim Jones