I work with a proprietary client/server message format that restricts what I can send over the wire. I can't send a serialized object; I have to store the data in the message as a String. The data I am sending are large comma-separated values, and I want to compress the data before I pack it into the message as a String.

I attempted to use Deflater/Inflater to achieve this, but somewhere along the line I am getting stuck.

I am using the two methods below to deflate/inflate. However, passing the result of the compressString() method to decompressString() returns a null result.

public String compressString(String data) {
  Deflater deflater = new Deflater();
  byte[] target = new byte[100];
  try {
   deflater.setInput(data.getBytes(UTF8_CHARSET));
   deflater.finish();
   int deflateLength = deflater.deflate(target);
   return new String(target);
  } catch (UnsupportedEncodingException e) {
   //TODO
  }

  return data;
 }

 public String decompressString(String data) {

  String result = null;
  try {
   byte[] input = data.getBytes();

   Inflater inflater = new Inflater();
   int inputLength = input.length;
   inflater.setInput(input, 0, inputLength);

   byte[] output = new byte[100];
   int resultLength = inflater.inflate(output);
   inflater.end();

   result = new String(output, 0, resultLength, UTF8_CHARSET);
  } catch (DataFormatException e) {
   // TODO Auto-generated catch block
   e.printStackTrace();
  } catch (UnsupportedEncodingException e) {
   // TODO Auto-generated catch block
   e.printStackTrace();
  }

  return result;
 }
A: 

In my experience, writing a compression algorithm yourself is difficult, but converting binary data to a string is not. If I were you, I would serialize the object normally, zip it with compression (as provided by ZipFile), and then convert the result to a string using something like Base64 encode/decode.

I actually have Base64 encode/decode functions. If you want, I can post them here.

NawaMan
A: 

The problem is that you convert compressed bytes to a string, which breaks the data. Your compressString and decompressString should work on byte[]
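
To see the breakage concretely, here is a minimal sketch (the class and method names are made up for illustration, and StandardCharsets assumes Java 7 or later). It deflates a short string, decodes the compressed bytes as UTF-8 text, re-encodes them, and checks whether the bytes survived:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.Deflater;

// Hypothetical demo class, not part of the original question.
class LossyRoundTrip {
    // Compresses text with Deflater and returns exactly the bytes produced.
    static byte[] deflate(String text) {
        Deflater deflater = new Deflater();
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        byte[] buffer = new byte[256]; // large enough for the short demo inputs used here
        int length = deflater.deflate(buffer);
        deflater.end();
        return Arrays.copyOf(buffer, length);
    }

    // Returns true if the bytes are unchanged after bytes -> String -> bytes.
    static boolean survivesStringRoundTrip(byte[] bytes) {
        String asText = new String(bytes, StandardCharsets.UTF_8);
        byte[] back = asText.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(bytes, back);
    }

    public static void main(String[] args) {
        byte[] compressed = deflate("a,b,c,d,a,b,c,d,a,b,c,d");
        // Prints false: the second byte of the default zlib header (0x9C) is not
        // valid UTF-8 at that position, so the decoder substitutes U+FFFD for it.
        System.out.println(survivesStringRoundTrip(compressed));
    }
}
```

Plain ASCII bytes pass the same check, which is why the problem only shows up once the data has been compressed.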

EDIT: Here is a revised version. It works.

EDIT2: About Base64: you're sending bytes, not strings, so you don't need Base64.

public static void main(String[] args) {
    String input = "Test input";
    byte[] data = new byte[100];

    int len = compressString(input, data, data.length);

    String output = decompressString(data, len);

    if (!input.equals(output)) {
        System.out.println("Test failed");
    }

    System.out.println(input + " " + output);
}

public static int compressString(String data, byte[] output, int len) {
    Deflater deflater = new Deflater();
    deflater.setInput(data.getBytes(Charset.forName("utf-8")));
    deflater.finish();
    int compressedLength = deflater.deflate(output, 0, len);
    deflater.end(); // release the native zlib resources
    return compressedLength;
}

public static String decompressString(byte[] input, int len) {

    String result = null;
    try {
        Inflater inflater = new Inflater();
        inflater.setInput(input, 0, len);

        byte[] output = new byte[100]; // TODO: may overflow, find a better solution
        int resultLength = inflater.inflate(output);
        inflater.end();

        result = new String(output, 0, resultLength, Charset.forName("utf-8"));
    } catch (DataFormatException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

    return result;
}
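
One possible way to deal with the fixed 100-byte buffers flagged in the TODO above is to call deflate()/inflate() in a loop and collect the output in a ByteArrayOutputStream. This is only a sketch: the class name, method names, and chunk size are made up, and StandardCharsets assumes Java 7 or later.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class; names are illustrative only.
class GrowingBuffers {
    // Compresses the UTF-8 bytes of text, however large the result turns out to be.
    static byte[] compress(String text) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native resources
        }
    }

    // Inflates zlib data of unknown decompressed size back into a String.
    static String decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // No progress is possible: the input is truncated or needs a preset dictionary.
                    throw new DataFormatException("truncated or unsupported compressed data");
                }
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            inflater.end();
        }
    }
}
```

Both methods still work on byte[], so this only removes the size limit; it does not address putting the result into a String.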
tulskiy
agree that it breaks when I convert back to a String--but that is the problem I am trying to solve. I am required to set the data as a String into the message to the client. Maybe I will edit my question to make the problem clearer. Thanks for the response.
filsa
You compress the whole message before sending it, and you'll have to send bytes. You can use base64, but it will ruin the whole compression thing, there will be no decrease in message size.
tulskiy
The message container I am required to use only takes a String. You are right that Base64 eats into the gains from compression: since Base64 outputs 4 characters for every 3 bytes, the compressed data has to be smaller than 75% of the original for me to come out ahead. In my tests so far, I am hitting about 50% compression or higher, depending on the data set I am sending.
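
A rough way to check the break-even point for Base64's 4-for-3 expansion is sketched below. The class and method names are invented for illustration, and the comparison assumes the original text is ASCII, so that one character is one byte.

```java
// Hypothetical helper class; names are illustrative only.
class Base64Overhead {
    // Length of padded Base64 output for n input bytes:
    // 4 characters for every group of 3 bytes, rounded up.
    static int encodedLength(int n) {
        return 4 * ((n + 2) / 3);
    }

    // True if compressing and then Base64-encoding is shorter than sending the original as-is.
    static boolean worthCompressing(int originalBytes, int compressedBytes) {
        return encodedLength(compressedBytes) < originalBytes;
    }

    public static void main(String[] args) {
        System.out.println(encodedLength(500));          // 668
        System.out.println(worthCompressing(1000, 500)); // true: 668 < 1000
        System.out.println(worthCompressing(1000, 800)); // false: 1068 >= 1000
    }
}
```

With 1000 bytes compressed to half their size, the encoded message is 668 characters, so the net saving is about a third rather than a half.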
filsa
So yes, I misunderstood the question. You don't have any access to the message format you're working with.
tulskiy
A: 

If you have a piece of code which seems to be silently failing, perhaps you shouldn't catch and swallow Exceptions:

catch (UnsupportedEncodingException e) {
    //TODO
}

But the real reason why decompress returns null is that your exception handling doesn't specify what to do with result when you catch an exception, so result is left as null. Are you checking the output to see if any Exceptions are occurring?

If I run your decompress() on a badly formatted String, Inflater throws me this DataFormatException:

java.util.zip.DataFormatException: incorrect header check
    at java.util.zip.Inflater.inflateBytes(Native Method)
    at java.util.zip.Inflater.inflate(Inflater.java:223)
    at java.util.zip.Inflater.inflate(Inflater.java:240)
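
One way to avoid the silent null is to let the failure propagate. The sketch below is an illustration only: the class name is made up, the choice of IllegalArgumentException is mine, and the fixed 1024-byte buffer is fine for a demo but too small for real data.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical demo class; names are illustrative only.
class FailLoudly {
    // Inflates zlib data, rethrowing format errors instead of returning null.
    static String decompress(byte[] input) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input);
            byte[] output = new byte[1024]; // demo-sized buffer
            int length = inflater.inflate(output);
            return new String(output, 0, length, StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            // Keep the original exception as the cause so the stack trace is not lost.
            throw new IllegalArgumentException("input is not valid zlib data", e);
        } finally {
            inflater.end();
        }
    }
}
```

A caller that passes in corrupted data now gets an exception at the point of failure instead of a null that surfaces somewhere else later.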
matt b
yeah, agree that catch-and-ignore exceptions are bad bad bad. This is a work in progress...That said, my current test case uses a valid string, and it's not working. I will add a test case for invalid strings as well. Still trying to work this one out.
filsa
well, you should probably also always pass the same encoding to String.getBytes() - you have at least one instance in your code sample where you are not passing an explicit encoding, and relying on the platform default.
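
As a small illustration of passing the encoding explicitly on both sides (the class and method names are made up, and StandardCharsets assumes Java 7 or later):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; names are illustrative only.
class ExplicitCharset {
    // Always encode with UTF-8, never the platform default.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Always decode with the same charset that was used to encode.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9,na\u00efve"; // contains non-ASCII characters
        System.out.println(decode(encode(original)).equals(original)); // true
    }
}
```

The Charset overloads also don't declare UnsupportedEncodingException, so the empty catch blocks for it are no longer needed.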
matt b
+2  A: 

From what I can tell, your current approach is:

  1. Convert String to byte array using getBytes("UTF-8").
  2. Compress byte array
  3. Convert compressed byte array to String using new String(bytes, ..., "UTF-8").
  4. Transmit compressed string
  5. Receive compressed string
  6. Convert compressed string to byte array using getBytes("UTF-8").
  7. Decompress byte array
  8. Convert decompressed byte array to String using new String(bytes, ..., "UTF-8").

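The problem with this approach is in step 3. When you compress the byte array, you create a sequence of bytes which is generally no longer valid UTF-8. Decoding it with new String(bytes, ..., "UTF-8") does not throw; it silently replaces each malformed sequence with the replacement character U+FFFD, so step 6 cannot recover the original bytes and the decompression in step 7 fails.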

The solution is to use a "bytes to characters" encoding scheme like Base64 to turn the compressed bytes into a transmissible string. In other words, replace step 3 with a call to a Base64 encode function, and step 6 with a call to a Base64 decode function.
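
The modified steps might look like the sketch below. It makes several assumptions that are not part of the original question: it uses java.util.Base64 (Java 8 or later) rather than any particular third-party codec, it swaps the raw Deflater/Inflater calls for the DeflaterOutputStream/InflaterInputStream wrappers so that buffer sizing is handled by the streams, and the class and method names are invented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class; names are illustrative only.
class CompressedText {
    // Steps 1-3: String -> UTF-8 bytes -> compressed bytes -> Base64 text.
    static String pack(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } // closing the stream finishes the compressed data
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Steps 6-8: Base64 text -> compressed bytes -> UTF-8 bytes -> String.
    static String unpack(String packed) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String original = "id,name,value\n1,alpha,10\n2,beta,20\n";
        String packed = pack(original);
        System.out.println(unpack(packed).equals(original)); // true
    }
}
```

The packed value contains only Base64 characters, so it can be stored in a String field without being altered by character decoding.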

Notes:

  1. For small strings, compressing and encoding is likely to actually increase the size of the transmitted string.
  2. If the compacted String is going to be incorporated into a URL, you may want to pick a different encoding to Base64 that avoids characters that need to be URL escaped.
  3. Depending on the nature of the data you are transmitting, you may find that a domain specific compression works better than a generic one. Consider compressing the data before creating the comma-separated string. Consider alternatives to comma-separated strings.
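
For note 2, one option (an assumption on my part, using java.util.Base64 from Java 8 or later; the class and method names are invented) is the URL-safe Base64 alphabet, which uses '-' and '_' in place of '+' and '/':

```java
import java.util.Base64;

// Hypothetical helper class; names are illustrative only.
class UrlSafeEncoding {
    // Encodes with the URL-safe alphabet and drops the '=' padding.
    static String encodeForUrl(byte[] data) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    // Decodes URL-safe Base64; the decoder does not require padding.
    static byte[] decodeFromUrl(String text) {
        return Base64.getUrlDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xfb, (byte) 0xff, (byte) 0xfe};
        System.out.println(Base64.getEncoder().encodeToString(sample)); // +//+
        System.out.println(encodeForUrl(sample));                       // -__-
    }
}
```

Both ends have to agree on the alphabet, since the standard decoder rejects '-' and '_'.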
Stephen C
Thanks for your answer. I finally had time after the weekend to implement this. Using the Apache commons-codec library, it was simple to implement. 1. Yes, short strings are actually longer, but most of my strings are going to be quite long. 2. The commons-codec library (at least as of v1.4) allows you to specify URL-safe or not. Thanks again! I'd up-vote, but I lack the karma to do so yet. Cheers.
filsa
You are welcome :-)
Stephen C
A: 

I am also facing the same problem. My system accepts only strings, and I need to compress a string and then send it to a remote system. The problem occurs when decoding the encrypted string. Can anyone help with this?

shaky