Hi, I was just wondering what would be the best compression algorithm to use to compress packets before sending them over the wire? The packets are encoded using JSON. Would LZW be a good one for this, or is there something better?

+1  A: 

Ummm... Correct me if I'm wrong, but if you are implementing on-the-wire compression, then you control both ends of the connection, right? In that case, if JSON is too fat a protocol, why wouldn't you just choose a different wire protocol that isn't as fat? I mean, I understand the appeal of using a standard like JSON, but if you are concerned about bandwidth, then you probably ought to pick a wire protocol that isn't all text.

Michael Kohne
" then you probably ought to pick a wire protocol that isn't all text"for example? (+1 if you name two or more ;-)
tobsen
A: 

Let the webserver compress and the browser decompress natively; gzip or deflate.
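As a minimal sketch of that setup, assuming a Python back end (the handler and payload below are made up for illustration; a real server should also check the request's Accept-Encoding header before compressing):

    import gzip
    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class JsonHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Hypothetical payload; any JSON document works the same way.
            payload = json.dumps({"latitude": 16, "longitude": 18, "altitude": 20}).encode()
            body = gzip.compress(payload)                      # server-side compression
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Encoding", "gzip")       # client decompresses natively
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8000), JsonHandler).serve_forever()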

Mat
It's not a web server. The client is a Flash program.
ryeguy
+3  A: 

I think two questions will affect your answer:

1) How well can you predict the composition of the data without knowing what will happen on any particular run of the program? For instance, if your packets look like this:

{
    "vectors": [
        {
            "latitude": 16,
            "longitude": 18,
            "altitude": 20
        },
        {
            "latitude": -8,
            "longitude": 13,
            "altitude": -5
        },
        [... et cetera ...]
    ]
}

-- then you would probably get your best compression by creating a hard-coded dictionary of the text strings that keep showing up in your data and replacing each occurrence of one of those strings with the appropriate dictionary index. (Actually, if your data were this regular, you'd probably want to send just the values over the wire and simply write a function into the client to construct a JSON object from the values when a JSON object is needed.)
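A minimal sketch of the "send just the values" idea (the field names and order here are assumptions for illustration, not anything from the question): both ends agree on the field order ahead of time, so only the numbers travel over the wire and the receiver rebuilds the objects.

    import json

    # Hypothetical field order agreed on by both ends; the key strings never
    # travel over the wire, only the numeric values do.
    FIELDS = ("latitude", "longitude", "altitude")

    def encode_vectors(vectors):
        """Flatten a list of vector dicts into a bare JSON list of numbers."""
        return json.dumps([[v[f] for f in FIELDS] for v in vectors])

    def decode_vectors(wire):
        """Rebuild the original dicts on the receiving side."""
        return [dict(zip(FIELDS, row)) for row in json.loads(wire)]

    vectors = [
        {"latitude": 16, "longitude": 18, "altitude": 20},
        {"latitude": -8, "longitude": 13, "altitude": -5},
    ]
    wire = encode_vectors(vectors)          # '[[16, 18, 20], [-8, 13, -5]]'
    assert decode_vectors(wire) == vectors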

If you cannot predict which headers will be used, you may need LZW, LZ77, or another method that looks at the data that has already gone through to find the parts it can express in an especially compact form. However...

2) Do the packets need to be compressed separately from each other? If so, LZW is definitely not the method you want; it will not have time to build its dictionary up to a size that gives substantial compression by the end of a single packet. The only chance of getting really substantial compression in this scenario, IMHO, is to use a hard-coded dictionary.
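One way to approximate the hard-coded-dictionary idea for independently compressed packets is zlib's preset-dictionary support (deflate rather than LZW; the dictionary contents below are made up for illustration, and both ends must use byte-for-byte the same dictionary):

    import json
    import zlib

    # Hypothetical preset dictionary: the strings we expect in every packet.
    PRESET = b'"vectors" "latitude" "longitude" "altitude"'

    def compress_packet(obj):
        co = zlib.compressobj(level=9, zdict=PRESET)
        return co.compress(json.dumps(obj).encode()) + co.flush()

    def decompress_packet(blob):
        do = zlib.decompressobj(zdict=PRESET)
        return json.loads(do.decompress(blob) + do.flush())

    packet = {"vectors": [{"latitude": 16, "longitude": 18, "altitude": 20}]}
    blob = compress_packet(packet)
    assert decompress_packet(blob) == packet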

(Addendum to all of the above: as Michael Kohne points out, sending JSON means you're probably sending all text, which means that you're underusing bandwidth that has the capability of sending a much wider range of characters than you're using. However, the problem of how to pack characters that fall into the range 0-127 into containers that hold values 0-255 is fairly simple and I think can be left as "an exercise for the reader", as they say.)
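For completeness, a sketch of that exercise: packing 7-bit characters into a bit stream so that every 8 characters occupy 7 bytes. (This is only an illustration; the receiver needs the original character count, or infers it from the packet length, to strip the zero padding at the end.)

    def pack7(text: bytes) -> bytes:
        """Pack 7-bit ASCII bytes into a bit stream: every 8 characters -> 7 bytes."""
        bits = 0
        nbits = 0
        out = bytearray()
        for ch in text:
            bits = (bits << 7) | (ch & 0x7F)   # append 7 bits per character
            nbits += 7
            while nbits >= 8:
                nbits -= 8
                out.append((bits >> nbits) & 0xFF)
        if nbits:                              # flush the final partial byte, zero-padded
            out.append((bits << (8 - nbits)) & 0xFF)
        return bytes(out)

    assert len(pack7(b'{"altitude"}' * 2)) == 21   # 24 chars * 7 bits = 168 bits = 21 bytes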

For the example above, storing the data in SOA (structure-of-arrays) format instead of AOS (array-of-structures) reduces the size a lot. I have found that it is a good 'compression' method in many cases, but whether SOA is suitable depends on the specific application.
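A sketch of what that restructuring looks like for the vectors above (pure illustration, not any particular library's API): each key string appears once instead of once per record.

    import json

    # AOS ("array of structures"): the layout from the example above.
    aos = [
        {"latitude": 16, "longitude": 18, "altitude": 20},
        {"latitude": -8, "longitude": 13, "altitude": -5},
    ]

    # SOA ("structure of arrays"): each key string appears once, values are grouped.
    soa = {key: [row[key] for row in aos] for key in aos[0]}

    print(json.dumps(soa))
    # {"latitude": [16, -8], "longitude": [18, 13], "altitude": [20, -5]}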
Chris
A: 

Gzip (the deflate algorithm) is pretty good at compression, although, like all good compression algorithms, it uses plenty of CPU (3-5x as much as the overhead of JSON reading/writing in my testing).
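A quick way to get a feel for both the ratio and the CPU cost on your own data (the payload below is synthetic, so real numbers will differ):

    import json
    import time
    import zlib

    # Synthetic, repetitive payload roughly in the spirit of the vector example.
    payload = json.dumps(
        [{"latitude": i, "longitude": i + 2, "altitude": i + 4} for i in range(10000)]
    ).encode()

    start = time.perf_counter()
    compressed = zlib.compress(payload, 6)          # deflate at a middling level
    elapsed = time.perf_counter() - start

    print(f"{len(payload)} -> {len(compressed)} bytes "
          f"({len(compressed) / len(payload):.1%}) in {elapsed * 1000:.1f} ms")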

StaxMan