
I have a program where I generate bitstreams, of about 80 to 150 bits or so, which I would like to compress, because I'm going to turn them into some kind of ASCII string so people can transmit them around.

Does anyone know of a good, free bit-aware compressor that might work on such a stream? My main problem with the "standard options" is that this stream should really be treated as bits, not bytes; otherwise the structure is lost, and the overhead of byte-oriented formats swamps any gain.

Addition:

The reason I want to compress these streams is because users are going to be cutting+pasting them, probably using something like base64 encoding, so saving some data is helpful.

Here is an example, for those who would like to see it. I'll add formatting to make it easier to read:

110 110 - This is a 6x6 grid (the maximum is 7x7, so we only need 3 bits per dimension!)

000000
011110
010010
010010
011110
000000 - This is one layout grid

000000
000000
001000
000100
000000
000000 - This is the second layout grid

Now we list some pieces

010 11111111 - A piece is a 3-bit colour code, then an 8-bit list of 'on / off' bits.
001 10101010 - Another piece!
001 10101010 - Another, identical piece!

The reason I say this should be considered 'as bits' is that there are obvious compression opportunities when it is viewed as a bitstream (in particular, the many 0s in the grids), which disappear when you consider it as a byte stream.
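
To make the format concrete, here is a rough Python sketch of the packing; the field layout is read straight off the example above, and the names are mine, not from any real spec:

    # Rough sketch of the packing described above; field names are
    # inferred from the example, not from a real spec.
    def pack_bits(width, height, grids, pieces):
        bits = []
        bits.append(format(width, "03b"))    # 3 bits: grid width (max 7)
        bits.append(format(height, "03b"))   # 3 bits: grid height (max 7)
        for grid in grids:                   # each grid: width*height cell bits
            bits.extend(grid)
        for colour, cells in pieces:         # each piece: 3-bit colour + 8 cell bits
            bits.append(format(colour, "03b"))
            bits.append(cells)
        return "".join(bits)

    grids = [
        ["000000", "011110", "010010", "010010", "011110", "000000"],
        ["000000", "000000", "001000", "000100", "000000", "000000"],
    ]
    pieces = [(0b010, "11111111"), (0b001, "10101010"), (0b001, "10101010")]
    stream = pack_bits(6, 6, grids, pieces)
    print(len(stream), "bits")  # 6 + 72 + 33 = 111 bits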

+3  A: 

I'd guess that no general purpose algorithm will give you great compression for this kind of data.

Your best bet is to analyze the structure of your data and try to find a custom compression algorithm or possibly customize an existing one (maybe with a pre-filled dictionary or something like that).
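
For example, zlib supports preset dictionaries; a rough Python sketch, with a made-up dictionary and payload:

    import zlib

    # Hypothetical preset dictionary seeded with bytes we expect to recur
    # (here, runs of zero bytes); both sides must use the same dictionary.
    shared_dict = b"\x00" * 32

    data = b"\x00\x00\x1e\x12\x12\x1e\x00\x00"  # made-up packed message

    comp = zlib.compressobj(level=9, zdict=shared_dict)
    packed = comp.compress(data) + comp.flush()

    decomp = zlib.decompressobj(zdict=shared_dict)
    assert decomp.decompress(packed) == data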

Joachim Sauer
A: 

The zlib library (which uses the same DEFLATE algorithm as gzip) is free. It has a few settings, but I am not sure how much you can save unless there is some periodic pattern to your bits.

Since PNG and GIF graphics files are essentially representations of bit patterns, perhaps you can look at the compression algorithms they use.

Dude, GIF and PNG are lossy. You don't want to use them for anything except images.
T.E.D.
I think you are thinking of jpeg compression. I'm pretty sure that png and gif are lossless.
KeithB
gif and png aren't lossy!
Nils Pipenbrinck
I looked it up. PNG is lossless, and GIF is lossless for 256 colors or less.
PNG has a lossless and a lossy version actually. Still, they are designed for images, not general data.
T.E.D.
BTW, the Deflate algorithm in PNG is used in .zip files.
devstuff
+2  A: 

Since the streams are so small, can you post some of them here?

Also are you sure that there is enough redundancy in those streams to even allow compression? Are there any repeating blocks of data?

It's kind of a long shot, but in the absence of any concrete answers, you might want to look into the ROM scene and check out how strings of text were compressed in cartridge-based RPG games like "Chrono Trigger" or "Final Fantasy III." I know that the text strings were compressed in those games (bytes were so precious in those days) and unraveling the scheme proved a fun challenge for hackers. That's the only thing that came to my mind when you mentioned lots of short little strings being compressed.

Your root problem might remain, though. I would imagine that the compression schemes in those ROMs exploited redundancy across many strings (i.e., if "Timbuktu" occurred in 58 different strings) and not so much within a single stream.

John Booty
A: 

What you want is lossless binary compression. I am sure there are papers and web articles, if not tons of other resources, out there. Google those terms and I suspect you will get what you need.

How much data are you talking about? Is your pipe small or the throughput so high that you have to compress?

In retrospect, your data is so small that you are probably not going to get worthwhile gains unless you analyze your traffic and do your own "compression" which is basically just a mapping/hash of known bit patterns.

As someone else said, post some sample data and you will probably get better advice after that.
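
As a rough sketch of that mapping idea (the patterns and codes here are invented):

    # Hypothetical codebook: map the 8-bit piece patterns that occur
    # most often to short codes, with an escape for everything else.
    COMMON = {"11111111": "00", "10101010": "01", "00000000": "10"}

    def encode_piece(cells):
        # "11" is an escape prefix meaning "a literal 8 bits follow".
        return COMMON.get(cells, "11" + cells)

    print(encode_piece("10101010"))  # '01' (2 bits instead of 8)
    print(encode_piece("11001100"))  # '1111001100' (escaped literal)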

Tim
+7  A: 

What are you hoping to accomplish by compressing 150 bits? Unless you aggregate several of these ~19-byte messages, I'm not sure what you hope to gain. Is it a UI issue, wherein you want your users to send/receive "codes"?

How about Base64 encoding? This will take binary data and turn it into coded characters for easy transmission or entry.
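
For example, a rough Python sketch that packs the sample header and first grid from the question and Base64-encodes the result:

    import base64

    # The 6-bit header plus the first layout grid from the question.
    bits = "110110" + "000000011110010010010010011110000000"
    padded = bits + "0" * (-len(bits) % 8)        # pad to a byte boundary
    raw = int(padded, 2).to_bytes(len(padded) // 8, "big")
    text = base64.b64encode(raw).decode("ascii")  # safe to cut and paste
    print(text)

    # Round trip: recover the original bits from the pasted text.
    back = bin(int.from_bytes(base64.b64decode(text), "big"))[2:].zfill(len(padded))
    assert back.startswith(bits)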

Michael Haren
+1 for base64 - which I think is what he really needs.
Draemon
A: 

I've had the same thought as Tim - such a small amount of data barely seems worth compressing. As a matter of fact, I'd suggest that what you really want to look into is some sort of ASCII encoding method, like uuencode or MIME encoding (aka Base64).

Paul Tomblin
A: 

CCITT's Group 3 and Group 4 lossless encoding schemes, used in compressing G3 and G4 TIFFs, were designed with binary data in mind. G4 TIFFs are black-and-white images commonly used for OCR and faxing. Another simple scheme that comes to mind is RLE.

codelogic
+1  A: 

I would suggest you look into using zlib. It is downloadable, and the license allows you to use it for pretty much any project. An important point is that it is widely used, and thus well debugged. If your data is important, you don't want to be debugging odd edge cases in a homebrew algorithm at random dates in the future.

I've messed around with it a bit, and it does allow stream-oriented compression. I'm not really sure how good it is when you are just feeding it a small amount of data at a time, though. Lossless compression tends to work by finding and eliminating patterns, and there won't be a lot of patterns to find if you are feeding it something small like 12 bytes at a time.
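
A rough sketch of that small-data problem, using Python's zlib binding on an arbitrary 19-byte payload:

    import zlib

    # A made-up ~19-byte message (about 150 bits of packed data).
    payload = bytes.fromhex("d801e492491e000000201000005ff52aa55400")
    out = zlib.compress(payload, 9)
    # The zlib wrapper alone costs 6 bytes (2-byte header + 4-byte
    # Adler-32 checksum) before any compression gain, which is an
    # enormous overhead at this message size.
    print(len(payload), "->", len(out), "bytes")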

I'm not voting Juan's answer up because he also suggests using GIF, which is a lossy compression. You didn't give a lot of info, but I'm guessing you don't want any compression format that actually loses data. Most popular graphic, audio, and video compression algorithms are lossy; they rely on the ability of human senses to take in an image or sound properly with some of the original information removed or modified slightly.

T.E.D.
GIF is lossless for 256 colors or less.
+3  A: 

Chris, thanks for posting those samples. I think run-length encoding is the way you want to go. That should be pretty trivial to implement.

http://en.wikipedia.org/wiki/Run-length_encoding

It will work well with all those consecutive 0s.
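
A rough sketch of the idea (Python; the 3-bit run-length field is an arbitrary choice):

    from itertools import groupby

    def rle_encode(bits):
        # Emit (bit, 3-bit run length) pairs; runs longer than 7 are
        # split. Dense areas can expand under naive RLE, so tune the
        # length field to your data.
        out = []
        for bit, run in groupby(bits):
            n = len(list(run))
            while n > 0:
                take = min(n, 7)
                out.append(bit + format(take, "03b"))
                n -= take
        return "".join(out)

    grid = "000000000000001000000100000000000000"  # the sparse second grid
    print(len(grid), "->", len(rle_encode(grid)), "bits")  # 36 -> 28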

So the primary reason to compress these strings is to make them easier to cut and paste? Makes sense; that sounds like an interesting project.

If you are just trying to make the strings more human-manageable, it sounds like you're all set. If you are trying to compress them so that they transmit faster over the wire, I think the benefit of compressing small strings may be defeated by other TCP issues like MTU sizes and all that. (I'm not experienced there, so take that last bit with many grains of salt.)

Good luck!

John Booty
A: 

JBIG might give you what you need.

http://en.wikipedia.org/wiki/JBIG

http://www.jpeg.org/jbig/index.html

http://www.cl.cam.ac.uk/~mgk25/jbigkit/

JBIG is used to compress 1-bpp fax images.

David Poole
A: 

Just to add to what's already been said, isn't "compressing a small amount of data" intrinsically a bit pointless? If you could elaborate on the data, the platform or the uses that might help.

As for the bits vs ascii - I'm not entirely sure what you're getting at, but as mentioned by Michael, Base64 provides a way to make arbitrary binary more friendly.

Note that any conversion from binary into ASCII is the opposite of compression.

Draemon
A: 

My first suggestion is that you look into range encoding. Instead of

1. compressing the bit data into binary data, and then

2. encoding the binary data into Base64 ASCII data,

you could pack your bits directly into the range 0 to N-1 (where N is the number of printable characters you are using) and then do a dead-easy mapping.
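
A rough sketch of that direct mapping (Python; the 64-character alphabet is just an example, and N need not be a power of two):

    # Hypothetical alphabet; any agreed-upon set of printable characters
    # works, and its size N does not have to be a power of two.
    ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+-"
    N = len(ALPHABET)

    def bits_to_text(bits):
        # Treat the bit string as one big integer in base N. A sentinel
        # '1' is prepended so leading zero bits survive the round trip.
        value = int("1" + bits, 2)
        out = []
        while value:
            value, digit = divmod(value, N)
            out.append(ALPHABET[digit])
        return "".join(reversed(out))

    def text_to_bits(text):
        value = 0
        for ch in text:
            value = value * N + ALPHABET.index(ch)
        return bin(value)[3:]  # strip '0b' and the sentinel bit

    bits = "110110000000011110010010"
    assert text_to_bits(bits_to_text(bits)) == bits
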

My second suggestion is that you study the filter methods employed by PNG and think about whether similar methods could be used to render your data more compressible. It's difficult to tell from just two sample layout grids, but it seems very likely from your first grid that some method such as "predict each pixel based on its neighbors above and to the left, and then convert each pixel to 0 if it met its prediction and 1 if it defied its prediction" could give you a much more uniform set of data, and thus greater compression.