Array Compression Algorithm

This post's a community wiki. I don't want any points for this -- I've already voted to close the question.

The number of bytes to compress has very little to do with the choice of compression algorithm, although it does affect the implementation. For example, when you have fewer than 2^15 bytes to compress with zlib, you can specify a windowBits value of less than 15. The windowBits parameter (not to be confused with the compression level, which runs from 0 to 9) controls the depth of the "look-back" dictionary as a power of two. If your file is shorter than 16k bytes, then a 32k look-back dictionary will never be more than half full; in that case a 14-bit window reaches almost all of the same history while using less memory. Do not expect much of an edge in compression ratio from this, though: Deflate entropy-codes its look-back distances rather than storing them as fixed-width pointers sized to the window.
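As a rough illustration of the window-size setting, here is a minimal sketch using Python's standard zlib module, where the C API's windowBits argument appears as `wbits`. The helper name `deflate` and the sample text are made up for the example; the sizes it prints depend on the input, so measure on your own data.

```python
import zlib

def deflate(data: bytes, wbits: int = 15, level: int = 9) -> bytes:
    """Compress data with an explicit look-back window of 2**wbits bytes."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, wbits)
    return compressor.compress(data) + compressor.flush()

# 9000 bytes of repetitive text: well under 16 KB, so a 14-bit window is plenty.
sample = b"the quick brown fox jumps over the lazy dog. " * 200

for wbits in (15, 14):
    packed = deflate(sample, wbits)
    # A decompressor using the default 32 KB window can read either stream.
    assert zlib.decompress(packed) == sample
    print(wbits, len(packed))
```

The smaller window mainly reduces the memory the compressor and decompressor need; any difference in output size should be checked rather than assumed.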

The content of the data is what matters. If you are sending images with mostly background, then you might want Run Length Encoding (an optional mode in Windows .BMP, for example).
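A minimal sketch of run-length encoding for a byte array, assuming a simple (count, value) pair layout with runs capped at 255. This is not the BMP RLE8 format, and the function names are hypothetical; it only shows the idea.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (run_length, value) byte pairs, runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)

def rle_decode(packed: bytes) -> bytes:
    """Expand (run_length, value) pairs back into the original bytes."""
    out = bytearray()
    for k in range(0, len(packed), 2):
        out += bytes([packed[k + 1]]) * packed[k]
    return bytes(out)

# A "mostly background" row: 300 zero bytes, 3 bright pixels, 200 zero bytes.
row = b"\x00" * 300 + b"\xff" * 3 + b"\x00" * 200
packed = rle_encode(row)
assert rle_decode(packed) == row
print(len(row), "->", len(packed))
```

Note the failure case: data with no repeated neighbours doubles in size under this scheme, which is why the content of the array matters more than its length.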

If you are sending mostly English text, then you will want something like zlib, which implements Huffman encoding and LZ77-style look-back dictionary compression.

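If your data has been encrypted, then attempting to compress it will not succeed: good ciphertext is statistically indistinguishable from random bytes, so there is no redundancy left to remove. If you need both, compress first and encrypt afterwards.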
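A small experiment makes the point. This sketch uses `os.urandom` as a stand-in for ciphertext (an assumption for illustration; no real cipher is involved) and compares it with repetitive English text of the same length.

```python
import os
import zlib

# Repetitive English text versus random bytes of the same length.
text = b"If you are sending mostly English text, a dictionary coder does well. " * 50
noise = os.urandom(len(text))  # stand-in for encrypted data

packed_text = zlib.compress(text, 9)
packed_noise = zlib.compress(noise, 9)

print("text :", len(text), "->", len(packed_text))
print("noise:", len(noise), "->", len(packed_noise))
```

The text should shrink dramatically, while the random bytes should come out slightly larger than they went in, since zlib still has to add its header and checksum.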

If your data is a particular type of signal, and you can tolerate some loss of detail, then you may want to transform it into frequency space and send only the principal components. (e.g., JPEG, MP3)
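To illustrate the lossy idea, here is a toy sketch that transforms a signal with a naive DCT-II, keeps only the largest coefficients, and reconstructs an approximation. The function names and the test signal are invented for the example; real codecs such as JPEG and MP3 add blocking, quantization, and entropy coding on top of a transform like this.

```python
import math

def dct(signal):
    """Naive O(n^2) DCT-II of a sequence of floats."""
    n = len(signal)
    return [sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(signal))
            for k in range(n)]

def idct(coeffs):
    """Inverse of dct() above (a scaled DCT-III)."""
    n = len(coeffs)
    return [(coeffs[0] / 2
             + sum(coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
                   for k in range(1, n))) * 2 / n
            for i in range(n)]

def keep_largest(coeffs, m):
    """Zero every coefficient except the m with the largest magnitude."""
    keep = set(sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]))[-m:])
    return [c if k in keep else 0.0 for k, c in enumerate(coeffs)]

n = 64
# Two strong components plus a small high-frequency "detail" component.
signal = [3.0 * math.cos(math.pi * (i + 0.5) * 2 / n)
          + 1.0 * math.cos(math.pi * (i + 0.5) * 5 / n)
          + 0.05 * math.cos(math.pi * (i + 0.5) * 20 / n)
          for i in range(n)]

approx = idct(keep_largest(dct(signal), 2))  # send 2 numbers instead of 64
max_error = max(abs(a - b) for a, b in zip(signal, approx))
print("max reconstruction error:", max_error)
```

Only the two dominant coefficients (and their positions) would need to be sent; the price is that the small detail component is lost.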

-1 you have no idea what his data is.

Heath Hunnicutt 2010-08-08 16:37:50

+1, zlib doesn't care what his data is either.

Hans Passant 2010-08-08 16:42:18

@SigTerm -- Oh, it's unsigned char. I guess that means the contents of the unsigned char array is compressible by Zlib. Gosh, I had no idea.

Heath Hunnicutt 2010-08-08 16:44:46

@Hans - Oh, yes, it does. Use zlib ever?

Heath Hunnicutt 2010-08-08 16:45:01

@Heath Hunnicutt: Dude, I really recommend sprinting or a punching bag. Way more exciting.

SigTerm 2010-08-08 16:48:16

@SigTerm: What with your username, I would expect you to tell me to "handle myself" for stress relief. :) By the way, in 1960, US government psychiatrist Eric Berne found that people who sublimate via violence are prone to homicide. So I'll pass on the punching bag.

Heath Hunnicutt 2010-08-08 17:12:08

-1 I have to agree with Heath. That's not a response, as the question was incomplete, to put it mildly. Maybe the OP shouldn't even consider compression; it's probably the wrong approach. If it's a sparsely populated array, maybe a list or a tree would be a better data structure; if it's "random" data, neither a generic compressor nor anything else will be able to compress it. The question lacks the info needed to give a meaningful answer.

tristopia 2010-08-08 20:11:48

@tristopia -- Careful, man, you don't want to be on the wrong side of the SO popularity contest by accident. ;)

Heath Hunnicutt 2010-08-08 20:28:17

I don't really care about popularity, and telling someone that his question is bad and explaining why is the right thing to do. Giving any answer that has a good chance of being inadequate is wrong. People should ask themselves why they're contributing to SO: to exchange know-how, or to participate in a juvenile pissing contest?

tristopia 2010-08-08 20:39:16

@tristopia - I went back to all your answers and I saw they are all good answers. So there you have 65 up votes from me. Many of your answers are seriously under-voted, too. I did not spread out the voting, not trying to abuse the system, so you only get 200 rep.

Heath Hunnicutt 2010-08-09 05:29:16

Thank you, that's nice, but apparently someone just undid that, as I'm back to 809 rep.

tristopia 2010-08-09 07:23:25

@Heath I would have thought that people who used punching bags are LESS prone to homicide, seeing as they are better practised in self defence?

Tom Gullen 2010-08-09 13:10:07

@Tom - Sublimators are prone to *commit* homicide, not fall victim to it. The "Tough Guy" mentality is documented in Eric Berne, Games People Play.

Heath Hunnicutt 2010-08-09 17:12:50

Was just kidding :)

Tom Gullen 2010-08-09 23:11:56

So you vote to close, and then submit an answer? Confusing at best, Heath.

Josh K 2010-08-08 17:00:02

@Josh -- My answer is essentially an explanation of *why* I voted to close. If you read my answer, you'll see that it is very vague and general. As any answer to such a question needs to be. The OP didn't even specify if he wanted lossless, although that's a fair assumption...

Heath Hunnicutt 2010-08-08 17:09:05

@Heath: Then this should be a comment because it is **not** an answer (by my judgment and your admission).

Josh K 2010-08-08 17:14:33

@Josh - No, I did not say it's not an answer. It is an answer. And if you read it, you may learn something. But it is not *only* an answer to the question. It is an answer to the meta-question of "what the heck is [OP] talking about?" So, no, not by my own admission. Only by your "judgment," which is a great admission on your part. Also your judgment is clearly biased by your opinion of me. Objectively, the text above *does* answer OP. So take your judgment and self-handle it.

Heath Hunnicutt 2010-08-08 19:38:39

@Heath: "Very vague and general" answers are discouraged. Answers which the author describes as "essentially an explanation of why I voted to close" are also discouraged. If you feel this could possibly be a meta answer **put it on meta, not SO**.

Josh K 2010-08-08 19:45:06

@Josh: Look at the two answers here. Which of the two answers is more vague and general? Can you be serious? As for meta, that is for discussion of SO, not discussion of SO questions. Surely, you already know that? Being disingenuous or something?

Heath Hunnicutt 2010-08-08 20:27:17

This is actually a far better answer than the other one - the key part being *"The number of bytes to compress has very little to do with choice of compression algorithm ... The content of the data is what matters."* In a sense there *are* no general purpose compression algorithms - just expansion algorithms that have interesting failure cases.

caf 2010-08-09 01:06:00

@caf. Thanks for that.

Heath Hunnicutt 2010-08-09 05:25:34

@Josh: I do realize I criticized my own answer as "very vague and general." ANY answer to this poorly-posed question must be "very vague and general." THAT is why I voted to close. But if SO insists this question must live on in the corpus for reference, at least the answer should say *something* useful.

Heath Hunnicutt 2010-08-09 05:27:04
