ansaurus

Question

Determining best compression algorithm to use for a series of bytes

Answer 1

A:

It sounds like you're trying to evaluate a large number of compression possibilities for every possible segment of the file (let's call your variable-length 1-64K blocks "segments"). Correct me if I'm wrong, but are you working out the best compression for the first segment from the following choices (method 0 is uncompressed)?

compression method 0, length 1 byte.
compression method 1, length 1 byte.
: : : : :
compression method 6, length 1 byte.
compression method 0, length 2 bytes.
compression method 1, length 2 bytes.
: : : : :
compression method 6, length 65534 bytes.
compression method 0, length 65535 bytes.
compression method 1, length 65535 bytes.
compression method 2, length 65535 bytes.
compression method 3, length 65535 bytes.
compression method 4, length 65535 bytes.
compression method 5, length 65535 bytes.
compression method 6, length 65535 bytes.

That's going to take a huge amount of time: seven methods times 65,535 possible lengths is roughly 460,000 compression attempts per segment. If that is what you're doing, you'll be better off choosing a single segment size (e.g., 64K) and applying each of the seven compression methods to it to choose the best. Then, for each segment, output the "method" byte followed by the compressed data.

paxdiablo 2009-03-03 06:52:03
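A minimal sketch of the fixed-segment approach suggested above, in Python. The seven methods from the question are not specified, so method 0 (stored) plus three standard-library codecs stand in for them; that choice is an assumption. The 4-byte length field after the method byte is also an assumption, added only so that a reader can find where the next segment starts.

```python
import bz2
import lzma
import zlib

SEGMENT_SIZE = 64 * 1024  # one fixed segment size (64K, as suggested above)

# Assumption: the real methods are unknown, so method 0 (stored) plus three
# standard-library codecs stand in for them here.
COMPRESSORS = [lambda d: d, zlib.compress, bz2.compress, lzma.compress]
DECOMPRESSORS = [lambda d: d, zlib.decompress, bz2.decompress, lzma.decompress]


def best_method(segment):
    """Compress one segment with every method; return (method id, smallest payload)."""
    results = [(i, compress(segment)) for i, compress in enumerate(COMPRESSORS)]
    return min(results, key=lambda r: len(r[1]))  # ties go to the lowest method id


def pack(data):
    """Split data into fixed-size segments and store each with its best method."""
    out = bytearray()
    for start in range(0, len(data), SEGMENT_SIZE):
        method_id, payload = best_method(data[start:start + SEGMENT_SIZE])
        out.append(method_id)                   # the "method" byte
        out += len(payload).to_bytes(4, "big")  # assumed length field
        out += payload
    return bytes(out)


def unpack(blob):
    """Reverse pack(): read method byte, length, payload until the blob ends."""
    out = bytearray()
    pos = 0
    while pos < len(blob):
        method_id = blob[pos]
        size = int.from_bytes(blob[pos + 1:pos + 5], "big")
        out += DECOMPRESSORS[method_id](blob[pos + 5:pos + 5 + size])
        pos += 5 + size
    return bytes(out)
```

This costs one compression attempt per method per segment, rather than one per method per candidate length.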

No, not quite like that. Basically, for a given index into the uncompressed bytes, I check how many bytes each compression type can compress starting from that index, pick the one that compresses the most bytes, and then start again from the new source index.

Sukasa 2009-03-03 14:07:18

compression method 0, length 1 byte, index 0.
...
compression method 7, length 1 byte, index 0.
compression method 0, length 1 byte, index 8.
...
compression method 7, length 1 byte, index 8.
compression method 0, length 1 byte, index 36.
...
compression method 7, length 1 byte, index 36.

Sukasa 2009-03-03 14:08:16
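The greedy selection described in these comments could be sketched roughly as follows. The three span functions (`literal_span`, `rle_span`, `ramp_span`) are hypothetical stand-ins, since the actual compression types are not given; each one only reports how many source bytes it could encode starting at a given index.

```python
MAX_RUN = 65535  # assumed upper bound on the bytes one method call may cover


def literal_span(data, i):
    """Hypothetical method 0: always covers exactly one byte."""
    return 1


def rle_span(data, i):
    """Hypothetical method 1: covers a run of identical bytes."""
    end = i + 1
    while end < len(data) and end - i < MAX_RUN and data[end] == data[i]:
        end += 1
    return end - i


def ramp_span(data, i):
    """Hypothetical method 2: covers bytes that each increase by one."""
    end = i + 1
    while (end < len(data) and end - i < MAX_RUN
           and data[end] == (data[end - 1] + 1) % 256):
        end += 1
    return end - i


SPAN_FUNCS = [literal_span, rle_span, ramp_span]


def greedy_plan(data):
    """At each index, pick the method covering the most bytes, then advance."""
    plan = []
    i = 0
    while i < len(data):
        spans = [span(data, i) for span in SPAN_FUNCS]
        length = max(spans)
        method = spans.index(length)  # ties go to the lowest method id
        plan.append((method, i, length))
        i += length
    return plan


print(greedy_plan(b"aaaabcdeX"))  # [(1, 0, 4), (2, 4, 4), (0, 8, 1)]
```

Each tuple is (method, start index, bytes covered). The number of method evaluations is the number of methods times the number of chosen blocks, not one per possible length.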

Note that this is probably an NP-complete problem, since changing the source index of the second attempt might produce different results. What I mean is that even if you pick a sub-optimal first method, the rest of the compression might give a better result overall.

Lasse V. Karlsen 2009-03-22 22:15:22
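A toy illustration of that last point, in Python. The `OPTIONS` table is entirely hypothetical: for a 6-byte input it lists, at each index, the (source bytes consumed, output bytes emitted) pairs that some set of methods might offer. It also assumes, as a simplification, that an option's cost does not depend on earlier choices; in that restricted setting the best overall plan can be found with a memoized search over indices.

```python
from functools import lru_cache

LENGTH = 6  # size of the hypothetical input

# Hypothetical table: OPTIONS[i] holds (bytes consumed, bytes emitted) pairs
# available when encoding starts at index i.
OPTIONS = {
    0: [(4, 2), (3, 2)],
    1: [(1, 2)],
    2: [(1, 2)],
    3: [(3, 2), (1, 2)],
    4: [(1, 2)],
    5: [(1, 2)],
}


def greedy_cost():
    """Always take the option that consumes the most source bytes."""
    i = total = 0
    while i < LENGTH:
        span, cost = max(OPTIONS[i], key=lambda option: option[0])
        total += cost
        i += span
    return total


@lru_cache(maxsize=None)
def optimal_cost(i=0):
    """Smallest total output for encoding everything from index i onward."""
    if i >= LENGTH:
        return 0
    return min(cost + optimal_cost(i + span) for span, cost in OPTIONS[i])


print(greedy_cost())   # 6: spans of 4, 1, 1
print(optimal_cost())  # 4: spans of 3, 3
```

Here the shorter first choice leaves the remaining bytes aligned with a 3-byte option, so the total output is smaller even though the first step looks worse.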
