For a personal project of mine, I'm writing a small class to compress to and decompress from a rather obscure format. I have the full spec, so that's not where the problem is.
First, this 'format' uses a set of 6 different compression types as well as uncompressed blocks of byte data. The types are: plain RLE, an offshoot of RLE where the value increments with each byte (e.g. 3, 4, 5, ...), a 16-bit RLE, LZ-copy, a reverse LZ-copy, and LZ-copy with every byte XOR'd with 255. It's not the cleanest of specs, but I didn't design it either.
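To make the six block types concrete, here is a rough sketch of what each one expands to on the decompression side. This is purely illustrative: the spec's actual opcodes, length fields, and offset encoding are omitted, and my reading of 'reverse LZ-copy' (the run is copied out backwards) is an assumption.

def decode_rle(value, count):
    # Plain RLE: one byte repeated `count` times, e.g. (7, 3) -> [7, 7, 7]
    return [value] * count

def decode_incrementing_rle(start, count):
    # Incrementing RLE: the value goes up by one each byte, e.g. (3, 4) -> [3, 4, 5, 6]
    return [(start + i) & 0xFF for i in range(count)]

def decode_rle16(word, count):
    # 16-bit RLE: a two-byte pattern repeated `count` times
    return list(word) * count

def decode_lz_copy(output, offset, count):
    # LZ-copy: copy `count` bytes already present in the output, starting at `offset`
    return [output[offset + i] for i in range(count)]

def decode_lz_copy_reversed(output, offset, count):
    # Reverse LZ-copy: like LZ-copy, but the copied bytes come out in reverse order
    return [output[offset - i] for i in range(count)]

def decode_lz_copy_xor(output, offset, count):
    # LZ-copy XOR 255: copy, but every byte is XOR'd with 0xFF
    return [output[offset + i] ^ 0xFF for i in range(count)]

print(decode_incrementing_rle(3, 4))  # -> [3, 4, 5, 6]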
My compression routine takes an array of anywhere from 1 to 65,535 bytes and (hopefully) compresses it as much as possible. My previous attempt was greedy: starting from the current index in the uncompressed stream, it works out which of the compression techniques above gives the best compression, writes however many bytes that method covers to the compressed output, and then repeats from the new 'uncompressed' index. For example, given:
{0,0,0,1,2,3,4}
The algorithm would first see the three zeroes at the start and output the spec's RLE encoding for them; then, starting from the fourth element, it would see that incrementing RLE covers the '1,2,3,4' and compress that before returning.
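To illustrate the shape of that attempt, here is a simplified, self-contained sketch of the greedy loop with only two of the six methods measured. The tuple 'encoding' it emits, the helper names, and the minimum-run threshold are all invented for illustration and are not the spec's real block format.

def run_length(data, i):
    """How many bytes starting at i repeat data[i] (plain RLE candidate)."""
    n = 1
    while i + n < len(data) and data[i + n] == data[i]:
        n += 1
    return n

def incrementing_run_length(data, i):
    """How many bytes starting at i follow b, b+1, b+2, ... (incrementing RLE candidate)."""
    n = 1
    while i + n < len(data) and data[i + n] == (data[i] + n) & 0xFF:
        n += 1
    return n

def compress_greedy(data):
    """Greedy pass: at each index, pick the method that covers the most bytes.

    Only two of the six methods are measured here; 16-bit RLE and the three
    LZ-copy variants would simply be extra candidates in the same loop.
    """
    out = []
    i = 0
    while i < len(data):
        candidates = [
            ("rle", run_length(data, i)),
            ("inc", incrementing_run_length(data, i)),
        ]
        method, covered = max(candidates, key=lambda c: c[1])
        if covered >= 3:                      # long enough to be worth encoding
            out.append((method, data[i], covered))
            i += covered
        else:                                 # fall back to a literal byte
            out.append(("raw", data[i], 1))
            i += 1
    return out

print(compress_greedy([0, 0, 0, 1, 2, 3, 4]))
# -> [('rle', 0, 3), ('inc', 1, 4)]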
The problem, summarized: while working out the best method to use at each index, the routine is very slow even on small (20-30 byte) arrays. Can anyone offer tips on how I might optimize this, or tell me if there's any more information I could provide to help?