ansaurus

Question

Answer 1

+5 A:

Why don't you encode each $ in the original file as $$ in the compressed file?

And/or use some other character instead of $ - one that is not used much in bmp files.

Also note that the BMP format has RLE compression 'built-in' - look here, near the bottom of the page - under "Image Data and Compression".

I don't know what you're using your program for, or if it's just for learning, but if you used the "official" bmp method, your compressed images wouldn't need decompression before viewing.

Blorgbeard 2009-03-26 09:40:59

+1 for answering the question and giving an alternative

Mark Pim 2009-03-26 09:53:56

Answer 2

A:

If I understand correctly, the problem is that $ is both a symbol for marking a repeat, and also can be a 'BMP' value as well?

If so, what you could do is to mark a double $ ('$$') character to denote that the '$' character should be treated not as a repeat, but as a single '$'. This would of course mean that the '$' is expensive to encode (takes two symbols instead of 1), but would solve your problem.

If you wanted to have a run of the '$' character, you would need to encode it as: $$$5 - meaning '$' run of '$$'=$, '5' - 5 times.

Andy Wyatt 2009-03-26 09:43:19

Answer 3

+2 A:

AAAAAABBCDEEEEGGHJ$IIIIIIIII ==> $A6$B2CD$E4$G2HJ$$I9

If the repeat character occurs in the data, try inserting an extra repeat character in the encoded data. Then if the decoder sees a double repeat character it can insert the actual repeat character

$A6$B2CD$E4$G2HJ$$I9 ==> AAAAAABBCDEEEEGGHJ$IIIIIIIII

mrwes 2009-03-26 09:44:47

Answer 4

+1 A:

What most programs do to signify that some character needs to be treated literally is that they have a defined escape sequence.

For example, in regular expressions, the following are specially defined characters that usually have a meaning:

^[].*+{}()$

Yes, your fun dollar sign character is in there, and it usually means end of line.

So what a programmer using regular expressions has to do to have these characters interpreted literally is that they need to express those characters as an escape sequence. For example, to interpret $ as $, and not end of line, the programmer uses \$, which is the escape sequence.(1)

In your case, you can store literal dollar signs into your compressed file as \$.(2)

NB: grep inverts this logic.
The above solutions to store $ as $$ becomes confusing when you have runs of $ in the BMP file.

supercheetah 2009-03-26 10:33:25

Answer 5

+1 A:

If you have the luxury of being able to scan the entire input before starting to compress it, you could choose the least frequent value in the input as your escape value. For example, given this input:

AAAABBCCCCDDEEEEEEEFFG

You could choose "G" as your escape value (or even "H" if it's part of your symbol set) and adopt a convention whereby the first character of the encoded stream is the escape value. So the string above might encode to:

GGA4BBGC4DDGE7FFGG

or even better:

HHA4BBHC4DDHE7FFG

Please note that there's no point in encoding a "run" of two identical values because the "compressed" version (e.g. HD2) is longer than the uncompressed version (DD).

Hope that helps!

Anodyne 2009-03-26 10:49:31

ansaurus

tags:

views:

answers:

Implementing run-length encoding

related questions