I've written a program to perform run length encoding. In typical scenario if the text is
AAAAAABBCDEEEEGGHJ
run length encoding will make it
A6B2C1D1E4G2H1J1
but it was adding extra 1 for each non repeating character. Since i'm compressing BMP files with it, i went with an idea of placing a marker "$" to signify the occurance of a repeating character, (assuming that image files have huge amount of repeating text).
So it'd look like
$A6$B2CD$E4$G2HJ
For the current example it's length is the same, but there's a noticable difference for BMP files. Now my problem is in decoding. It so happens some BMP Files have the pattern $<char><num>
i.e. $I9
in the original file, so in the compressed file also i'd contain the same text. $I9
, however upon decoding it'd treat it as a repeating I which repeats 9 times! So it produces wrong output. What i want to know is which symbol can i use to mark the start of a repeating character (run) so that it doesn't conflict with the original source.