views:

100

answers:

5

This is more of a computer science / information theory question than a straightforward programming one, so if anyone knows of a better site to post this, please let me know.

Let's say I have an N-bit piece of data that will be sent redundantly in M messages, where at least M-1 of those messages will be received successfully. I am interested in different ways of encoding the N-bit piece of data in fewer bits per message. (this is similar to RAID but at a much smaller level, where N = 8 or 16 or 32)

Example: suppose N = 16 and M = 4. Then I could use the following algorithm:

1st and 3rd message: send "0" + bits 0-7
2nd and 4th message: send "1" + bits 8-15

If I can guarantee that 3 messages of the 4 will get through, then at least one message from each group will get through. Thus I can make this work with 9 bits or less, there's probably a way to do this with fewer total bits but I'm not sure how.

Are there some simple encoding/decoding algorithms to do this kind of thing? Does this problem have a name? (if I know what it's called, I can google it!)

note: in my particular case, the messages either arrive correctly or do not arrive at all (no messages arrive with errors).

(edit: moved 2nd part to a separate question)

+2  A: 

Parchive does forward error correction with a Reed-Solomon block code. The specifications reference some papers that might be good reading material.

Mark Johnson
is there a version of Reed-Solomon that works with missing messages, rather than bits that have errors?
Jason S
Must be, since Parchive can recover from missing files/blocks as long as you have sufficient parity files/blocks available.
Mark Johnson
+1  A: 

See erasure codes.

Steve Emmerson
thanks! -------
Jason S
+1  A: 

I'm not sure if I understood all the details of your question correctly, but your problem is definitely aboud designing some kind of error correcting code. This is a vast area of computer science and thick tomes have been written about it. Start with wikipedia and see if you can get any simple schemes (like Hamming or Reed-Solomon codes) to work in your case.

If you want to deal not only with symbol corruption, but also deletion of symbols, you should look at erasure codes, this is definitely a more difficult task but good methods exist in many cases.

EDIT: This material from hackersdelight.org seems a nice introduction.

Krystian
+2  A: 

(Incomplete answer follows. I may add more later.)

The term you may be interested in is channel coding: adding redundancy to a source in order to make it robust during transmission over a noisy channel. In information theory, the complementary problem to channel coding is source coding: reducing the redundancy in a source to represent it using fewer bits. (The combination of these two problems is called joint source-channel coding.)

Your first question asks to find a channel code. The simple example you give is called a repetition code, i.e., you send the same message more than twice (usually an odd number of times), and then the message which is received most often is accepted as the original message.

This code is inefficient. To use standard notation, let k = number of bits in original message, and n = number of bits in the transmitted message. For your example, k = 16 and n = 64. A measure of coding efficiency is k/n, where higher means more efficient. In your case, k/n = 0.25. This is low.

The repetition code is a simple kind of block code, i.e., redundancy is added to each block of k bits to create a codeword of n bits. So are the Hamming and Reed-Solomon codes as others mentioned. Hamming codes are relatively easy to understand with some basic linear algebra.

These should be enough terms for you to search on your own. Good luck.

Steve
+1  A: 

You're looking for a packet erasure code. There are only two useful packet erasure codes that are not totally encumbered by patents, and there's only one open-source library to implement those. Find it here: http://planete-bcast.inrialpes.fr/rubrique.php3?id_rubrique=5

Andrew McGregor