views:

281

answers:

9

Imagine you have a channel of communication that is inherently lossy and one-way. That is, there is some inherent noise that is impossible to remove that causes, say, random bits to be toggled. Also imagine that it is one way - you cannot request retransmission.

But you need to send data over it regardless. What techniques can you use to send numbers and text over that channel?

  1. Is it possible to encode numbers so that even with random bit twiddling they can still be interpreted as values close to the original (lossy transmittion)?

  2. Is there a way to send a string of characters (ASCII, say) in a lossless fashion?

This is just for fun. I know you can use morse code or any very low frequency binary communication. I know about parity bits and checksums to detect errors and retrying. I know that you might as well use an analog signal. I'm just curious if there are any interesting computer-sciency techniques to send this stuff over a lossy channel.

+2  A: 
nik
+1  A: 

You can use Reed-Solomon codes.

caf
+1  A: 

See also the Sliding Window Protocol (which is used by TCP).

Although this includes dealing with packets being re-ordered or lost altogether, which was not part of your problem definition.

Andrew Shepherd
+3  A: 

Probably one of the better-known methods is to use Hamming code. It might not be the best way of correcting errors on large scales, but it's incredibly simple to understand.

Ryan Fox
+4  A: 

This question is the subject of coding theory.

starblue
And to be precise, it's about FEC: Forward Error Correction.
MSalters
+8  A: 

Depending on some details that you don't supply about your lossy channel, I would recommend, first using a Gray code to ensure that single-bit errors result in small differences (to cover your desire for loss mitigation in lossy transmission), and then possibly also encoding the resulting stream with some "lossless" (==tries to be loss-less;-) encoding.

Reed-Solomon and variants thereof are particularly good if your noise episodes are prone to occur in small bursts (several bit mistakes within, say, a single byte), which should interoperate well with Gray coding (since multi-bit mistakes are the killers for the "loss mitigation" aspect of Gray, designed to degrade gracefully for single-bit errors on the wire). That's because R-S is intrinsically a block scheme, and multiple errors within one block are basically the same as a single error in it, from R-S's point of view;-).

R-S is particularly awesome if many of the errors are erasures -- to put it simply, an erasure is a symbol that has most probably been mangled in transmission, BUT for which you DO know the crucial fact that it HAS been mangled. The physical layer, depending on how it's designed, can often have hints about that fact, and if there's a way for it to inform the higher layers, that can be of crucial help. Let me explain erasures a bit...:

Say for a simplified example that a 0 is sent as a level of -1 volt and a 1 is send as a level of +1 volt (wrt some reference wave), but there's noise (physical noise can often be well-modeled, ask any competent communication engineer;-); depending on the noise model the decoding might be that anything -0.7 V and down is considered a 0 bit, anything +0.7 V and up is considered a 1 bit, anything in-between is considered an erasure, i.e., the higher layer is told that the bit in question was probably mangled in transmission and should therefore be disregarded. (I sometimes give this as one example of my thesis that sometimes abstractions SHOULD "leak" -- in a controlled and architected way: the Martelli corollary to Spolsky's Law of Leaky Abstractions!-).

A R-S code with any given redundancy ratio can be about twice as effective at correcting erasures (errors the decoder is told about) as it can be at correcting otherwise-unknown errors -- it's also possible to mix both aspects, correcting both some erasures AND some otherwise-unknown errors.

As the cherry on top, custom R-S codes can be (reasonably easily) designed and tailored to reduce the probability of uncorrected errors to below any required threshold θ given a precise model of the physical channel's characteristics in terms of both erasures and undetected errors (including both probability and burstiness).

I wouldn't call this whole area a "computer-sciency" one, actually: back when I graduated (MSEE, 30 years ago), I was mostly trying to avoid "CS" stuff in favor of chip design, system design, advanced radio systems, &c -- yet I was taught this stuff (well, the subset that was already within the realm of practical engineering use;-) pretty well.

And, just to confirm that things haven't changed all that much in one generation: my daughter just got her MS in telecom engineering (strictly focusing on advanced radio systems) -- she can't design just about any serious program, algorithm, or data structure (though she did just fine in the mandatory courses on C and Java, there was absolutely no CS depth in those courses, nor elsewhere in her curriculum -- her daily working language is matlab...!-) -- yet she knows more about information and coding theory than I ever learned, and that's before any PhD level study (she's staying for her PhD, but that hasn't yet begun).

So, I claim these fields are more EE-y than CS-y (though of course the boundaries are ever fuzzy -- witness the fact that after a few years designing chips I ended up as a SW guy more or less by accident, and so did a lot of my contemporaries;-).

Alex Martelli
I fully agree. R-S is what NASA uses for satellite communication to Earth, and when even the speed of light introduces several hour delays, protocols like TCP/IP aren't going to work because you can't save 12+ hours of transmisssions out on the sat. There will be no "computer sciency" answer because this is an issue of error coding and correction (which you touched on in discrete math). CD and DVDs use multiple layers of error correction to correct for scratches (or other "erasures") - and this is a "one way" communication because your CD player can't call up the record company for new bits.
Tangurena
@jug, very good points, thanks! So the gray codes don't help with graceful degradation and (if GD is important) one should go for the big guns of domain-specific encodings such as shown by Mohr et al in http://eprints.kfupm.edu.sa/43182/1/43182.pdf .
Alex Martelli
Coding theory is taught quite a bit in the more networking oriented CS degrees (plenty of them allow you to specialize in one department or another). I took some courses on this subject in the course of a regular old CS degree.
wds
@jug, yep, but for the error-correcting layer I recommended reed-solomon (esp., but not exclusively!, on erasure channels); mohr
Alex Martelli
@jug, yes that was my idea, assuming an erasure channel -- I can set the reed-solomon parameters to correct many errors, but not all; i.e. there will be some probability of a detected but uncorrected error and therefore for a package to have to be deemed as lost, and that's the point where I was hoping for loss mitigation aka graceful degradation, first with my bad idea about gray codes, now with P-E-T -- still a bit fuzzy, for sure, since still app-domain/payload-kind dependent, but I have no better idea for now. Whether this is an erasure channel is an open issue, btw.
Alex Martelli
+2  A: 

Either Turbo Codes or Low-density parity-checking codes for general data, because these come closest to approaching the Shannon limit - see wikipedia.

jug
If noise is near-random, Turbo Codes and LDPC easily get you arbitrarily close to Shannon limit. If you have a little processing power to spare and don't know noise statistical characteristics, there is no need to look any further.
ima
A: 

I would go for some of these suggestions, followed by multiple sendings of the same data. So that way you can hope for different errors to be introduced at different points in the stream, and you may be able to infer the desired number a lot easier.

Noon Silk
+1  A: 

As Alex Martelli says, there's lots of coding theory in the world, but Reed-Solomon codes are definitely a sweet spot. If you actually want to build something, Jim Plank has written a nice tutorial on Reed-Solomon coding. Plank has a professional interest in coding with a lot of practical expertise to back it up.

Norman Ramsey