I recently learned about PDF417 barcodes and was astonished that I could still read the barcode after ripping it in half and scanning only a fragment of the original label.

How can the barcode decoding be that robust? Which (types of) algorithms are used during encoding and decoding?

EDIT: I understand the general philosophy of introducing redundancy to create robustness, but I'm interested in the details, i.e. how this is done in PDF417.

+1  A: 

I don't know PDF417, but I know that QR codes use Reed-Solomon correction. It is an oversampling technique. To get the concept: suppose you have a polynomial of degree 6. Technically, you need seven points to describe this polynomial uniquely, so you can perfectly transmit the information about the whole polynomial with just seven points. However, if one of those seven is corrupted, you lose the information entirely. To work around this issue, you extract a larger number of points from the polynomial and write them down. As long as you have at least seven of the bunch, that is enough to reconstruct the original information.

In other words, you trade space for robustness, by introducing more and more redundancy. Nothing new here.
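
A toy sketch of that idea in Python, with exact rational arithmetic standing in for the finite-field arithmetic real Reed-Solomon uses (the 7-symbol message and 12-point oversampling are arbitrary choices for the demo):

    from fractions import Fraction

    def lagrange_eval(points, x):
        # Evaluate at x the unique polynomial through the given points.
        total = Fraction(0)
        for i, (xi, yi) in enumerate(points):
            term = Fraction(yi)
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= Fraction(x - xj, xi - xj)
            total += term
        return total

    # 7 message symbols = values of a degree-6 polynomial at x = 0..6
    message = [72, 101, 108, 108, 111, 33, 7]
    base = list(enumerate(message))

    # oversample: 5 extra points at x = 7..11, 12 points in total
    codeword = base + [(x, lagrange_eval(base, x)) for x in range(7, 12)]

    # damage: lose any 5 of the 12 points, here the first five
    surviving = codeword[5:]

    # any 7 surviving points pin down the polynomial, hence the message
    recovered = [int(lagrange_eval(surviving, x)) for x in range(7)]
    assert recovered == message

Here each message byte is simply one sample of the polynomial; real codes treat bytes as elements of GF(256) instead of rationals.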

Stefano Borini
How can I represent string data using a polynomial? Is there some generic polynomial which can be "parameterized" using some byte values?
DR
A: 

The PDF417 format allows for varying levels of duplication/redundancy in its content (security levels 0 through 8, where level s adds 2^(s+1) error correction codewords). The level of redundancy used affects how much of the barcode can be obscured or removed while still leaving the contents readable.
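
For a concrete feel for those levels, a short sketch assuming the third-party pdf417gen Python package (encode and render_image are that library's API, not part of the PDF417 spec):

    # pip install pdf417gen
    from pdf417gen import encode, render_image

    data = "This barcode survives partial damage"

    # security_level ranges 0..8; level s adds 2**(s + 1) error
    # correction codewords, so higher levels tolerate more obscured
    # or missing area at the cost of a bigger label
    codes = encode(data, columns=6, security_level=5)

    render_image(codes).save("label.png")  # render_image returns a PIL image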

Matt Lacey
A: 

I do not think the trade-off between space and robustness is any different here than anywhere else. Think RAID, let's say RAID 5: you can yank a disk out of the array and the data is still available. The price? An extra disk. Or, in terms of the barcode, the extra space the label occupies.
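
A toy sketch of the RAID 5 parity idea in Python (one parity stripe XORed over equal-size data stripes; real RAID 5 rotates the parity across disks):

    from functools import reduce
    from operator import xor

    # three equal-size data stripes plus one parity stripe
    stripes = [b"stripe-0", b"stripe-1", b"stripe-2"]
    parity = bytes(reduce(xor, col) for col in zip(*stripes))

    # lose any ONE stripe: it is the XOR of the parity and the survivors
    lost = 1
    survivors = [s for i, s in enumerate(stripes) if i != lost]
    recovered = bytes(reduce(xor, col) for col in zip(parity, *survivors))
    assert recovered == stripes[lost]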

mfeingold
It's quite different. PDF417 is still readable if x% of the code is missing, no matter *which* x% is missing. RAID 5 can't do that: imagine the first 1 MB on each of your RAID drives is corrupted. RAID can't recover from that; the data is lost.
nikie
@nikie I disagree. RAID is no different. Of course there are limits to how much loss it can recover from (what if you lost all of your disks?), but the same is true for PDF417: what if x% = 100%? At the same time, there is no 'special' disk; you can pull any one, but only one. Regardless, the way both work is that the entire data sequence is broken into groups and every group is injected with some redundancy. The more redundancy, the more damage it can take without data loss. If RAID 5 is not enough, go for RAID 6.
mfeingold
Sorry for the imprecise formulation: PDF417 can recover from up to x% data loss, for some x > 0 that depends on the chosen level of redundancy in the code, *no matter which subset of the data is lost*. RAID can never do that, no matter how many drives you use. If the first 1% on every drive is lost (that is, 1% of the total data), that data cannot be recovered, even if you use RAID 0 with 20 drives.
nikie
+2  A: 

PDF417 itself does not use any particular algorithm; it is a specification for encoding data.

I think there is a confusion between the barcode format and the data it conveys.

The various barcode formats (PDF417, Aztec, DataMatrix) specify a way to encode data, be it numerical, alphabetic or binary... the exact content though is left unspecified.

From what I have seen, Reed-Solomon is often the algorithm used for redundancy. The exact level of redundancy is up to you with this algorithm, and there are libraries for it in at least Java and C, from what I've dealt with.
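
As a sketch of what that looks like in practice, assuming the third-party reedsolo Python package (its RSCodec API, nothing mandated by any barcode spec):

    # pip install reedsolo
    from reedsolo import RSCodec

    # 10 parity bytes: corrects up to 5 corrupted bytes at unknown
    # positions, or up to 10 erasures at known positions
    rsc = RSCodec(10)

    encoded = rsc.encode(b"the exact content is up to you")

    corrupted = bytearray(encoded)
    corrupted[0] ^= 0xFF   # damage two bytes
    corrupted[7] ^= 0xFF

    # recent reedsolo versions return (message, codeword, errata positions)
    decoded = rsc.decode(corrupted)[0]
    assert bytes(decoded) == b"the exact content is up to you"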

Now, it is up to you to specify what the exact content of your barcode should be, including the algorithm used for redundancy and the parameters used by this algorithm. And of course you'll need to work hand in hand with those who are going to decode it :)

Note: QR seems slightly different, with explicit zones for redundancy data.

Matthieu M.