I have to digitize a few thousand physical documents and assign them to a variety of categories for a web app where they will be displayed.

Should I generate bar codes for each of the documents to uniquely identify them?

If so, how can I avoid the barcode being present in the scanned image?

Any other recommendations for approaching this?

A:

Hi,

Yes, I work with this stuff every day and barcodes are definitely the way to go.

I would recommend starting with a 2D barcode such as DataMatrix or PDF417:

  1. When you eventually need to start adding extra data to the barcode (which I'm sure you will once you see what it can do for you), you won't face resistance from clients complaining that they don't like the 'ugly' new format.

  2. You can store far more data than in a 1D barcode, including arbitrary characters, so available space and legal characters are rarely a concern.

  3. The built-in redundancy (error correction) is really useful for coping with printing on a wide range of printers and then scanning the documents back in.

In our barcodes we use a standard key-value pair structure, so that no matter which system is generating or reading the barcode, it will always have the data it needs. This works much better than a bare document ID with associated lookup tables, and much better than fixed-length barcodes.

e.g.

      CLIENTID=123442 CAMPAIGN=WINTER09

Some systems may not care about Campaign, but every system knows what a ClientID is.
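A minimal Python sketch of that key-value convention, assuming space-separated `KEY=VALUE` pairs exactly as in the example above. The helper names `encode_payload` and `decode_payload` are hypothetical, and uppercasing keys is an assumption based on the example. Because spaces and `=` act as delimiters in this format, the encoder rejects them inside keys or values instead of trying to escape them.

```python
def encode_payload(fields: dict[str, str]) -> str:
    """Serialize fields as space-separated KEY=VALUE pairs (hypothetical helper)."""
    parts = []
    for key, value in fields.items():
        # Spaces and '=' are the delimiters here, so they must not appear inside a token.
        for token in (key, value):
            if not token or " " in token or "=" in token:
                raise ValueError(f"invalid token: {token!r}")
        # Keys are normalized to uppercase (assumption, matching CLIENTID / CAMPAIGN).
        parts.append(f"{key.upper()}={value}")
    return " ".join(parts)


def decode_payload(text: str) -> dict[str, str]:
    """Parse KEY=VALUE pairs; every key is returned, and each reader picks what it needs."""
    fields = {}
    for pair in text.split():
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"malformed pair: {pair!r}")
        fields[key.upper()] = value
    return fields


payload = encode_payload({"CLIENTID": "123442", "CAMPAIGN": "WINTER09"})
# A system that only knows about CLIENTID simply ignores the other keys.
client_id = decode_payload(payload)["CLIENTID"]
```

The point of returning every key rather than a fixed record is the one made above: a reader that has never heard of `CAMPAIGN` still works, and new keys can be added later without breaking older systems.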

I recommend that you don't worry about the barcode being present in the scanned image: your clients will get used to it quickly, and it keeps the document alive and usable. For example, if you email the document to someone and they mail it back, you will still be able to identify it and match it up again. The barcode will become the most important part of the document.

Make sure the barcode is large enough that it still decodes reliably when scanned at 200 DPI.
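The sizing advice can be turned into a quick back-of-the-envelope check. This sketch assumes each module (the smallest black or white square) should span at least a few scanner pixels; the default of 4 pixels per module and the function name `min_symbol_size_mm` are assumptions for illustration, not values taken from any barcode standard. The quiet zone (blank margin) differs by symbology, for example 1 module for DataMatrix and 4 modules for QR codes, so it is a parameter.

```python
MM_PER_INCH = 25.4


def min_symbol_size_mm(modules: int, dpi: int = 200,
                       pixels_per_module: int = 4, quiet_zone: int = 1) -> float:
    """Smallest printed edge length (mm) of a square 2D symbol, quiet zone included.

    pixels_per_module is an assumed rule of thumb, not a spec value.
    """
    total_modules = modules + 2 * quiet_zone             # quiet zone on both sides
    module_mm = pixels_per_module / dpi * MM_PER_INCH    # physical width of one module
    return total_modules * module_mm


# A 26x26 DataMatrix symbol with a 1-module quiet zone, scanned at 200 DPI.
print(round(min_symbol_size_mm(26), 2))  # 14.22
```

If your scanners are noisy or the printers smudge, raising `pixels_per_module` is the knob to turn; the symbol grows linearly with it.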

Michael Rodrigues
Agree, though I would recommend QR codes as the format if there are no requirements to use something else.
Sean Owen