views:

237

answers:

3

Hello,

I have been asked to publish a complete book online, similar to the way Google Books does it, i.e. it's viewable and printable but not downloadable.

Is the process basically "high-quality scanning"? Are there any open source solutions for mass generation of watermarks on those high-quality images? Suppose you have an original image; when the user views it online, I re-create the image and add a watermark and some other text on top of it on the fly. Does such a library exist, in Python of course :)

Any tips? If you have done this before, please share.

Thanks

+4  A: 

Unfortunately, Google uses a patented technique for scanning its books, so you will probably have to stick to traditional methods.

Google created some seriously nifty infrared camera technology that detects the three-dimensional shape and angle of book pages when the book is placed in the scanner. This information is transmitted to the OCR software, which adjusts for the distortions and allows the OCR software to read text more accurately. No more broken bindings, no more inefficient glass plates.

Basically you will need to scan the pages and run them through an OCR application (Tesseract is good), then I would generate a PDF/image from the recognized text, and finally add the watermark on top. The Python Imaging Library would seem to be the best tool for the watermarking step.
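For the watermarking step, something along these lines might work. It is only a rough sketch, assuming Pillow (the maintained fork of the Python Imaging Library) is installed; the function names `watermark_page` and `render_png`, the `step` spacing, and the default `opacity` are made up for illustration and are not part of any library.

```python
from io import BytesIO

from PIL import Image, ImageDraw, ImageFont


def watermark_page(page, text, opacity=96, step=(200, 80)):
    """Return a copy of `page` with `text` tiled over it as a translucent overlay.

    `opacity` (0-255) and `step` (pixel spacing between repeats) are
    illustrative defaults, not recommended values.
    """
    base = page.convert("RGBA")
    # Fully transparent layer, same size as the page, to draw the text on.
    overlay = Image.new("RGBA", base.size, (255, 255, 255, 0))
    draw = ImageDraw.Draw(overlay)
    font = ImageFont.load_default()  # swap in ImageFont.truetype(...) for a nicer font
    for y in range(0, base.height, step[1]):
        for x in range(0, base.width, step[0]):
            draw.text((x, y), text, font=font, fill=(128, 128, 128, opacity))
    # Blend the overlay onto the page; the original image is left untouched.
    return Image.alpha_composite(base, overlay).convert("RGB")


def render_png(page, text):
    """Watermark `page` and return PNG bytes, e.g. for an HTTP response body."""
    buf = BytesIO()
    watermark_page(page, text).save(buf, format="PNG")
    return buf.getvalue()
```

You would call `render_png(Image.open("page_001.png"), "Viewed by some_user")` (file name and text are placeholders) inside whatever view serves the page image, so the stored scan itself never leaves the server unmarked. Note this only discourages copying; anything shown in a browser can still be saved.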

James
+1  A: 

I don't know much about Google Books, but the Python Imaging Library can do watermarking (there's an ASPN recipe for that).

PiotrLegnica
A: 

See the Slashdot question on reproducing Google's photo + laser grid technique.

Ewan Todd