I have a collection of ebooks in DjVu, PDF, and CHM formats, and I am looking for a way to search their content by keyword. I have been researching this and found a couple of suggestions for parsing PDF content, but there seems to be no way to convert DjVu content into text. Does anyone know a way to decode DjVu content into text so that I can search it easily?

Thanks

A: 

python-djvulibre is a set of Python bindings to the djvulibre open source implementation of djvu -- I haven't tried it, but it looks like it should meet your needs.

Alex Martelli
A: 

Certainly the DjVuLibre SDK will allow access to the text layer -- if it exists (not all DjVu files have a text layer; many are purely raster images).
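As a rough sketch of what that could look like in practice: DjVuLibre ships a command-line tool, `djvutxt`, that dumps the hidden text layer to standard output. The snippet below assumes `djvutxt` is installed and on your PATH; the function names `extract_djvu_text` and `find_keyword` are made up for illustration and are not part of any library. If the file has no text layer, the output will simply be empty and you would need OCR instead.

```python
import shutil
import subprocess


def extract_djvu_text(path, djvutxt="djvutxt"):
    """Return the text layer of a DjVu file by calling DjVuLibre's djvutxt.

    Assumes the djvutxt executable is installed; raises FileNotFoundError
    if it cannot be found on PATH.
    """
    if shutil.which(djvutxt) is None:
        raise FileNotFoundError(f"{djvutxt!r} not found; install DjVuLibre first")
    # djvutxt writes the text layer to stdout when no output file is given.
    result = subprocess.run([djvutxt, path], capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


def find_keyword(text, keyword):
    """Return (line_number, line) pairs for lines containing keyword.

    The match is case-insensitive; line numbers start at 1.
    """
    needle = keyword.casefold()
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line.casefold()
    ]
```

Usage would be something like `find_keyword(extract_djvu_text("book.djvu"), "entropy")`, where `book.djvu` is a placeholder file name. An empty result from `extract_djvu_text` is a good hint that the file is image-only.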

An alternative solution might be to base your index on IIS technology. CamiNova has a free IFilter that you can use for this.

http://dev.caminova.jp/beta/djvu-wic/

msr