I want to write a tool that helps me search PDF, CHM and DjVu files on Linux. Any pointers on how to go about it? The main problem is reading/importing the text from all these formats. Can this be done with C and shell scripting?
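On the C-and-shell part of the question, one common approach (an assumption here, not something the answers below describe) is to leave format parsing to existing command-line converters and have C read their output through a pipe. Examples are `pdftotext` from poppler-utils for PDF (an output file of `-` sends the text to stdout) and `djvutxt` from DjVuLibre for DjVu. CHM is compressed HTML and would first need extracting with a separate tool, for example chmlib's `extract_chmLib`; it is not covered here. The function `search_with_extractor` below is a hypothetical name for this sketch. It runs whatever extractor command it is given and counts the lines of output that contain a plain substring.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Count the lines read from `in` that contain `needle` (plain, case-sensitive substring). */
static int count_matching_lines(FILE *in, const char *needle)
{
    char *line = NULL;
    size_t cap = 0;
    int hits = 0;

    while (getline(&line, &cap, in) != -1) {
        if (strstr(line, needle) != NULL)
            hits++;
    }
    free(line);
    return hits;
}

/* Run an external text extractor (argv[0] is looked up in PATH) and count the
   lines of its standard output that contain `needle`.
   Returns the number of matching lines, or -1 if the extractor could not be
   run or exited with a non-zero status. No shell is involved, so file names
   need no quoting. */
int search_with_extractor(char *const argv[], const char *needle)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: route stdout into the pipe, then replace ourselves with the extractor. */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127); /* exec failed, e.g. the tool is not installed */
    }

    /* Parent: read the extractor's output from the other end of the pipe. */
    close(fds[1]);
    FILE *in = fdopen(fds[0], "r");
    if (in == NULL) {
        close(fds[0]);
        waitpid(pid, NULL, 0);
        return -1;
    }
    int hits = count_matching_lines(in, needle);
    fclose(in);

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return hits;
}
```

For example, with hypothetical file names, `search_with_extractor((char *[]){"pdftotext", "paper.pdf", "-", NULL}, "kernel")` searches a PDF, and `search_with_extractor((char *[]){"djvutxt", "book.djvu", NULL}, "kernel")` searches a DjVu file's text layer. Calling the converters this way avoids linking against each format's library. The trade-off is one process per file. For repeated searches over a large collection, an indexer such as the ones suggested below is the better fit.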

A: 

How about a plugin for Beagle?

It already searches PDFs, and you can add support for other file types.

Here is the relevant Wikipedia page: http://en.wikipedia.org/wiki/Beagle_(software)

Pat
A:

Tracker ships with Ubuntu 8.04; it was a significant switch away from Beagle, which users felt was too CPU-intensive and didn't yield good enough results. It indexes both PDF and CHM, and according to this bug report it also indexes DjVu.

cdleary
A:

Note that DjVu is an image compression format (optimized to compress "pictures of text", typically the results of scanning). As such, you won't be able to search for text except in the metadata (this is what the link sent by cdleary refers to), or if the document has been run through OCR to produce a text layer.

The same is true for PDFs whose content is scanned articles or books.
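To find which files fall into this category before deciding whether to OCR them, one heuristic is to run the text extractor and see whether it prints anything besides whitespace. This is an assumption, not a guarantee: a scan can carry a few stray characters of metadata text. The function name `command_yields_text` is hypothetical. The example commands assume `pdftotext` (poppler-utils) and `djvutxt` (DjVuLibre) are installed. Because `popen` goes through the shell, the caller must quote file names.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <stdio.h>
#include <sys/wait.h>

/* Run a shell command that prints a document's extracted text (for example
   "pdftotext 'scan.pdf' -" or "djvutxt 'scan.djvu'") and report whether it
   printed anything other than whitespace. pdftotext separates pages with
   form feeds, which isspace() treats as whitespace, so an image-only PDF
   still counts as "no text".
   Returns 1 if text was found, 0 if not (likely a scan that needs OCR),
   or -1 if the command could not be run or exited with a non-zero status. */
int command_yields_text(const char *cmd)
{
    FILE *p = popen(cmd, "r");
    if (p == NULL)
        return -1;

    int found = 0;
    int c;
    /* Keep reading to EOF even after a hit, so the command is never killed by SIGPIPE. */
    while ((c = fgetc(p)) != EOF) {
        if (!isspace(c))
            found = 1;
    }

    int status = pclose(p);
    if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return found;
}
```

A result of 0 for a file suggests it is one of the image-only documents described above. Such a file would need an OCR pass before any text search can match it.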

OysterD