Is there a library that can open and search through a PDF file? Preferably in C, Python or Ruby...

A: 

I've looked into using Apache PDFBox for something similar but never ended up using it. That's a Java library, but Java plays well with other languages.

Ryan Lynch
+4  A: 

There are various libraries for extracting text from PDF files. That is a little short of "searching", but once you have the text, searching it should be easy.

For Ruby try PDF::Toolkit.

For Python there's pyPdf:

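import pyPdf
pdf = pyPdf.PdfFileReader(open(path, "rb"))  # open() instead of the Python 2-only file()
content = pdf.getPage(1).extractText()  # pages are zero-indexed: this is the second page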
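
Once the text of each page is extracted, the search itself is plain string matching. Here is a minimal sketch, assuming the page texts have already been collected into a list of strings (for example, one extractText() result per page). The function name find_in_pages and the sample strings are made up for illustration and are not part of pyPdf:

```python
def find_in_pages(page_texts, query, ignore_case=True):
    """Return (page_index, char_offset) for every occurrence of query."""
    if not query:
        return []
    needle = query.lower() if ignore_case else query
    hits = []
    for page_index, text in enumerate(page_texts):
        haystack = text.lower() if ignore_case else text
        start = haystack.find(needle)
        while start != -1:
            hits.append((page_index, start))
            # Continue one character further so overlapping matches are found too.
            start = haystack.find(needle, start + 1)
    return hits


# Hypothetical input; in practice each string would be the extracted text of one page.
pages = ["Intro to PDF parsing", "Nothing here", "pdf tools and more PDF tools"]
print(find_in_pages(pages, "pdf"))
```

Page indexes in the result are zero-based, matching getPage(). Extracted text often has odd spacing or line breaks, so normalizing whitespace before matching may be worth it.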
Mark
A: 

The Ruby-GNOME2 project has a sub-library called Poppler for rendering PDFs: http://ruby-gnome2.sourceforge.jp/hiki.cgi?Ruby%2FPoppler

It can also extract portions of a PDF as text, and it can find the rectangles in the document that contain the text you search for. These methods are in the "Page" class:

http://ruby-gnome2.sourceforge.jp/hiki.cgi?Poppler%3A%3APage

Hope this helps

Chase M Gray