Is there a library that can open and search through a PDF file? Preferably in C, Python, or Ruby.
A:
I've looked into using Apache PDFBox for something similar but never ended up using it. That's a Java library, but Java plays well with other languages.
Ryan Lynch
2009-11-11 02:20:36
+4
A:
There are various libraries for extracting text from PDF files. That stops a little short of "searching", but once you have the text, searching it is easy.
For Ruby try PDF::Toolkit.
For Python there's pyPdf:
import pyPdf
pdf = pyPdf.PdfFileReader(open(path, "rb"))
# Page indices are zero-based, so getPage(1) is the second page.
content = pdf.getPage(1).extractText()
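Once you have the text of each page, the search itself could look something like the sketch below. The function name search_pages and its parameters are made up for illustration and are not part of pyPdf. It takes a plain list of page strings, so it works with whatever extraction library you end up using.

```python
import re

def search_pages(page_texts, query, ignore_case=True):
    """Return a list of (page_index, match_count) for pages containing query.

    page_texts is a list of strings, one per page. With pyPdf it could be
    built as (assumption, adapt to your library):
        [pdf.getPage(i).extractText() for i in range(pdf.getNumPages())]
    """
    flags = re.IGNORECASE if ignore_case else 0
    # Escape the query so it is matched literally, not as a regex.
    pattern = re.compile(re.escape(query), flags)
    hits = []
    for index, text in enumerate(page_texts):
        # Treat a missing page text (None) as an empty page.
        count = len(pattern.findall(text or ""))
        if count:
            hits.append((index, count))
    return hits

if __name__ == "__main__":
    # Stand-in page texts; replace with text extracted from a real PDF.
    pages = ["Intro to PDF parsing", "Nothing here", "pdf and more PDF"]
    print(search_pages(pages, "pdf"))  # [(0, 1), (2, 2)]
```

Keep in mind that extracted text often has odd whitespace or line breaks in the middle of phrases, so a literal match like this can miss hits that span a line break.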
Mark
2009-11-11 02:24:57
A:
The Ruby-GNOME2 project includes a binding for Poppler, a PDF rendering library. http://ruby-gnome2.sourceforge.jp/hiki.cgi?Ruby%2FPoppler
Besides rendering, it can extract portions of the PDF as text, and it can find the rectangles in the document that contain the text you search for. These methods are in the "Page" class.
http://ruby-gnome2.sourceforge.jp/hiki.cgi?Poppler%3A%3APage
Hope this helps
Chase M Gray
2009-11-11 05:29:08