
Hi all,

I'm trying to find a way to search inside PDF files. I came across the PHP PDF class, but I can't seem to find any function for reading or searching a file stream.

So, naive as I am, I tried to simply read the file with file_get_contents(); obviously the output looks encrypted ;)
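For context on the file_get_contents() result: in most PDFs the page text is stored inside stream objects compressed with the FlateDecode filter (zlib), so the raw bytes look scrambled even when the file is not encrypted at all. The sketch below is in Python for brevity (PHP's preg_match_all() and gzuncompress() cover the same two steps). It is a naive illustration under stated assumptions, not a PDF parser: it ignores /Length, other filters and predictors, and simply skips any stream zlib cannot inflate. The function name inflate_streams is hypothetical.

```python
import re
import zlib

# A stream's data sits between the "stream" keyword (plus its EOL) and
# "endstream". Matching up to "\nendstream" may leave a trailing "\r" on
# the data; zlib.decompress ignores bytes after the end of the zlib stream.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\nendstream", re.DOTALL)


def inflate_streams(pdf_bytes: bytes) -> list:
    """Return the decompressed data of every stream zlib can inflate (naive sketch)."""
    inflated = []
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            inflated.append(zlib.decompress(match.group(1)))
        except zlib.error:
            # Not Flate data (e.g. images, other filters) or an encrypted PDF.
            pass
    return inflated
```

On a typical PDF, some of the inflated streams are page content streams containing operators such as `(Hello) Tj`, which is where searchable text lives.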

So my question: is there any way to search through PDF files? I'm looking for script-only / free / open source solutions, not some expensive commercial library.

+3  A: 

Try this article by David Walsh

jeerose
Thanks for your quick reply, I'll read it and try it! I'll keep you posted.
Ben Fransen
+1  A: 

A PHP search engine called Sphider has the option of adding PDF search via XPDF. You can then customise the result templates to fit in with the rest of your site (if applicable).

akamike
This option still requires other libraries to be installed. "Download and install pdftotext and catdoc and set there location(path) in conf.php"
jeerose
catdoc is only needed for MS-Office files, pdftotext is part of XPDF as I noted and is mentioned in the FAQ, "Indexing pdf and doc files".
akamike
Thanks for your answer, I gave you +1 for the effort, but it's not what I'm looking for. Thanks.
Ben Fransen
+1  A: 

XPDF?

There is a blog post here that may be of help.

There seems to be some code here that could help - a simple class that reads a PDF into plaintext. Unsure if it supports decryption.
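A rough idea of what a "read a PDF into plaintext" class has to do once a page's content stream is decompressed: text is drawn by the Tj operator (one literal string in parentheses) and the TJ operator (an array of strings with kerning numbers between them). This Python sketch, with hypothetical names extract_text and unescape, pulls those strings out so they can be searched. It assumes simple single-byte fonts; hex strings, octal escapes, unescaped nested parentheses and CID fonts (where the strings hold glyph codes rather than readable text) are not handled, which is one reason a dedicated tool such as pdftotext is more reliable.

```python
import re

# Either "[ ... ] TJ" (array form) or "( ... ) Tj" (single literal string).
TEXT_OP = re.compile(
    rb"\[(?P<arr>[^\]]*)\]\s*TJ|\((?P<lit>(?:\\.|[^\\)])*)\)\s*Tj", re.DOTALL
)
# A literal string inside a TJ array; backslash escapes are kept for unescape().
LITERAL = re.compile(rb"\(((?:\\.|[^\\)])*)\)", re.DOTALL)
ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f"}


def unescape(lit: bytes) -> bytes:
    # \( \) \\ become the bare character; \n, \t etc. become control chars.
    # Octal escapes (\ddd) are NOT handled in this sketch.
    return re.sub(rb"\\(.)", lambda m: ESCAPES.get(m.group(1), m.group(1)),
                  lit, flags=re.DOTALL)


def extract_text(content: bytes) -> str:
    """Join the strings shown by Tj/TJ operators, one space per operator."""
    chunks = []
    for m in TEXT_OP.finditer(content):
        if m.group("arr") is not None:
            # TJ: the numbers between the strings are kerning, so drop them.
            literals = LITERAL.findall(m.group("arr"))
        else:
            literals = [m.group("lit")]
        chunks.append(b"".join(unescape(lit) for lit in literals))
    # latin-1 maps every byte to a character, so decoding never raises.
    return " ".join(c.decode("latin-1") for c in chunks)
```

A search is then just a substring test on the result, e.g. `"world" in extract_text(page).lower()`.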

There are also a number of resources in PHP documentation that may help you. Click.

FPDF and FPDI may also help. After some research, they are probably your best bet.

Daniel May
Thanks I will check that out tomorrow! +1
Ben Fransen
I've gone through all your links and found that the class mentioned does not support encryption. So XPDF seems to be the only option left. Since I'm working on a Windows machine with XAMPP installed, I put all the files in the x:/xampp/apache/bin/xpdf/ directory. But I'm unable to execute the command mentioned in the blog post you sent. Any suggestions on how to set up XPDF properly? (I don't know that much about web servers... did I even put the files in the right directory?)
Ben Fransen
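Once XPDF is unpacked, searching mostly comes down to running pdftotext and looking for the term in its output. A minimal Python sketch, assuming the executable is named pdftotext.exe and sits in the x:/xampp/apache/bin/xpdf/ directory mentioned above (both assumptions; adjust the hypothetical PDFTOTEXT constant). Calling the program by its full path avoids depending on the PATH or on the web server's working directory. The function names are hypothetical.

```python
import subprocess

# Hypothetical location based on the comment above; point this at wherever
# pdftotext.exe actually lives (or just "pdftotext" if it is on the PATH).
PDFTOTEXT = "x:/xampp/apache/bin/xpdf/pdftotext.exe"


def pdftotext_command(pdf_path: str, exe: str = PDFTOTEXT) -> list:
    # Passing "-" as the output file tells pdftotext to write to stdout.
    return [exe, pdf_path, "-"]


def pdf_contains(pdf_path: str, needle: str, exe: str = PDFTOTEXT) -> bool:
    """Case-insensitive search of the text pdftotext extracts from pdf_path."""
    # check=True raises CalledProcessError if pdftotext exits non-zero,
    # e.g. when the file cannot be opened.
    result = subprocess.run(pdftotext_command(pdf_path, exe),
                            capture_output=True, check=True)
    # Assumes xpdf's default Latin1 output (see its -enc option);
    # decoding as latin-1 never raises.
    text = result.stdout.decode("latin-1")
    return needle.lower() in text.lower()
```

In PHP the same call could be made with shell_exec() and escapeshellarg() and the output searched with stripos(). Running the command by hand in a Windows command prompt first is a quick way to check that the path and the executable are right.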