Is it possible to search multiple PDF files using the 'grep' command? It doesn't seem to work. How do people search the content of multiple PDF files?
PDF is a binary format, which is why searching it with grep is not that helpful. You can search the strings in a PDF with grep like this:
strings dir_with_pdfs/*.pdf | grep "keyword"
Or you can run a PDF-to-text converter such as pdftotext on the PDFs and then search the result with grep.
Well, PDF is a binary format, and grep can search binary files as if they were text with the -a option:
grep -a
or you can just use pdftotext (which comes with xpdf), passing - as the output file so the text goes to standard output instead of to whee.txt:
pdftotext whee.pdf - | grep pattern
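To cover the "multiple files" part of the question, the same pipeline can be wrapped in a loop. This is only a sketch: search_pdfs is a made-up helper name, and it assumes pdftotext is installed and on your PATH. It prints the name of each PDF whose extracted text matches the pattern.

```shell
# Hypothetical helper: list the PDFs under a directory whose text matches a pattern.
# Assumes pdftotext (from xpdf or poppler-utils) is available on PATH.
search_pdfs() {
    pattern=$1
    dir=${2:-.}    # default to the current directory
    find "$dir" -type f -name '*.pdf' | while IFS= read -r pdf; do
        # "-" tells pdftotext to write the extracted text to stdout
        if pdftotext "$pdf" - 2>/dev/null | grep -q -- "$pattern"; then
            printf '%s\n' "$pdf"
        fi
    done
}
```

Call it as search_pdfs "keyword" dir_with_pdfs. It reads file names line by line, so names containing spaces work, but names containing newlines would not.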
You don't mention which OS you're using, but under Mac OS X you can use mdfind from the command line:
mdfind -onlyin search/directory/path "kind:pdf search text"
PDF is a binary dump of objects used to display the pages. There may be some metadata you can grep, but the actual page text is in a PostScript-like content stream that is often compressed and may be encoded in a variety of ways. It's also not guaranteed to be in any order. You need to think of PDF as more like a vector image file than a text file.
There is a short article explaining text in PDFs in more detail at http://pdf.jpedal.org/java-pdf-blog/bid/27187/Understanding-the-PDF-file-format-text-streams
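As a rough way to see why plain grep or strings often finds nothing, you can count how many streams in a file are marked as compressed. This sketch assumes the common case where compressed streams carry the /FlateDecode filter name in readable form. count_flate_streams is a hypothetical helper name, and the count is only an indication, not a real parse of the PDF.

```shell
# Hypothetical helper: count occurrences of the /FlateDecode filter name in a PDF.
# A non-zero count suggests the page text is compressed and invisible to plain grep.
count_flate_streams() {
    # -a treats the binary file as text, -o prints each match on its own line
    grep -a -o '/FlateDecode' "$1" | wc -l | tr -d ' '
}
```

If count_flate_streams whee.pdf prints anything other than 0, extracting the text with a tool like pdftotext first is likely to give better results than searching the raw file.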