answers:

3

I have PDF files whose contents I have not managed to search with any terminal program. I can only search them with Acrobat Reader and Skim.

How can you search the contents of PDF files from the terminal?

It seems that a better question is:

How is the search done in PDF viewers such as Acrobat Reader and Skim?

Perhaps I need to write such a search tool if none exists.

+1  A: 

PDF files are usually compressed. PDF viewers such as Acrobat Reader and Skim search the contents by decompressing the PDF text into memory and then searching that text. If you want to search from the command line, one option is to use pdftk to decompress the PDF, and then use grep (or your favorite command-line text-searching utility) to find the desired text. This only finds text that the PDF stores as plain strings; text written with a custom font encoding, or split into fragments for kerning, may not match. For example:

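# Search for the text "text_to_search_for", and print out 3 lines of context
# above and below each match. The -a flag makes grep treat the output as text,
# since an uncompressed PDF can still contain binary data (fonts, images).
pdftk mydoc.pdf output - uncompress | grep -a -C3 text_to_search_for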
Adam Rosenfield
+2  A: 

Try installing xpdf from MacPorts; it is supposed to come with a tool called pdftotext, which should then allow you to search using grep.

Brian Campbell
+1  A: 

pdftotext is indeed an excellent tool, but it produces very long lines; in order to grep you will want to break them up, e.g.,

pdftotext drscheme.pdf - | fmt | grep -i spidey
Norman Ramsey
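
Building on the pdftotext, fmt and grep pipeline above, here is a minimal sketch of a shell function that searches several PDF files at once and prefixes each match with its file name. The function name pdfsearch and the PDF_TO_TEXT override variable are hypothetical names made up for this example; it assumes pdftotext (or whatever command PDF_TO_TEXT names) accepts an input file followed by "-" to write the text to standard output.

```shell
# pdfsearch PATTERN FILE...
# Hypothetical helper: prints "FILE: matching line" for every
# case-insensitive match of PATTERN in the text of each PDF.
pdfsearch() {
    pattern=$1
    shift
    for pdf in "$@"; do
        # PDF_TO_TEXT is an assumed override; it defaults to pdftotext.
        # "-" sends the extracted text to stdout, and fmt re-wraps the
        # very long lines so that each match is a short line.
        "${PDF_TO_TEXT:-pdftotext}" "$pdf" - 2>/dev/null |
            fmt |
            grep -i -- "$pattern" |
            while IFS= read -r line; do
                printf '%s: %s\n' "$pdf" "$line"
            done
    done
}
```

For example, pdfsearch spidey *.pdf would search every PDF in the current directory. Files that pdftotext cannot read are silently skipped, which may or may not be what you want.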