Most PDFs contain lots of binary-looking parts in between some ASCII. But I remember also having seen PDFs where such binary parts were largely absent, and one could open them in a text editor to study their structure.

Is there a trick, tool, or command that will convert binary PDF parts to ASCII/ANSI? (Preferably "free as in beer" or even "free as in liberty")

A: 

Ghostscript has a small utility program written in PostScript in its source code repository. It's called pdfinflt.ps. If you are lucky, it may already slumber in a 'toolbin' subdirectory of your Ghostscript installation location. Otherwise, get it here:

Now run it together with your targeted PDF through the Ghostscript interpreter:

gswin32c.exe -- c:/path/to/pdfinflt.ps your-input.pdf deflated-output.pdf

pdfinflt.ps will (try to) expand all 'streams' contained in the PDF which use the following compression filters/methods: /FlateDecode, /LZWDecode, /ASCII85Decode, /ASCIIHexDecode.

It will not attempt to remove /RunLengthDecode, /CCITTFaxDecode, /DCTDecode, /JBIG2Decode and /JPXDecode. (Compressed/binary fonts will also pass unchanged into the output PDF.)
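
To make "expanding a stream" a bit more concrete, here is a minimal Python sketch of what undoing three of these filters amounts to for a single stream's raw bytes. This is not how pdfinflt.ps is implemented (that is PostScript); the function name decode_stream is hypothetical, it assumes you have already cut the bytes between "stream" and "endstream" out of the file yourself, and it leaves out /LZWDecode because the Python standard library has no decoder for it.

```python
import base64
import zlib


def decode_stream(data: bytes, filter_name: str) -> bytes:
    """Undo one PDF stream filter (hypothetical helper, illustration only)."""
    if filter_name == "/FlateDecode":
        # Flate streams are ordinary zlib/deflate data.
        return zlib.decompress(data)
    if filter_name == "/ASCIIHexDecode":
        # Hex digits, optional whitespace, terminated by '>'.
        hex_text = b"".join(data.split(b">", 1)[0].split())
        if len(hex_text) % 2:
            hex_text += b"0"  # an odd final digit is padded with 0
        return bytes.fromhex(hex_text.decode("ascii"))
    if filter_name == "/ASCII85Decode":
        # PDF ASCII85 data ends with '~>' but has no leading '<~'.
        body = data.strip()
        if body.endswith(b"~>"):
            body = body[:-2]
        return base64.a85decode(body)
    raise ValueError(f"filter not handled in this sketch: {filter_name}")


if __name__ == "__main__":
    content = b"BT /F1 12 Tf (Hello) Tj ET"
    packed = zlib.compress(content)
    # The compressed bytes look "binary"; the decoded ones are readable again.
    print(decode_stream(packed, "/FlateDecode"))
```

A stream that lists several filters would have to be passed through this once per filter, in the order given in its /Filter array.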

If you are in an adventurous mood, you may dare to uncomment those lines in the utility which disable /RunLengthDecode, /DCTDecode and /CCITTFaxDecode and see if it still works...

That's the best I can offer right now.

pipitas
@pipitas: Thank you ... this works for me at least in parts. I can now better poke at all these *obj 1 0 R* parts... trying to understand that stuff better.