I have been trying to build a search feature in a PDF application. I read the Quartz 2D guide in the iPhone reference library, and it says a lot about "PDF operators": everything is done by registering callbacks for them.

For information about PDF operators we are told to read Adobe's PDF reference, but it is vast. Can anyone give me an idea of what these operators are (or how to approach studying them), and which of them I will need for a "search for a string in a PDF" feature?

+1  A: 

I've been searching for the same thing and today I found this post that has some clues:

http://www.random-ideas.net/posts/42

Looks like the operators are "TJ" and "Tj".
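To make the two operators concrete, here is a rough Python sketch (not Quartz code) of what they look like in a page's content stream. The sample stream and the helper name `shown_strings` are made up for illustration, and the regular expressions assume simple literal strings with no escaped or nested parentheses, which real PDFs do not guarantee.

```python
import re

# Hypothetical, already-decompressed content stream fragment.
# Tj takes one string; TJ takes an array mixing strings and kerning numbers.
stream = "BT /F1 12 Tf (Hello) Tj [(Wor) -20 (ld)] TJ ET"

# Simplification: literal strings only, no escaped or nested parentheses.
SHOW_TEXT = re.compile(r"\(([^()]*)\)\s*Tj|\[([^\]]*)\]\s*TJ")
STRING = re.compile(r"\(([^()]*)\)")

def shown_strings(content):
    """Return the text shown by each Tj / TJ operator, in stream order."""
    out = []
    for m in SHOW_TEXT.finditer(content):
        if m.group(1) is not None:
            # "(string) Tj": a single string operand
            out.append(m.group(1))
        else:
            # "[ ... ] TJ": join the string pieces, skip the kerning numbers
            out.append("".join(STRING.findall(m.group(2))))
    return out

print(shown_strings(stream))  # ['Hello', 'World']
```

Note how "World" is stored as two pieces inside the TJ array; a search has to join such pieces before comparing.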

Enrique R.
+1  A: 

Don't be scared off by the PDF reference. It's very well laid out, and you really only need to read a few chapters to understand how text is handled. You can download it from Adobe:

http://partners.adobe.com/public/developer/pdf/index_reference.html

Enrique is correct in that TJ and Tj are the operators that show text, but it is entirely possible, and even normal, for words and sentences to be split up across multiple operations. You should probably concentrate on text blocks, marked by BT and ET (begin text / end text) in the PDF Stream Object.
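As a rough illustration of searching per text block rather than per operator, here is a minimal Python sketch. It is not Quartz code, and the names `text_blocks` and `find_in_blocks` and the sample stream are invented for this example. It assumes the content stream is already decompressed, that strings are simple literals without escaped parentheses, that "BT" and "ET" never occur inside a string, and that string bytes map directly to characters (in real files the font's encoding decides that).

```python
import re

# Simplifications: BT/ET are assumed never to appear inside a string,
# and strings are assumed to have no escaped or nested parentheses.
BT_ET = re.compile(r"\bBT\b(.*?)\bET\b", re.S)
STRING = re.compile(r"\(([^()]*)\)")

def text_blocks(content):
    """Concatenate the string operands found inside each BT ... ET block."""
    return ["".join(STRING.findall(body)) for body in BT_ET.findall(content)]

def find_in_blocks(content, needle):
    """Return indices of text blocks containing needle (case-insensitive)."""
    needle = needle.lower()
    return [i for i, text in enumerate(text_blocks(content))
            if needle in text.lower()]

# Hypothetical stream: "Search" is split across a Tj and a TJ operator.
stream = "BT (Sea) Tj [(r) -15 (ch)] TJ ET q Q BT (other) Tj ET"

print(text_blocks(stream))               # ['Search', 'other']
print(find_in_blocks(stream, "search"))  # [0]
```

Joining everything inside a block before matching is what lets the search find a word that was split across several show-text operations. A word can still span two blocks, so a fuller implementation would also need to consider text positions.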

PDFBox from the Apache project is a very full-featured library for working with PDF documents; have a look there.

purecharger