ansaurus

Question

How can I extract text from a PDF page using Zend_Pdf?

Answer 1

A:

Judging from the manual, Zend_Pdf does not appear to support this. Also, new text is written using the drawText() function, which appears to write images rather than plain "decodable" text.

Andy 2010-03-22 16:03:38

It does write 'text' rather than images, but you're certainly correct: at the moment, parts of a PDF can't be extracted or modified.

David Caunt 2010-03-22 22:11:32

Answer 2

+1 A:

I agree with Andy that this does not appear to be supported. As an alternative, take a look at Shaun Farrell's solution to extracting text from a PDF for use with Zend_Search_Lucene. He uses XPDF, which might also meet your needs.

Cal Jacobson 2010-03-22 21:02:47

xpdf will extract the text from PDFs, as long as your PDFs actually contain text, of course (as opposed to scanned images). You might also try the following: http://www.webcheatsheet.com/php/reading_clean_text_from_pdf.php.

wimvds 2010-03-26 12:28:11
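
For anyone trying the xpdf route suggested above, here is a minimal sketch of calling its pdftotext command-line tool and capturing the output. It is written in Python purely for illustration and is not taken from Shaun Farrell's solution; the same command could equally be run from PHP with shell_exec(). The function names and the optional page argument are assumptions made up for this example, and it assumes pdftotext is installed and on your PATH.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, page=None):
    """Build the argument list for the pdftotext command-line tool.

    Hypothetical helper for this sketch, not part of xpdf itself.
    """
    command = ["pdftotext", "-enc", "UTF-8"]
    if page is not None:
        # -f and -l select the first and last page to convert (1-based),
        # so passing the same number for both extracts a single page.
        command += ["-f", str(page), "-l", str(page)]
    # A trailing "-" asks pdftotext to write the text to standard output
    # instead of creating a .txt file next to the PDF.
    command += [pdf_path, "-"]
    return command


def extract_text(pdf_path, page=None):
    """Return the text of a PDF, or of one page of it, as a string."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    result = subprocess.run(
        build_pdftotext_command(pdf_path, page),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Calling extract_text("document.pdf", page=1) (with a hypothetical file name) would return the text of the first page. As noted above, a PDF that only contains scanned images has no text layer, so the result will be empty or close to it.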
