I am working on a project that converts OCR'd PDFs to PNG using ImageMagick and Ghostscript and displays them in the browser, so that a user can query for a word and have it selected in the image. ImageMagick works fine together with Ghostscript.

My problem is with the ps2text utility, which does not work reliably with PDFs. Could anybody suggest a good utility for converting PostScript to text on Linux, so that I can store the text in a database? After that, I use a custom-written search class to find the coordinates of each word and highlight the text in the browser.

Thanks

A: 

For PostScript, you should use ps2text. For PDFs, you can use pdftotext.
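Since the goal is to find the coordinates of each word, here is a minimal sketch of one way to do it, assuming the pdftotext from poppler-utils is installed and on the PATH. Its `-bbox` option writes XHTML with one `<word>` element per word, carrying `xMin`, `yMin`, `xMax` and `yMax` attributes. The function names `parse_bbox_words` and `pdf_words` are made up for this illustration, and the regular expression assumes the attribute order that poppler normally emits.

```python
import re
import subprocess
from html import unescape

# Matches one <word> element from `pdftotext -bbox` output.
# Assumption: attributes appear in the order xMin, yMin, xMax, yMax.
WORD_RE = re.compile(
    r'<word xMin="([\d.]+)" yMin="([\d.]+)" '
    r'xMax="([\d.]+)" yMax="([\d.]+)">(.*?)</word>'
)


def parse_bbox_words(xhtml):
    """Return (text, x_min, y_min, x_max, y_max) tuples found in the XHTML."""
    return [
        (unescape(text), float(x0), float(y0), float(x1), float(y1))
        for x0, y0, x1, y1, text in WORD_RE.findall(xhtml)
    ]


def pdf_words(pdf_path):
    """Run pdftotext on a PDF and return its words with bounding boxes."""
    # "-" as the output file name makes pdftotext write to stdout.
    result = subprocess.run(
        ["pdftotext", "-bbox", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_bbox_words(result.stdout)
```

The tuples can be stored in the database as they are. The coordinates should be in PDF points (1/72 inch), so for a PNG rendered at, say, 150 DPI they would need scaling by 150/72 before highlighting; that is worth checking against one known word first.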

Matias Valdenegro