Hi, I am using Ghostscript 8.71 to extract text from PDF pages. The command I am using is:

gswin32c -q -sFONTPATH=c:\fonts -dNODISPLAY -dSAFER -dDELAYBIND -dWRITESYSTEMDICT -dSIMPLE -fps2ascii.ps -dFirstPage=1 -dLastPage=1 input.pdf -dQUIET

I redirect stdout to write the extracted text to another file. The problem is that some searchable text is not extracted by Ghostscript.

Text in some fonts is not extracted, for example Verdana in bold, even though Ghostscript does open the font files.

I can upload the PDF file, but I did not find an upload option here. If one is available, please let me know.

Thanks in advance.

A: 

Did you also try alternative command-line tools to extract the text, such as pdftotext from the Xpdf package? How do these compare?

Can you give more details about what exactly is missing in your output? Just certain types of characters, just certain fonts, just certain pages?
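One way to make that comparison concrete is to diff the two extracted texts word by word. The following is only a minimal sketch: it assumes you have already saved the Ghostscript output and the pdftotext output as plain-text files, and the function names (`tokenize`, `missing_words`) and the file arguments are hypothetical, chosen for illustration.

```python
import re
import sys
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[A-Za-z0-9']+", text.lower())


def missing_words(reference_text, candidate_text):
    """Return {word: count} for words that occur more often in
    reference_text (e.g. pdftotext output) than in candidate_text
    (e.g. Ghostscript output)."""
    ref_counts = Counter(tokenize(reference_text))
    cand_counts = Counter(tokenize(candidate_text))
    # Counter subtraction keeps only words with a positive difference.
    return dict(ref_counts - cand_counts)


# Optional command-line use (file names are placeholders):
#   python compare_text.py pdftotext_output.txt gs_output.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        reference = f.read()
    with open(sys.argv[2], encoding="utf-8", errors="replace") as f:
        candidate = f.read()
    for word, count in sorted(missing_words(reference, candidate).items()):
        print(f"{word}\t{count}")
```

If the missing words all come from the bold Verdana passages, that points at a font-specific problem rather than at a faulty page or file.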

Also, you are mixing Linux/Unix syntax ("gs") with Windows syntax ("c:\fonts"). On Windows systems, the default location for fonts is usually c:\Windows\fonts ...

Oh, and yes: having your problematic PDF file to look at would definitely help.

pipitas
Thanks for your answer. I need to use only Ghostscript for text extraction. I have copied all fonts from c:\windows\fonts to c:\fonts, which also contains the Ghostscript Type 1 fonts.
anil
Please tell me of any option to upload my PDF file. Waiting for your response.
anil
There are free upload services on the internet; just google for them. Also, you *should* still at least try `pdftotext` (as well as `pdffonts` and `pdfinfo` from the same package I named), just to collect more data points about the root of the problem, so we may better know how you could get it to work with Ghostscript.
pipitas
@anil: If you are unwilling to respond to questions, you cannot expect to get useful answers. *Did you also try alternative command-line tools to extract the text, such as pdftotext from the Xpdf package? How do these compare?* This is a useful test to do **even** if you "need" to use only Ghostscript for text extraction. It will help to determine whether your PDF is faulty or not...
pipitas