Hi experts,

I have a PDF document with Arabic content. When I search the document for a specific word, Adobe Reader returns no results.

It seems to be a format problem. How can I fix that? Thanks.

+1  A: 

It might not actually be text, or it might be in a container that Reader doesn't pay attention to. It's especially common to expand text objects into vector shapes when you're dealing with fonts that most people aren't going to have installed on their system. It looks the same on the screen, but it's not searchable.

Azeem.Butt
Is there a fix?
Bassel Alkhateeb
Not unless you're the author of the PDF.
Azeem.Butt
+2  A: 

There are at least four different ways to get text into a PDF document (in order of likelihood):

  1. Place the text with standard text operators and standard fonts
  2. Place the text with standard text operators with non-standard fonts
  3. Draw one or more images that represent the text
  4. Place the text by manually drawing the glyphs with various PDF graphics commands

Case 1 is typically searchable. Case 2 is searchable if the font and encoding are sane - if they're not (and this is likely the case for non-Latin fonts) then there is probably no reliable way to map the encoded glyphs back to Unicode (and by the way - PDF is fairly Unicode hostile). Case 3 is totally unsearchable without knowing more about how the PDF was generated. Case 4 is totally unsearchable.
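
To make case 2 concrete, here is a toy model in Python, not real PDF parsing. It assumes a page stores glyph codes rather than characters, and that search only works when the viewer has a table mapping those codes back to Unicode. The names `extract_text`, `sane_map`, `broken_map`, and `page_codes` are made up for illustration.

```python
# Toy model of case 2: the page stores glyph codes, not characters.
# Search can only match if each code maps back to a Unicode character.

def extract_text(glyph_codes, to_unicode):
    """Map glyph codes to characters; unmapped codes become U+FFFD."""
    return "".join(to_unicode.get(code, "\ufffd") for code in glyph_codes)

# Hypothetical subset font whose codes were assigned in order of first use.
# The four characters are Arabic seen, lam, alef, meem.
sane_map = {1: "\u0633", 2: "\u0644", 3: "\u0627", 4: "\u0645"}

# Same font, but embedded without a usable code-to-Unicode table.
broken_map = {}

# The same four codes draw the same word on screen in both cases.
page_codes = [1, 2, 3, 4]
query = "\u0633\u0644\u0627\u0645"

searchable = extract_text(page_codes, sane_map)
unsearchable = extract_text(page_codes, broken_map)

print(query in searchable)    # True
print(query in unsearchable)  # False
```

Both versions render identically, because drawing only needs the glyph codes. Only the first one can be searched, because only it can be turned back into text.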

That said, all four cases can be read with an OCR engine that understands Arabic. I understand that the Iris engine does Arabic.

plinth
thanks for the clear answer
Bassel Alkhateeb