We have been reading and writing sticky notes, annotations, and comments in PDFs via an ActiveX control in our application for a number of years. We recently upgraded to Delphi 2009, with its Unicode support, and the following is now causing problems.

When we call

CAcroPDAnnot.GetContents

the results are strange and we lose our Unicode characters. This is not like saving as an ANSI string, which would usually return ?????; instead we get a string such as

‚ɍs‚­“ú‚É•—Ž×‚ð‚Ђ¢‚½‚ç

for a string of Japanese characters.

However, if I save the comments in the PDF to a data file via the menu in the PDF itself, they are written to the file as something like

0kˆL0Oeå0k˜¨ª0’0r0D0_0‰

The latter can be exported and reimported into an Acrobat PDF, and it recreates the correct Unicode characters. However, when I call CAcroPDAnnot.GetContents in my code, the text comes back as something else.

  1. Is CAcroPDAnnot.GetContents broken?
  2. Is there an encoding scheme I should be aware of?
  3. Is there an alternative approach I could use?

Thanks

A: 

OK, one of the main differences between Delphi 2009 and earlier versions is that the default string type is now a Unicode string. That means that if you use the same ActiveX component as in previous versions, you are passing Unicode strings where ANSI strings are expected, and that is usually not a good idea.

There are a couple of solutions for this problem:

  • See whether you can upgrade your ActiveX component to a version that supports full Unicode strings.
  • Use AnsiString rather than string to communicate with the ActiveX component. You can then keep using the old interface, but you remain bound to the same limitations.
  • Use another control that creates PDFs. There are plenty to choose from, but be prepared to change a big chunk of your software. (Some controls are XML-based and use encoding.)
Gamecat
+1  A: 

You're not exactly giving us a lot of information to work with.

I take it you're talking about the "Acrobat.CAcroPDAnnot" class' method GetContents here. Which version of Acrobat are you using? Have you perhaps switched versions (or run an update) around the time you started programming with Delphi 2009?

Then: how did you instantiate the object? If using a *_TLB.pas file generated from the DLL, are you certain it still matches it? (Try re-generating it, if uncertain).

Third: how are you calling the method? What type of variable are you assigning the result to?

It might also help if you could provide a sample of an annotation (preferably including non-ASCII chars), and for that annotation:

  • what it should look like (and what it does look like inside Reader)
  • what it returns when using a pre-2009 version of Delphi*
  • what it returns when using Delphi 2009*

(* preferably the HEX byte codes of the (ansi/wide)strings; but output from the Ctrl-F7 inspector should do)

Then maybe someone could provide a more meaningful answer.

Martijn
+2  A: 

‚ɍs‚­“ú‚É•—Ž×‚ð‚Ђ¢‚½‚ç

That's the string:

に行く日に風邪をひいたら

in CP-932 aka Shift-JIS encoding, an awful but lamentably still-popular encoding in Japan.

You're currently interpreting it as CP-1252 (Windows Western European). If your PDF-reading component won't convert it for you automatically, you'll need to find a way to detect what encoding the document is in and convert it manually.

I don't know what Delphi provides for reading encodings, but have you got the encodings for Shift-JIS installed in Windows, via Control Panel -> Regional Options -> "Install files for East Asian languages"? If not, that might explain why the automatic conversion is failing.
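
The misinterpretation described above can be reproduced outside Delphi. The following is a minimal sketch in Python, used here only as a neutral way to show the byte-level round trip; it is not the Delphi fix itself. The helper names `misread` and `repair` are made up for this illustration. It takes a fragment of the annotation text, encodes it as CP-932, views those bytes as CP-1252, and then reverses the process. It also shows, as an observation rather than something confirmed by Adobe's documentation, that UTF-16BE bytes viewed as CP-1252 resemble the exported data file quoted in the question.

```python
# Illustration only: Python stands in for whatever conversion routine
# the Delphi application ends up using. Helper names are hypothetical.

ORIGINAL = "日に風邪を"  # a fragment of the annotation text from the question


def misread(text: str, actual: str, assumed: str) -> str:
    """Encode text with the `actual` codec, then decode the bytes as `assumed`."""
    return text.encode(actual).decode(assumed)


def repair(garbled: str, actual: str, assumed: str) -> str:
    """Undo misread(): recover the raw bytes via `assumed`, decode as `actual`."""
    return garbled.encode(assumed).decode(actual)


# Shift-JIS (CP-932) bytes viewed as Windows-1252, which is what the
# GetContents result in the question looks like.
sjis_garbled = misread(ORIGINAL, "cp932", "cp1252")
print(sjis_garbled.encode("cp1252").hex())  # the underlying CP-932 bytes

# Reversing the wrong decoding step restores the Japanese text.
restored = repair(sjis_garbled, "cp932", "cp1252")
print(restored == ORIGINAL)

# UTF-16BE bytes viewed as Windows-1252 (assumption: this is how the
# exported data file in the question came to look the way it does).
utf16_garbled = misread("日に", "utf-16-be", "cp1252")
print(utf16_garbled == "e\u00e50k")
```

The fragment is deliberately shorter than the full string: Python's `cp1252` codec rejects the five byte values that Windows leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D), and the full text contains some of them. A real repair would therefore need to work on the raw bytes returned by the control rather than on an already-decoded string.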

bobince
Thank you for your help.
Toby Allen