ansaurus

Question

pdfmark for docinfo metadata in pdf is not accepting accented characters in Keywords or Subject

Answer 1

+1 A:

So, you're supposed to be able to use an ANSI encoded file and any characters which are in the PDFDocEncoding set (which the French accented characters are), but that doesn't work.

Another method is to keep the Latin-1 encoded file but write the string as Unicode in octal form, two bytes per character (\xxx\xxx), and to start the string with the BOM: \377\376

So the above subject string "mot accenté" has to be translated to:

/Subject (\377\376\155\000\157\000\164\000\040\000\141\000\143\000\143\000\145\000\156\000\164\000\351\000)

This works, but it sucks. Anyone have anything better?

rpilkey 2010-06-14 18:54:41
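Typing such an octal string by hand is error-prone, so it can be generated instead. The following is only a sketch, not something from the thread: the function name `pdfmark_utf16_octal` is hypothetical, and it deliberately mirrors the byte layout of the string in the answer above (BOM `\377\376`, then the low byte of each character first).

```python
def pdfmark_utf16_octal(text: str) -> str:
    """Return `text` as the body of a PostScript string, every byte in \\ooo form.

    Assumption: same layout as the answer's example, i.e. the bytes FF FE
    followed by UTF-16 with the low byte of each character first.
    """
    raw = b"\xff\xfe" + text.encode("utf-16-le")
    # Three-digit octal escapes keep the .ps file pure ASCII.
    return "".join(f"\\{byte:03o}" for byte in raw)


subject = pdfmark_utf16_octal("mot accenté")
print(f"/Subject ({subject})")
```

Because every byte is escaped, the result is safe to paste into a Latin-1 or plain ASCII PostScript file.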

See my comment on the accepted answer for a non-octal version.

rpilkey 2010-06-17 19:00:02

@rpilkey: IIRC, it should be enough to just encode the accented character alone, keeping the rest in clear text ASCII. Like this: `/Subject (mot accent\351)`.

pipitas 2010-08-16 21:17:34

Your solution works for the Title and Author fields, but not for the Subject and Keyword fields. This is with Adobe's Distiller 9.3.3177.

rpilkey 2010-08-19 12:37:59

Answer 2

+2 A:

Can you try using UTF16-BE for the encoding and starting the strings with 254 and 255 (thorn and y-dieresis)?

plinth 2010-06-17 17:21:23

I tried opening the .ps file in Notepad++, go to Encoding, Convert to UCS-2 Big Endian, then save. It added the BOM at the beginning of the file and doubled its size, so I think it worked. Distiller errors out with:

%%[ Error: undefined; OffendingCommand: þÿ ]%%
%%[ Flushing: rest of job (to end-of-file) will be ignored ]%%
%%[ Warning: PostScript error. No PDF file produced. ] %%

So Distiller won't even look at a UCS-2 file here. This is on Windows XP by the way, if that makes a difference.

rpilkey 2010-06-17 18:16:19

You don't want to convert the whole file to utf16-be, only the strings, so your strings should be /Subject (þÿ...) etc.

plinth 2010-06-17 18:35:03

Thanks. That works. The string that works for my example is:

/Subject (þÿ^@m^@o^@t^@ ^@a^@c^@c^@e^@n^@t^@é)

where "^@" is the nul byte (that's how it's displayed in Vim). Putting this into ASCII files will be a chore, but it's doable. I don't know why those two fields require this but "Title" and "Author" don't.

rpilkey 2010-06-17 18:59:24

To type the nul byte in Vim, press Ctrl-V u 0000.

rpilkey 2010-06-17 19:06:14
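The raw-byte string worked out in the comments above (þÿ, then a nul byte before each ASCII character) can also be produced by a script rather than typed in an editor. This is a hedged sketch: the function name `pdfmark_utf16be_entry` is hypothetical, and the escaping of parentheses, backslashes and line-end bytes is an added precaution based on PostScript string syntax, not something discussed in the thread.

```python
# Bytes that have special meaning inside a PostScript (...) string.
_PS_ESCAPES = {
    0x5C: b"\\\\",  # backslash
    0x28: b"\\(",   # left parenthesis
    0x29: b"\\)",   # right parenthesis
    0x0A: b"\\n",   # line feed
    0x0D: b"\\r",   # carriage return
}


def pdfmark_utf16be_entry(key: str, text: str) -> bytes:
    """Build `/Key (...)` with the value as BOM FE FF plus UTF-16BE bytes."""
    raw = b"\xfe\xff" + text.encode("utf-16-be")
    body = b"".join(_PS_ESCAPES.get(byte, bytes([byte])) for byte in raw)
    return b"/" + key.encode("ascii") + b" (" + body + b")"


entry = pdfmark_utf16be_entry("Subject", "mot accenté")
print(entry)
```

The result is a `bytes` object, so it has to be written to the .ps file in binary mode; only the string itself is UTF-16BE, the rest of the file stays as it was.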

Answer 3

A:

You do not need to escape/encode ALL the accented characters!

It is enough to keep the standard ASCII characters and just mix in the \NNN notation where a special character should appear.
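As an illustration of this mixing, here is a small sketch that leaves printable ASCII alone and converts only the other characters to \NNN octal. The function name `escape_for_pdfmark` is hypothetical, and the sketch assumes each special character has the same code in Latin-1 as in PDFDocEncoding, which holds for the French accented letters used in this thread.

```python
def escape_for_pdfmark(text: str) -> str:
    """Return `text` as the body of a PostScript string, ASCII kept as-is."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "()\\":
            # Characters that are special inside a (...) string.
            out.append("\\" + ch)
        elif 32 <= code < 127:
            out.append(ch)
        elif code <= 255:
            # Assumption: the Latin-1 code equals the PDFDocEncoding code.
            out.append(f"\\{code:03o}")
        else:
            raise ValueError(f"{ch!r} does not fit in a single byte")
    return "".join(out)


print(f"/Subject ({escape_for_pdfmark('mot accenté')})")
```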

The following Ghostscript command creates a two-page PDF. The pages are nearly empty, with 2 bookmarks/outlines included, plus the metadata with accents. The example is for Windows; on Unix/Linux, use gs instead of gswin32c.exe and change the line-continuation characters from ^ to \:

gswin32c.exe ^
 -sDEVICE=pdfwrite ^
 -o 2-empty-pages-with-bookmarks-and-accents-in-metadata.pdf ^
 -c "[/Creator(brains&smarts)/Author(pipitas)/Subject(m\350t accent\351)/Title(mot accent\352)/Keywords(ganz sch\353\353 bl\353\353\d!)/DOCINFO pdfmark" ^
 -c "[/Page 1 /View [/XYZ null null null] /Title (Page One) /OUT pdfmark" ^
 -c "[/Page 2 /View [/XYZ null null null] /Title (Page Two) /OUT pdfmark" ^
 -c "200 500 moveto /Helvetica findfont 100 scalefont setfont (One) show showpage 200 500 moveto (Two) show showpage quit"

I hope this finally settles your question "Does anybody know of a workaround to get the accented characters into ALL the metadata fields when modifying a postscript file?".

pipitas 2010-08-16 21:54:59
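When the Ghostscript command above is driven from a script, building the argument list directly avoids shell quoting of the backslashes. This is a minimal sketch under stated assumptions: the function name `ghostscript_docinfo_argv` and the output file name are hypothetical, the executable is assumed to be `gs`, and the metadata values are assumed to be already escaped for PostScript. Only the options that appear in the command above are used.

```python
def ghostscript_docinfo_argv(output_pdf: str, docinfo: dict) -> list:
    """Build an argument list for Ghostscript; nothing is executed here."""
    # Values must already be valid PostScript string bodies, e.g. m\350t.
    entries = "".join(f"/{key}({value})" for key, value in docinfo.items())
    return [
        "gs",  # assumption: gswin32c.exe on Windows
        "-sDEVICE=pdfwrite",
        "-o", output_pdf,
        "-c", f"[{entries}/DOCINFO pdfmark",
        "-c", "200 500 moveto /Helvetica findfont 100 scalefont setfont "
              "(One) show showpage quit",
    ]


argv = ghostscript_docinfo_argv(
    "accents.pdf",
    {"Subject": r"m\350t accent\351", "Title": r"mot accent\352"},
)
print(argv)
```

The list can then be passed to `subprocess.run(argv, check=True)` on a machine where Ghostscript is installed.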

OMG! You're doing all this in a single commandline! Wow...

2010-08-16 22:38:39

Your solution works for the Title and Author fields, but not for the Subject and Keyword fields. This is with Adobe's Distiller 9.3.3177.

rpilkey 2010-08-19 12:37:34

@rpilkey: it works for me for the Subject and Keywords fields as well, without any obvious problem. Adobe Reader 9.3.3.

pipitas 2010-08-19 14:06:50

Ah, but which distiller? You seem to be using Ghostscript, so it might be a bug in Adobe's distiller.

rpilkey 2010-08-19 23:01:56

@rpilkey: Yes, my given commandline uses Ghostscript, and in the paragraph above I said: *"The following Ghostscript command creates a two page PDF."*

pipitas 2010-08-20 08:40:53
