I am inserting metadata into PostScript files with a program; the files are then distilled to PDF with Adobe Distiller. I am using this code, which I took from an online chapter of Thomas Merz's "Web Publishing with Acrobat-PDF":

/pdfmark where {pop} {userdict /pdfmark /cleartomark load put} ifelse

[ /Title (mot accenté)

  /Author (mot accenté)

  /Subject (mot accenté)

  /Keywords (mot accenté) 

/DOCINFO pdfmark

When I look at the metadata in the resulting PDF, the accented characters turn into "?" in the Subject and Keywords fields, but not in the Title and Author fields. The character is the same byte in all four fields: 233 (0xE9, "é" in Latin-1).

I tried replacing it with the octal escape (\351), which came out the same (Title and Author okay, Subject and Keywords messed up).

The file encoding is Latin-1, with Unix line endings.

I found a mention on the Adobe forums, but the answer didn't make sense to me.

http://forums.adobe.com/message/1165593 and http://forums.adobe.com/thread/307687

I also changed the file encoding to UTF-8 and inserted the character directly (in Vim: <Ctrl-v>u00e9), with no change. I tried inserting a BOM in a few places; that didn't work either.

This is with the Distiller from Acrobat Pro 9 (9.3.3177)

I didn't notice this problem with Acrobat Pro 7.

Does anybody know of a workaround to get the accented characters into ALL the metadata fields when modifying a postscript file, or tell me if I'm doing it wrong?

It seems weird that different fields would not accept the same bytes.

Possibly related SO question: http://stackoverflow.com/questions/128162

I am embedding all fonts.

+1  A: 

So, you're supposed to be able to use an ANSI-encoded file with any characters that are in the PDFDocEncoding set (which the French accented characters are), but that doesn't work here.

Another method is to keep the Latin-1 encoded file, but write the characters as Unicode in octal form (2 bytes per character: \xxx\xxx), and start the string with the BOM: \377\376

So the above subject string "mot accenté" has to be translated to:

/Subject (\377\376\155\000\157\000\164\000\040\000\141\000\143\000\143\000\145\000\156\000\164\000\351\000)

This works, but it sucks. Anyone have anything better?

rpilkey
see my comment in the accepted answer for non-octal version..
rpilkey
@rpilkey: IIRC, it should be enough to just encode the accented character alone, keeping the rest in clear text ASCII. Like this: `/Subject (mot accent\351)`.
pipitas
Your solution works for the Title and Author fields, but not for the Subject and Keyword fields. This is with Adobe's Distiller 9.3.3177.
rpilkey
+2  A: 

Can you try using UTF-16BE for the encoding and starting the strings with bytes 254 and 255 (thorn and y-dieresis)?

plinth
I tried opening the .ps file in Notepad++, going to Encoding, choosing Convert to UCS-2 Big Endian, then saving. It added the BOM at the beginning of the file and doubled its size, so I think it worked. Distiller errors out with:

%%[ Error: undefined; OffendingCommand: þÿ ]%%
%%[ Flushing: rest of job (to end-of-file) will be ignored ]%%
%%[ Warning: PostScript error. No PDF file produced. ] %%

So Distiller won't even look at a UCS-2 file here. This is on Windows XP, by the way, if that makes a difference.
rpilkey
You don't want to convert the whole file to utf16-be, only the strings, so your strings should be /Subject (þÿ...) etc.
plinth
Thanks. That works. The string that works for my example is: /Subject (þÿ^@m^@o^@t^@ ^@a^@c^@c^@e^@n^@t^@é) where "^@" is the NUL byte (that's how it's displayed in Vim). Putting this into ASCII files will be a chore, but it's doable. I don't know why those two fields require this but "Title" and "Author" don't.
rpilkey
To type in the nul byte in Vim, Ctrl-V u 0000
rpilkey
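For anyone generating these strings with a program rather than typing NUL bytes by hand, here is a minimal Python sketch of the approach from this answer: keep the PostScript file plain ASCII and write each metadata string as UTF-16BE, prefixed with bytes 254 and 255, using octal escapes. The function names `pdfmark_text_string` and `docinfo_pdfmark` are made up for this illustration, and the output is untested against Distiller 9.

```python
def pdfmark_text_string(text: str) -> str:
    """Return a PostScript string literal holding text as BOM-prefixed UTF-16BE.

    Every byte is written as a three-digit octal escape, so the result is
    pure ASCII and needs no extra escaping of parentheses or backslashes.
    """
    data = b"\xfe\xff" + text.encode("utf-16-be")  # BOM = bytes 254, 255
    return "(" + "".join(f"\\{byte:03o}" for byte in data) + ")"


def docinfo_pdfmark(fields: dict[str, str]) -> str:
    """Build a /DOCINFO pdfmark block from a mapping of key name to text.

    Hypothetical helper: the key names (Title, Subject, ...) are passed
    through unchanged and are not validated.
    """
    lines = [
        "/pdfmark where {pop} {userdict /pdfmark /cleartomark load put} ifelse",
        "[",
    ]
    for key, value in fields.items():
        lines.append(f"  /{key} {pdfmark_text_string(value)}")
    lines.append("/DOCINFO pdfmark")
    return "\n".join(lines)


if __name__ == "__main__":
    print(docinfo_pdfmark({"Subject": "mot accenté", "Keywords": "mot accenté"}))
```

For example, `pdfmark_text_string("é")` gives `(\376\377\000\351)`. Escaping every byte makes the strings long, but it avoids having raw NUL bytes in the file.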
A: 

You do not need to escape/encode ALL the accented characters!

It is enough to keep the standard ASCII characters and just mix in the \NNN notation where a special character should appear.
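As a rough illustration of that mixing, here is a small Python sketch that leaves printable ASCII alone and writes only Latin-1 characters in the range 161 to 255 as \NNN escapes. The function name `pdfmark_latin1_string` is hypothetical, and the sketch assumes the characters have the same code in Latin-1 and in PDFDocEncoding, which holds for é (233). Comments in this thread report that this form did not work for the Subject and Keywords fields with Distiller 9.3, so treat it as a sketch for the Ghostscript case.

```python
def pdfmark_latin1_string(text: str) -> str:
    """Return a PostScript string literal with only non-ASCII characters escaped."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "()\\":
            # Parentheses and backslashes are special inside PostScript strings.
            out.append("\\" + ch)
        elif 0x20 <= code < 0x7F:
            # Printable ASCII stays in clear text.
            out.append(ch)
        elif 0xA1 <= code <= 0xFF:
            # Assumption: same code point in Latin-1 and PDFDocEncoding.
            out.append(f"\\{code:03o}")
        else:
            raise ValueError(f"cannot represent {ch!r} in this sketch")
    return "(" + "".join(out) + ")"


if __name__ == "__main__":
    print("/Subject " + pdfmark_latin1_string("mot accenté"))
```

With the input "mot accenté" this produces `(mot accent\351)`, the same form used in the command line of this answer.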

The following Ghostscript command creates a two-page PDF. It will have nearly empty pages, with 2 bookmarks/outlines included, plus the metadata with accents. The example is for Windows; on Unix/Linux, use gs instead of gswin32c.exe and change the line-continuation characters from ^ to \:

gswin32c.exe ^
 -sDEVICE=pdfwrite ^
 -o 2-empty-pages-with-bookmarks-and-accents-in-metadata.pdf ^
 -c "[/Creator(brains&smarts)/Author(pipitas)/Subject(m\350t accent\351)/Title(mot accent\352)/Keywords(ganz sch\353\353 bl\353\353\d!)/DOCINFO pdfmark" ^
 -c "[/Page 1 /View [/XYZ null null null] /Title (Page One) /OUT pdfmark" ^
 -c "[/Page 2 /View [/XYZ null null null] /Title (Page Two) /OUT pdfmark" ^
 -c "200 500 moveto /Helvetica findfont 100 scalefont setfont (One) show showpage 200 500 moveto (Two) show showpage quit"

I hope this finally settles your question "Does anybody know of a workaround to get the accented characters into ALL the metadata fields when modifying a postscript file?".

pipitas
OMG! You're doing all this in a single commandline! Wow...
Your solution works for the Title and Author fields, but not for the Subject and Keyword fields. This is with Adobe's Distiller 9.3.3177.
rpilkey
@rpilkey: it works for me for the Subject and Keywords fields without any obvious problem. Adobe Reader 9.3.3.
pipitas
Ah, but which distiller? You seem to be using Ghostscript, so it might be a bug in Adobe's distiller.
rpilkey
@rpilkey: Yes, my given commandline uses Ghostscript, and in the paragraph above I said: *"The following Ghostscript command creates a two page PDF."*
pipitas