ansaurus

Question

Unicode in PDF

Answer 1

+1 A:

Look in the PDF specs. They're freely downloadable from Adobe.

Ferruccio 2008-09-24 16:39:44

I've read the spec you link to, and not found my answer. They refer to Unicode in many places, but don't say how to encode it. Possibly that's because I'm not reading them right. Can you point me to the section that answers my question?

Marcus Downing 2008-09-24 16:52:35

btw, I'm not denying that the specs are the best place to look. Just that I haven't yet found my answer in them.

Marcus Downing 2008-09-24 17:03:29

Answer 2

A:

I'm not a PDF expert, and (as Ferruccio said) the PDF specs at Adobe should tell you everything, but a thought popped up in my mind:

Are you sure you are using a font that supports all the characters you need?

In our application, we create PDF from HTML pages (with a third-party library), and we had this problem with Cyrillic characters...

Filini 2008-09-24 16:57:31

We're sticking to the basic fonts that are on every computer, and not embedding any fonts.

Marcus Downing 2008-09-24 17:04:28

Answer 3

+2 A:

The simple answer is that there's no simple answer. If you take a look at the PDF specification, you'll see an entire chapter (and a long one at that) devoted to the mechanisms of text display. I implemented all of the PDF support for my company, and handling text was by far the most complex part of the exercise. The solution you discovered, using a third-party library to do the work for you, is really the best choice, unless you have very specific, special-purpose requirements for your PDF files.

Derek Clegg 2008-09-27 14:28:03

Answer 4

+1 A:

See Appendix D (page 995) of the PDF specification. There is a limited number of fonts and character sets pre-defined in a PDF consumer application. To display other characters you need to embed a font that contains them. It is also preferable to embed only a subset of the font, including only required characters, in order to reduce file size. I am also working on displaying Unicode characters in PDF and it is a major hassle.

Check out PDFBox or iText.

http://www.adobe.com/devnet/pdf/pdf_reference.html

jm4 2008-10-02 15:31:06

Answer 5

+3 A:

In the PDF reference in chapter 3, this is what they say about Unicode:

Text strings are encoded in either PDFDocEncoding or Unicode character encoding. PDFDocEncoding is a superset of the ISO Latin 1 encoding and is documented in Appendix D. Unicode is described in the Unicode Standard by the Unicode Consortium (see the Bibliography). For text strings encoded in Unicode, the first two bytes must be 254 followed by 255. These two bytes represent the Unicode byte order marker, U+FEFF, indicating that the string is encoded in the UTF-16BE (big-endian) encoding scheme specified in the Unicode standard. (This mechanism precludes beginning a string using PDFDocEncoding with the two characters thorn ydieresis, which is unlikely to be a meaningful beginning of a word or phrase).
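As a rough illustration of the rule quoted above, here is a minimal Python sketch that writes a text string as a PDF hex string literal. The function name `pdf_text_string` is made up for this example. It assumes printable ASCII can be written byte for byte and sends everything else through UTF-16BE with the 254, 255 prefix. Note that this covers "text strings" in the spec's sense (document info, bookmarks and the like), not text drawn on a page, which still depends on the font and its encoding.

```python
import codecs


def pdf_text_string(text: str) -> bytes:
    """Return `text` as a PDF hex string literal (hypothetical helper).

    Printable ASCII is written byte for byte, on the assumption that
    PDFDocEncoding agrees with ASCII for those characters. Anything
    else is written as UTF-16BE preceded by the byte order mark
    (bytes 254, 255), as the quoted passage requires.
    """
    if text.isascii() and text.isprintable():
        raw = text.encode("ascii")
    else:
        raw = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    # Hex string literals (<...>) avoid having to escape parentheses
    # and backslashes, which literal strings ((...)) would need.
    return b"<" + raw.hex().upper().encode("ascii") + b">"


print(pdf_text_string("A"))   # b'<41>'
print(pdf_text_string("Я"))   # b'<FEFF042F>'
```

Characters outside the Basic Multilingual Plane come out as UTF-16 surrogate pairs, which is what the UTF-16BE encoding scheme calls for.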

plinth 2008-10-02 15:39:12
