ansaurus

Question

Finding the Unicode codepoint of a character in GNU Emacs

Answer 1

+4 A:

In a modern Emacs, M-x describe-char will tell you about the character at point.
An example:

  character: ¢ (2210, #o4242, #x8a2, U+00A2)
    charset: latin-iso8859-1
         (Right-Hand Part of Latin Alphabet 1 (ISO/IEC 8859-1): ISO-IR-100.)
 code point: #x22
     syntax: w  which means: word
   category: l:Latin
buffer code: #x81 #xA2
  file code: #xC2 #xA2 (encoded by coding system utf-8)
    display: by this font (glyph code)
     -apple-monaco-medium-r-normal--12-120-72-72-m-120-mac-roman (#xA2)

Note the U+00A2 in the first part, which gives the Unicode codepoint of the character.
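
The first three values in the parentheses (2210, #o4242, #x8a2) look like one number written in decimal, octal and hex, presumably the internal character code used by this Emacs version rather than the Unicode codepoint. A quick sketch to check that reading, using only the standard format function:

```elisp
;; Print 2210 in decimal, octal and hexadecimal to compare with the
;; first line of the describe-char output above.
(format "%d, #o%o, #x%x" 2210 2210 2210)
```

If the three printed values match the describe-char line, only the U+00A2 field carries the Unicode codepoint.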

anonfunc 2008-10-25 09:35:54

In Emacs 23, C-x = runs what-cursor-position; with a prefix argument it shows the full describe-char output. Place your cursor (also called "point") over a character and type C-u C-x =

Leonel 2010-07-23 12:20:26

Answer 2

+1 A:

Thanks for the quick answers. I looked at the source code for describe-char, and found the following snippet which solves my problem. I tested it in both XEmacs 21.4.13 Mule and GNU Emacs 22.1.1 and it seems to work.

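;; Use the `untranslated-utf-8' text property at point if it is set;
;; otherwise ask encode-char for the UCS code of the character after point.
(or (get-char-property (point) 'untranslated-utf-8)
    (encode-char (char-after) 'ucs))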
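
For anyone who wants this as a command, here is a minimal sketch that wraps the snippet above. The names my-codepoint-at-point and my-show-codepoint are made up for illustration and are not part of Emacs or of describe-char. The nil check for end of buffer is an addition, and it has not been tried on the XEmacs and Emacs 22 versions mentioned above.

```elisp
;; Hypothetical helper: the function names are illustrative, not from Emacs.
(defun my-codepoint-at-point ()
  "Return the Unicode code point of the character after point, or nil."
  (let ((ch (char-after)))
    ;; `char-after' returns nil at the end of the buffer, so guard against it.
    (when ch
      (or (get-char-property (point) 'untranslated-utf-8)
          (encode-char ch 'ucs)))))

(defun my-show-codepoint ()
  "Show the code point of the character after point in the echo area."
  (interactive)
  (let ((cp (my-codepoint-at-point)))
    (if cp
        (message "U+%04X" cp)
      (message "No character at point, or no Unicode code point for it"))))
```

With point on the ¢ from the first answer, M-x my-show-codepoint should report U+00A2 in the echo area.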

2008-10-25 09:55:22
