ansaurus

Question

Short Unicode \N{} names for Latin-1 characters in Python?

Answer 1

+2 A:

Sorry, no, there's no such thing. In string literals, anyway... you could perhaps piggyback on another encoding scheme, such as HTML:

>>> import HTMLParser
>>> HTMLParser.HTMLParser().unescape(u'a &auml; b c')
u'a \xe4 b c'

But I don't think this'd be worth it.
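For readers on Python 3, the same piggybacking can be sketched with the standard library's `html.unescape`, which takes the place of the `HTMLParser().unescape` call used above. The helper name `short` is made up for this illustration.

```python
import html

def short(text):
    """Expand HTML character references such as &auml; into the characters they name."""
    return html.unescape(text)

print(ascii(short('a &auml; b c')))   # 'a \xe4 b c'
```

This still costs a function call around every string, so the "not worth it" verdict probably stands.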

Hardly anyone uses the \N notation in any case. For the occasional character the \xnn notation is acceptable; for more involved usage you're better off typing ä directly and making sure a # coding= line is defined in the script, as per PEP 263. (If you don't have a keyboard layout that can type those diacritics directly, get one, e.g. eurokb on Windows, or use the Compose key on Linux.)

bobince 2009-12-26 13:34:05

agree, wouldn't be worth it -- was hoping for \N{auml}. Is there a "compose" on Macs?

Denis 2009-12-26 16:50:58

Not exactly like Compose but apparently you should be able to type `ä` with `Option`+`U` then `A`. http://tlt.its.psu.edu/suggestions/international/accents/codemac.html

bobince 2009-12-26 18:34:17

Answer 2

A:

Have you thought about writing your own converter? It wouldn't be hard to write something that would go through a file and replace \N{A umlaut} with \N{LATIN SMALL LETTER A WITH DIAERESIS} and all the rest.
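A minimal sketch of such a converter, assuming the short names are the HTML entity names (`auml`, `szlig`, and so on) rather than the `A umlaut` spelling above. The function names `expand_short_names` and `_is_unicode_name` and that choice of short names are assumptions for illustration only. It rewrites source text (Python 3) and leaves any name it does not recognise untouched.

```python
import re
import unicodedata
from html.entities import name2codepoint

# Matches \N{name} where name is a single run of word characters, e.g. \N{auml}.
_SHORT_NAME = re.compile(r'\\N\{(\w+)\}')

def _is_unicode_name(name):
    """True if name is already accepted by the Unicode name database."""
    try:
        unicodedata.lookup(name)
        return True
    except KeyError:
        return False

def expand_short_names(source):
    """Replace \\N{entity} with \\N{FULL UNICODE NAME} in source text."""
    def repl(match):
        name = match.group(1)
        codepoint = name2codepoint.get(name)
        if codepoint is None or _is_unicode_name(name):
            return match.group(0)   # unknown, or already a real Unicode name
        return '\\N{%s}' % unicodedata.name(chr(codepoint))
    return _SHORT_NAME.sub(repl, source)

print(expand_short_names(r"x = u'a \N{auml} b'"))
# x = u'a \N{LATIN SMALL LETTER A WITH DIAERESIS} b'
```

Run as a preprocessing step, this keeps the file you type short while the file Python sees uses only standard names.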

Mark Ransom 2009-12-26 14:19:32

Answer 3

+4 A:

If you want to do the right thing, please use UTF-8 in your Python source code. This will keep the code much more readable.

Python is able to read UTF-8 source files; all you have to do is add a coding line after the first one (PEP 263 requires it on the first or second line):

#!/usr/bin/python
# -*- coding: UTF-8 -*-

By the way, starting with Python 3.0, UTF-8 is the default source encoding, so you will not need this line anymore. See PEP 3120.

Sorin Sbarnea 2009-12-26 15:11:25

ok, but (clarification added) I was hoping for \N{aumlaut} or the like, fast to type and clear too

Denis 2009-12-26 16:44:19

Answer 4

+1 A:

You can put an actual "ä" character in your string. For this you have to declare the encoding of the source code at the top:

#!/usr/bin/env python
# encoding: utf-8

x = u"ä"

Roberto Bonvallet 2009-12-26 15:14:33

Answer 5

A:

You can use the Unicode notation \uXXXX to describe that character:

u"\u00E4"
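The escape forms mentioned in this thread all denote the same code point, which a quick check makes concrete (a small sketch; the variable names are illustrative):

```python
import unicodedata

by_u = u"\u00E4"                                      # 4-digit \uXXXX escape
by_x = u"\xe4"                                        # 2-digit \xnn escape
by_name = u"\N{LATIN SMALL LETTER A WITH DIAERESIS}"  # full Unicode name

# All three spellings produce the same one-character string.
assert by_u == by_x == by_name

print(unicodedata.name(by_u))   # LATIN SMALL LETTER A WITH DIAERESIS
print(hex(ord(by_u)))           # 0xe4
```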

Gumbo 2009-12-26 15:21:18

Answer 6

A:

On Windows, you can use the charmap.exe utility to look up the keyboard shortcut for common letters you're using such as:

ALT-0223 = ß
ALT-0228 = ä
ALT-0246 = ö

Then use Unicode and save in UTF-8:

# -*- coding: UTF-8 -*-
phrase = u'Löwenbräu Weißbier'

or use a converter as someone else mentioned and make up your own shortcuts:

# -*- coding: UTF-8 -*-

def german(s):
    s = s.replace(u'SS',u'ß')
    s = s.replace(u'a:',u'ä')
    s = s.replace(u'o:',u'ö')
    return s

phrase = german(u'Lo:wenbra:u WeiSSbier')
print phrase

Mark Tolonen 2009-12-26 21:25:06
