Why does IDLE handle one symbol correctly but not another?

>>> e = '€'
>>> print unichr(ord(e))
€     # looks like a very thin rectangle on my system.
>>> p = '£'
>>> print unichr(ord(p))
£
>>> ord(e)
128
>>> ord(p)
163

I tried adding various # coding lines, but that didn't help.

EDIT: browser should be UTF-8, else this will look rather strange

EDIT 2: On my system, the euro char is displayed correctly on line 1, but not in the print line. The pound char is displayed correctly in both places.

+2  A: 

The answer depends on what encoding the IDLE REPL is using. You should be more explicit about what's actually Unicode text and what's a byte sequence. Meditate on this example:

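# -*- coding: utf-8 -*-
# A unicode string: one character, code point U+20AC.
c = u'€'
print type(c)
for b in c.encode('utf-8'):
    print ord(b)

# A byte string: its contents are whatever bytes the source encoding produced.
c = '€'
print type(c)
for b in c:
    print ord(b)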

EDIT:

As for IDLE, it's kind of borken, and needs to be patched to work correctly.

IDLE 1.2.2      
>>> c = u'€'
>>> ord(c)
128
>>> c.encode('utf-8')
'\xc2\x80'
>>> c
u'\x80'
>>> print c
€[the box thingy]


>>> c = u'\u20ac'
>>> ord(c)
8364
>>> c.encode('utf-8')
'\xe2\x82\xac'
>>> c
u'\u20ac'
>>> print c
€

In the first session, by the time the € is interpreted, it has already been mis-encoded, and is unrecoverable.
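To make the mis-encoding concrete, here is a minimal sketch of one plausible explanation. It assumes the console encoding is cp1252, which the question never states but which is consistent with ord(e) being 128. The variable names are only illustrative. Taking a byte value and using it directly as a code point, which is what unichr(ord(e)) does, amounts to decoding the byte as latin-1.

```python
# -*- coding: utf-8 -*-
# Assumption (not stated in the thread): the console encodes input as cp1252.
euro = u'\u20ac'   # EURO SIGN
pound = u'\u00a3'  # POUND SIGN

euro_byte = euro.encode('cp1252')    # single byte 0x80 (128)
pound_byte = pound.encode('cp1252')  # single byte 0xA3 (163)

# Using the byte value as a code point is the same as decoding as latin-1.
euro_wrong = euro_byte.decode('latin-1')    # U+0080, a control character
pound_same = pound_byte.decode('latin-1')   # U+00A3, still the pound sign

# Decoding with the encoding that produced the byte recovers the euro.
euro_fixed = euro_byte.decode('cp1252')
```

Under that assumption the pound sign survives only because its cp1252 byte value happens to equal its Unicode code point, while the euro's does not.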

Jonathan Feinberg
+1, yep, proper explicit coding is the key.
Alex Martelli
Explicit utf-8 doesn't help (or isn't enough), but explicitly adding the u does. I wonder why the pound char didn't require the u?
foosion
print c (c = u'€') prints a euro char. print unichr(ord(c)) does not. Shouldn't it?
foosion
It depends on the encodings (in and out) of your console.
Jonathan Feinberg
I can understand why console differences would explain why pound and euro get different results, but why would print c and print unichr(ord(c)) get different results, especially when c == unichr(ord(c))?
foosion
See my edit. 15 chars. 15 chars.
Jonathan Feinberg
Broken would do it
foosion
A: 

The problem is probably that your font doesn't have the proper glyphs. In addition to getting the encoding right, you have to have the proper font when presenting the text in the IDLE ui. Try using a different font to see if it helps (Arial Unicode has a very large glyph complement, for example).

The euro symbol is much newer than the pounds sterling symbol, so your font may not have a euro glyph.
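One rough way to tell a font problem from an encoding problem is to look at which code point the string actually holds before blaming the glyph. The sketch below uses the standard unicodedata module; describe is a made-up helper name, and 'NO NAME' is just a placeholder for characters that have no Unicode name.

```python
import unicodedata

def describe(ch):
    """Return (code point as hex, Unicode category, name or a placeholder)."""
    return (hex(ord(ch)), unicodedata.category(ch), unicodedata.name(ch, 'NO NAME'))

mangled_euro = describe(u'\x80')   # what ord(e) == 128 corresponds to
real_euro = describe(u'\u20ac')
pound = describe(u'\xa3')
```

If the character turns out to be in category Cc (a control character), no font will draw it as a euro, so the encoding is the thing to fix. If it really is EURO SIGN and still shows as a box, the font is the more likely culprit.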

Ned Batchelder
See my edit to the question. My system is displaying the euro char correctly in the first line, which means I have the proper glyph. The problem is that unichr(ord(e)) does not display the euro, perhaps having something to do with ord(e) being 128?
foosion