ansaurus

Question

Python, Source-Code Encoding Problem

Answer 1

+2 A:

Perhaps you should be using unicode literals (e.g. u'€') instead.

Ignacio Vazquez-Abrams 2010-01-23 13:38:25

volting 2010-01-23 13:54:17

1) Your file isn't UTF-8. 2) They should *all* be unicode literals. http://farmdev.com/talks/unicode/
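A minimal sketch of why *all* of them should be Unicode, written in Python 3 rather than the thread's Python 2: mixing a byte string with a text string fails. Python 3 refuses with a `TypeError`. Python 2 instead attempts an implicit ASCII decode, which raises `UnicodeDecodeError` as soon as it meets a non-ASCII byte, such as the first byte of the UTF-8 euro sign. The variable names are illustrative.

```python
# Python 3 sketch: "..." literals are Unicode text, b"..." literals are raw bytes.
euro_text = "€"                         # counterpart of a Python 2 u'€' literal
euro_bytes = euro_text.encode("utf-8")  # b'\xe2\x82\xac', like a Python 2 '€' literal

# Mixing bytes and text is refused outright.
try:
    "E" + euro_bytes
    mixing_worked = True
except TypeError:
    mixing_worked = False

# Keeping every element as text works.
all_text = "".join(["E", "3", euro_text])

print(mixing_worked)  # False
print(all_text)       # E3€
```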

Ignacio Vazquez-Abrams 2010-01-23 14:00:55

...informative presentation, thanks, although I'm not sure I'm much wiser. To clarify what you said: when you say they should *all* be unicode literals, do you mean all characters not in the ASCII set? I've done this and it runs, but non-ASCII characters are still printed as Unicode hex escapes, e.g. € is shown as u'\u20ac'.

volting 2010-01-23 14:57:31

Then you should consider showing the code that actually does the work.

Ignacio Vazquez-Abrams 2010-01-23 15:10:46

`print e_munge` is how I'm doing it at the moment, just for debugging purposes, but eventually the characters will be printed to a Tkinter GUI.

volting 2010-01-23 19:41:59

That doesn't print the characters, that prints the `repr()` of the list. Things will not work as you like. Print the actual elements if you want it to work.
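A minimal Python 3 sketch of the difference described here. Printing a list formats each element with `repr()`. Python 3's own `repr()` would show the euro sign directly, so `ascii()` is used to reproduce the escaped form (such as `\u20ac`) that Python 2's `print e_munge` displayed. Joining the elements prints the characters themselves.

```python
e_munge = ["E", "3", "&", "€", "£", "[-", "|=-", "?"]

# str(list) is built from the repr() of each element, brackets and quotes included.
assert str(e_munge) == "[" + ", ".join(repr(ch) for ch in e_munge) + "]"

# ascii() escapes non-ASCII characters the way Python 2's repr() of a unicode object did.
escaped = ascii(e_munge)
# -> ['E', '3', '&', '\u20ac', '\xa3', '[-', '|=-', '?']

# To see the characters, print the elements rather than the list.
readable = " ".join(e_munge)
print(readable)  # E 3 & € £ [- |=- ?
```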

Ignacio Vazquez-Abrams 2010-01-24 00:28:20

OK, got it now, thanks for all your help. I'm new to Python (not to programming, though), and I guess throwing character encoding into the mix doesn't help. Thanks again.

volting 2010-01-25 01:41:55

Answer 2

+2 A:

The line:

    # -*- coding: UTF-8 -*-

declares that the source file is saved in UTF-8. If the file is actually saved in any other encoding, that is an error.

When you declare byte strings in your source code:

    e_munge = [ "E", "3", "&", "€", "£", "[-", "|=-", "?" ]

then a byte string like "€" will contain the encoded bytes exactly as they were saved in the source file (three bytes, in the case of UTF-8).

When you use Unicode strings instead:

    e_munge = [ u"E", u"3", u"&", u"€", u"£", u"[-", u"|=-", u"?" ]

then when Python reads a u-prefixed literal such as u"€" from the source file, it uses the declared encoding to decode the bytes of that literal into a Unicode character.

An illustration:

    # coding: utf-8
    bs = '€'
    us = u'€'
    print repr(bs)
    print repr(us)

OUTPUT:

    '\xe2\x82\xac'
    u'\u20ac'
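To connect the two outputs: the byte string holds the three UTF-8 bytes of the euro sign, and decoding those bytes with the declared encoding is what turns them into u'\u20ac'. The sketch below is Python 3, where `b'...'` is the byte string and plain `'...'` is Unicode. It also shows that decoding with a codec other than the one the file was really saved in can silently produce the wrong characters, which is why the coding declaration must match the file.

```python
# The bytes a UTF-8 editor writes to disk for the euro sign.
bs = b"\xe2\x82\xac"

# Decoding with the declared (correct) encoding yields U+20AC EURO SIGN.
us = bs.decode("utf-8")
assert us == "\u20ac" and len(us) == 1

# Encoding goes back the other way, to the same three bytes.
assert us.encode("utf-8") == bs

# Decoding with the wrong codec does not fail here; it yields three wrong characters.
wrong = bs.decode("latin-1")
print(ascii(wrong))  # '\xe2\x82\xac'
```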

Mark Tolonen 2010-01-23 15:42:00

OK, I'd already deduced that, but how do I get it to print the character € rather than its Unicode escape?

volting 2010-01-23 19:39:26

Answer 3

+1 A:

`print some_list` is in effect `print repr(some_list)`; that's why you see \u20ac instead of a Euro character. For debugging purposes, the "unicode hex" is exactly what you need for an unambiguous display of your data.

You appear to have perfectly OK unicode objects in your list; I suggest that you don't "print" the list to Tkinter.
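The "unicode hex" point is easy to see with two characters that look alike on screen. A small Python 3 sketch, using `ascii()` to get the escaped form that Python 2's `repr()` gave for unicode objects; the Greek capital epsilon is an illustrative choice, not something from the thread.

```python
latin_e = "E"        # U+0045 LATIN CAPITAL LETTER E
greek_e = "\u0395"   # U+0395 GREEK CAPITAL LETTER EPSILON, renders like "E"

# Displayed as text, the two are indistinguishable in most fonts...
print(latin_e, greek_e)

# ...but the escaped representation shows exactly which code point each one is.
print(ascii([latin_e, greek_e]))  # ['E', '\u0395']
```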

John Machin 2010-01-24 00:40:53

Well, I won't be printing all the lists to Tkinter (at least not all at once). The program will be a simple password generator: the user enters a word they would like to use as a password, the program does a pseudo-random munge of the word and outputs the result to a Tkinter text box, so that the user can copy and paste it wherever they need it. Why do you suggest that I don't output to Tkinter?
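A minimal Python 3 sketch of the munge step described above, assuming a hypothetical substitution table in which each letter maps to a list of look-alike replacements. Only the `e` entry comes from the question's `e_munge` list; `MUNGE_TABLE`, `munge`, `rate` and `rng` are illustrative names, not the asker's code. The function returns an ordinary Unicode string, which is the kind of value to hand to a Tkinter text widget (for example via `Text.insert`) rather than sending it through `print`. It defaults to `random.SystemRandom`, which draws on the operating system's randomness source, since this is a password tool.

```python
import random

# Hypothetical substitution table; only the "e" entry comes from the question.
MUNGE_TABLE = {
    "e": ["E", "3", "&", "€", "£", "[-", "|=-", "?"],
}

def munge(word, table=MUNGE_TABLE, rate=0.5, rng=None):
    """Return a pseudo-randomly munged copy of word as a str.

    Each character whose lowercase form has an entry in table is replaced,
    with probability rate, by a random choice from its replacement list.
    """
    if rng is None:
        rng = random.SystemRandom()  # OS-backed randomness by default
    out = []
    for ch in word:
        options = table.get(ch.lower())
        if options and rng.random() < rate:
            out.append(rng.choice(options))
        else:
            out.append(ch)
    return "".join(out)

# Usage (output varies from run to run):
# result = munge("beekeeper")
# text_widget.insert("end", result)  # text_widget: a hypothetical Tkinter Text widget
```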

volting 2010-01-24 01:13:53

You said that "the characters will printed to a Tkinter GUI". I'm merely suggesting that you don't use the Python `print` statement to send the data to Tkinter for display.

John Machin 2010-01-24 10:17:40

OK, fair enough; I guess my previous comment was a little ambiguous. Thanks for your input.

volting 2010-01-24 11:35:49
