I'm using the Notepad++ editor on Windows with the format set to ASCII. I've read "PEP 263: Source Code Encodings" and amended my code accordingly (I think), but some characters are still printed as hex escapes...

#!/usr/bin/python
# -*- coding: UTF-8 -*-
import os, sys

a_munge = [ "A", "4", "/\\", "\@", "/-\\", "^", "aye", "?" ]
b_munge = [ "B", "8", "13", "I3", "|3" , "P>", "|:", "!3", "(3", "/3", "3","]3" ]
c_munge = [ "C", "<", "(", "{", "(c)" ]
d_munge = [ "D", "|)", "|o", "?", "])", "[)", "I>", "|>", " ?", "T)", "0", "cl" ]
e_munge = [ "E", "3", "&", "€", "£", "[-", "|=-", "?" ]
         .
         .
         .
+2  A: 

Perhaps you should be using unicode literals (e.g. u'€') instead.

Ignacio Vazquez-Abrams
volting
1) Your file isn't UTF-8. 2) They should *all* be unicode literals. http://farmdev.com/talks/unicode/
Ignacio Vazquez-Abrams
...Informative presentation, thanks, although I'm not sure I'm much wiser... To clarify what you said, "They should all be unicode literals": when you say "all", do you mean all characters not included in the ASCII set? I've done this and it runs, but non-ASCII characters are still printed as Unicode hex escapes, e.g. € = u'\u20ac'
volting
Then you should consider showing the code that actually does the work.
Ignacio Vazquez-Abrams
`print e_munge` This is the way I'm doing it at the moment, just for debugging purposes, but eventually the characters will be printed to a Tkinter GUI.
volting
That doesn't print the characters, that prints the `repr()` of the list. Things will not work as you like. Print the actual elements if you want it to work.
Ignacio Vazquez-Abrams
OK, got it now, thanks for all your help. I'm new to Python (not to programming, though) and I guess throwing character encoding into the mix doesn't help... Thanks again.
volting
+2  A: 

The line:

# -*- coding: UTF-8 -*-

declares that the source file is saved in UTF-8. Anything else is an error.

When you declare byte strings in your source code:

e_munge = [ "E", "3", "&", "€", "£", "[-", "|=-", "?" ]

then byte strings like "€" will actually contain the encoded bytes used to save the source file.
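
To make "the encoded bytes used to save the source file" concrete, here is a minimal sketch of what ends up on disk for the Euro sign under two encodings. It assumes that the editor's non-UTF-8 setting on a Western European Windows machine means cp1252; that is a guess about the asker's setup, not something stated in the question. The helper name `decodes_as_utf8` is made up for this illustration.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function  # so print() also works on Python 2

# The Euro sign, written as an escape so this file's own encoding doesn't matter.
euro = u"\u20ac"

utf8_bytes = euro.encode("utf-8")     # bytes a UTF-8 editor would save
cp1252_bytes = euro.encode("cp1252")  # bytes a cp1252 editor would save (assumed setup)

print(repr(utf8_bytes))    # three bytes: E2 82 AC
print(repr(cp1252_bytes))  # one byte: 80


def decodes_as_utf8(data):
    """Return True if the byte string is valid UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as_utf8(utf8_bytes))    # True
print(decodes_as_utf8(cp1252_bytes))  # False: a lone 0x80 is not valid UTF-8
```

If the file is saved as cp1252 but declared as UTF-8, the byte 0x80 is what Python finds where it expects a UTF-8 sequence, which is why the declaration and the actual file encoding have to agree.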

When you use Unicode strings instead:

    e_munge = [ u"E", u"3", u"&", u"€", u"£", u"[-", u"|=-", u"?" ]

then when Python reads u followed by the quoted "€" from the source file, it uses the declared encoding to decode those bytes into a Unicode character.

An illustration:

# coding: utf-8
bs = '€'
us = u'€'
print repr(bs)
print repr(us)

OUTPUT:

'\xe2\x82\xac'
u'\u20ac'
Mark Tolonen
OK, I already deduced that, but how do I get it to print the character € and not the Unicode escape code...
volting
+1  A: 

print some_list is in effect print repr(some_list) -- that's why you see \u20ac instead of a Euro character. For debugging purposes, the "unicode hex" is exactly what you need for unambiguous display of your data.

You appear to have perfectly OK unicode objects in your list; I suggest that you don't "print" the list to Tkinter.
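
A minimal sketch of the difference, assuming a console that can actually display € and £ (a console limited to an encoding without those characters may raise UnicodeEncodeError instead). The list is the asker's `e_munge` with the two non-ASCII characters written as escapes so the example does not depend on how the file is saved; the name `text` is only for illustration.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function  # so print() also works on Python 2

e_munge = [u"E", u"3", u"&", u"\u20ac", u"\xa3", u"[-", u"|=-", u"?"]

# Printing the list shows the repr() of each element.
# On Python 2 the Euro sign appears as u'\u20ac' here.
print(repr(e_munge))

# Printing each element encodes the actual character for the console.
for item in e_munge:
    print(item)

# Joining gives a single unicode string holding the real characters.
text = u" ".join(e_munge)
print(text)
```

A unicode string such as `text`, rather than the printed form of the list, is the kind of value to pass on to whatever displays it.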

John Machin
Well, I won't be printing all the lists to Tkinter (at least not at one time). The program will be a simple password generator which lets a user input a word they would like to use for a password; the program will then do a pseudo-random munge of the word and output the result to a Tkinter text box so that the user can copy and paste it wherever... Why do you suggest that I don't output to Tkinter?
volting
You said that "the characters will be printed to a Tkinter GUI". I'm merely suggesting that you don't use the Python `print` statement to send the data to Tkinter for display.
John Machin
Ok fair enough, I guess my previous comment was a little ambiguous, thanks for your input.
volting