ansaurus

Question

How to show non-ASCII characters in Python?

Answer 1

+2 A:

How can I print the s variable so that it shows the character Ã?
Use print:

>>> s = 'Ã'
>>> s
'\xc3'
>>> print s
Ã

jcoon 2009-05-26 14:06:47

Apparently he or she can't know the encoding in advance, so I think it should be converted to Unicode first (see my answer).

Bastien Léonard 2009-05-26 14:15:05

It works, but how can I do it if I get the content of a web page in this way?

def getUrlContent(url):
    """ Gets the html content of an url """
    socket = urllib2.urlopen(url).fp
    html = urllib.unquote(socket.read())
    socket.close()
    return html

jaloplo 2009-05-26 14:15:57

Answer 2

+1 A:

I would use ord() to find out if a character is ASCII/special:

if ord(c) > 127:
    pass  # special (non-ASCII) character: handle it here

On a byte string this tests individual bytes, so with a multibyte encoding such as UTF-8 every byte of a non-ASCII character is flagged separately. In this case, I would convert to Unicode before testing.

If you get special characters from a web page, you need to know its encoding. Then decode the content with that encoding; see the Unicode HOWTO.
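As a rough sketch of "decode first, then test", assuming the bytes are UTF-8 (that is an assumption, not something the question states). The helper name non_ascii_chars is made up for illustration:

```python
def non_ascii_chars(raw_bytes, encoding='utf-8'):
    """Decode raw_bytes (the encoding is an assumption) and
    return the list of characters outside the ASCII range."""
    text = raw_bytes.decode(encoding)
    return [c for c in text if ord(c) > 127]

# UTF-8 bytes for "café Ã": two non-ASCII characters stored in four bytes.
raw = b'caf\xc3\xa9 \xc3\x83'
found = non_ascii_chars(raw)
print(len(found))  # 2
```

Testing the raw bytes instead would report four "special" bytes, which is why the decode step comes first.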

Edit: I'm definitely not sure what this question is about... It may be a good idea to clarify it.

Bastien Léonard 2009-05-26 14:07:17

How can I know the encoding of a web page?

jaloplo 2009-05-26 14:58:49

That's not trivial when the HTML does not explicitly state its encoding. However, there are tools to guess the encoding, e.g. jchardet: http://jchardet.sourceforge.net/. Another brute-force method is to iterate over all encodings provided by the ``iconv`` utility.
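Before guessing, it is usually worth checking whether the page declares its encoding, either in the HTTP Content-Type header or in a meta tag. A minimal sketch: declared_encoding is a hypothetical helper, the regular expression is deliberately simple, and how you obtain the header value is left out. A result of None means you still have to fall back to a guessing tool.

```python
import re

# Matches charset=... in a Content-Type header or an HTML <meta> tag.
CHARSET_RE = re.compile(br'charset\s*=\s*["\']?([A-Za-z0-9._:-]+)',
                        re.IGNORECASE)

def declared_encoding(content_type, body):
    """Return the charset named in the header or near the top of the
    HTML (both given as bytes), or None if neither declares one."""
    for source in (content_type, body[:2048]):
        match = CHARSET_RE.search(source)
        if match:
            return match.group(1).decode('ascii').lower()
    return None

# Sample page (made up for this example) declaring ISO-8859-1.
html = b'<html><head><meta charset="ISO-8859-1"></head><body>\xc3</body></html>'
encoding = declared_encoding(b'text/html', html)
print(encoding)
print(repr(html.decode(encoding)))
```

The header is checked first because it takes precedence over what the document says about itself.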

The MYYN 2009-05-26 15:23:58

Answer 3

+2 A:

Suppose you want to print it as UTF-8. Before Python 3, the best approach is to encode it explicitly:

print u'Ã'.encode('utf-8')

If you get the text from an external source, you have to decode it explicitly with decode('utf-8'), for example:

f = open(my_file)
a = f.next().decode('utf-8') # you have a unicode line in a
print a.encode('utf-8')
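An alternative that avoids the manual decode is to let the file object do it. This is only a sketch, assuming the file really is UTF-8. io.open is available from Python 2.6 on and is the same function as the built-in open in Python 3. The temporary file and its name are there purely to make the example self-contained.

```python
import io
import os
import tempfile

# Write a small UTF-8 file so the example can run on its own.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'\xc3 first line\n')   # u'\xc3' is the character Ã

# Reading with an explicit encoding yields unicode text directly,
# so no separate decode() call is needed.
with io.open(path, encoding='utf-8') as f:
    first_line = next(f)

print(repr(first_line))
```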

odwl 2009-05-26 15:41:53
