Hi, I have written some code that sends queries to Google and returns the query results. Apparently the retrieved contents are Unicode strings, so when I put them in a list and print the list (the whole list at once, not member by member), an annoying extra 'u' appears in front of every member of the list. How can I get rid of it? I tried converting the whole text to ASCII, but that fails because the text contains some non-ASCII characters (different languages). What should I do to get better output? I also hope this extra 'u' doesn't cause any trouble. Thanks

+4  A: 

Instead of:

>>> print your_list
[u'foo', u'bar']

Use:

>>> print '\n'.join(your_list)
foo
bar

You can use ', ' instead of '\n' as the separator if you prefer to keep it all on one line.
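On the worry in the question: the u is not stored in the strings. It is part of how Python 2 shows a unicode object's repr, and printing a whole list uses the repr of each element. The minimal sketch below (`show_joined` is a hypothetical helper name) uses `print(...)` with a single argument so it runs on both Python 2.7 and Python 3:

```python
items = [u'foo', u'bar']  # sample data standing in for the query results

def show_joined(values, sep=u', '):
    # join() uses each element's actual text, so no quotes and no u prefix appear
    return sep.join(values)

print(show_joined(items))         # foo, bar
print(show_joined(items, u'\n'))  # one item per line
print(len(items[0]))              # 3: the string holds no hidden 'u' character
```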

You may also have problems if you are trying to display Unicode characters in the Windows console. If so, you could use, for example, IDLE, which can display Unicode characters. Alternatively, you can convert to ASCII and ignore the characters that don't exist in ASCII:

print '\n'.join(x.encode('ascii', 'ignore') for x in your_list)
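Be aware that `'ignore'` discards data silently, which matters for the mixed-language results described in the question. A small sketch with made-up sample strings (`to_ascii_lossy` is a hypothetical name); the trailing `.decode('ascii')` turns the bytes back into text so the same code runs on Python 2.7 and 3:

```python
def to_ascii_lossy(values):
    # encode('ascii', 'ignore') drops every character outside ASCII without warning;
    # decode('ascii') turns the resulting bytes back into text on Python 2 and 3
    return [x.encode('ascii', 'ignore').decode('ascii') for x in values]

items = [u'caf\xe9', u'na\xefve', u'\u4e2d\u6587']  # sample data, not real query results
print(u'\n'.join(to_ascii_lossy(items)))  # caf, nave, then an empty line
```

The Chinese entry vanishes completely and the accented words come out misspelled.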
Mark Byers
+1  A: 

If you're going to do anything meaningful with your output, you have to decide which output encoding you want. Throwing away all the non-ASCII characters is not even the second-best solution. Choose an appropriate output encoding (e.g. your shell's encoding for shell output, your page's encoding for web output; the best all-rounder is UTF-8) and encode accordingly:

', '.join(x.encode('utf-8') for x in your_list)
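A sketch of that advice, assuming you want a byte string for one specific target. `encode_for_output` is a hypothetical helper; falling back from `sys.stdout.encoding` to UTF-8, and using the `'replace'` error handler instead of `'ignore'`, are my own choices rather than part of the answer above. It runs on Python 2.7 and 3:

```python
import sys

def encode_for_output(values, encoding=None, sep=u', '):
    # Pick the target encoding: the one passed in, else the console's, else UTF-8
    # (sys.stdout.encoding can be None, e.g. when output is piped on Python 2)
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'utf-8'
    # 'replace' turns unencodable characters into '?' instead of silently dropping them
    return sep.join(values).encode(encoding, 'replace')

items = [u'caf\xe9', u'\u4e2d\u6587']               # sample data
utf8_bytes = encode_for_output(items, 'utf-8')      # every character survives
latin1_bytes = encode_for_output(items, 'latin-1')  # the Chinese characters become '?'
```

Decoding `utf8_bytes` with `'utf-8'` gives back the original text unchanged, which is why UTF-8 is the safe default when you don't know the consumer.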

knitti