In this code:

   soup=BeautifulSoup(program.Description.encode('utf-8'))
   name=soup.find('div',{'class':'head'})
   print name.string.decode('utf-8')

The error occurs when I try to print the value or save it to the database.

It doesn't matter what I do:

print name.string.encode('utf-8')

or just

 print name.string


Traceback (most recent call last):
  File "./manage.py", line 16, in <module>
    execute_manager(settings)
  File "/usr/local/cluster/dynamic/virtualenv/lib/python2.5/site-packages/django/core/management/__init__.py", line 362, in execute_manager
    utility.execute()
  File "/usr/local/cluster/dynamic/virtualenv/lib/python2.5/site-packages/django/core/management/__init__.py", line 303, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/cluster/dynamic/virtualenv/lib/python2.5/site-packages/django/core/management/base.py", line 195, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/usr/local/cluster/dynamic/virtualenv/lib/python2.5/site-packages/django/core/management/base.py", line 222, in execute
    output = self.handle(*args, **options)
  File "/usr/local/cluster/dynamic/website/video/remmedia/management/commands/remmedia.py", line 50, in handle
    self.FirstTimeLoad()
  File "/usr/local/cluster/dynamic/website/video/remmedia/management/commands/remmedia.py", line 115, in FirstTimeLoad
    print name.string.decode('utf-8')
  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-5: ordinal not in range(128)

This is repr(name.string):

u'\u0412\u044b\u043f\u0443\u0441\u043a \u043e\u0442 27 \u0434\u0435\u043a\u0430\u0431\u0440\u044f'

A: 

Edit: name.string comes from BeautifulSoup, so it is presumably already a unicode string.

However, your error message mentions 'ascii':

UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-5:
ordinal not in range(128)

According to the PrintFails Python wiki page, if Python does not know or can not determine what kind of encoding your output device is expecting, it sets sys.stdout.encoding to None and print attempts to encode its arguments with the 'ascii' codec.

I believe this is the cause of your problem. You can confirm this by checking whether print sys.stdout.encoding prints None.
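
The failure can also be reproduced without a terminal by doing explicitly what print does when no encoding is known. This is only a minimal sketch, assuming Python 2.6 or later (it also runs on Python 3); the variable names are illustrative, and the string is copied from the repr in the question.

```python
# -*- coding: utf-8 -*-
import sys

# The value of name.string, copied from the repr in the question.
text = u'\u0412\u044b\u043f\u0443\u0441\u043a \u043e\u0442 27 \u0434\u0435\u043a\u0430\u0431\u0440\u044f'

# None here means Python could not determine the output encoding.
stdout_encoding = getattr(sys.stdout, 'encoding', None)

# Simulate the fallback: encoding Cyrillic text with 'ascii' fails.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

sys.stdout.write('stdout encoding: %r, ASCII-encodable: %r\n'
                 % (stdout_encoding, ascii_ok))
```

If the reported encoding is None and the text is not ASCII-encodable, a plain print of that text is expected to fail in the way shown in the traceback.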

According to the same page, linked above, you can circumvent the problem by explicitly telling Python what encoding to use. You do that by wrapping sys.stdout in an instance of StreamWriter.

For example, you could try adding

import sys
import codecs
import locale
# Wrap stdout so unicode strings are encoded with the locale's preferred encoding.
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)

to your script before the print statement. You may have to change locale.getpreferredencoding() to an explicit encoding (e.g. 'utf-8', 'cp1252', etc.). The right encoding depends on your output device: it should be whatever encoding that device expects. If you are outputting to a terminal, the terminal may have a menu setting that lets the user choose which encoding it should expect.
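
To see what the wrapper does without touching the real sys.stdout, the same call can be tried on an in-memory byte stream. This is a sketch under assumptions: io.BytesIO stands in for the output device, and 'utf-8' is picked as the explicit encoding purely for illustration.

```python
import codecs
import io

# Stand-in for a byte-oriented output device such as a pipe (an assumption
# made so that the written bytes can be inspected afterwards).
raw = io.BytesIO()

# codecs.getwriter returns a StreamWriter class; calling it wraps the stream.
writer = codecs.getwriter('utf-8')(raw)

# Unicode goes in; the wrapper encodes it before writing to the raw stream.
writer.write(u'\u0412\u044b\u043f\u0443\u0441\u043a')

encoded = raw.getvalue()
```

Here encoded holds the UTF-8 bytes of the text, which is what the wrapped sys.stdout would hand to the terminal or pipe.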

Original answer: Try:

 print name.string

or

 print name.string.encode('utf-8')
unutbu
I already tried that; it did not help. Interestingly, it works in Python 2.6.5 but not in 2.5.2.
Pol
A: 

try

text = text.decode("utf-8", "replace")
JiminyCricket
The string is already decoded.
Aaron Gallagher
+4  A: 

I don't know what you are trying to do with name.string.decode('utf-8'). As the BeautifulSoup documentation eloquently points out, "BeautifulSoup gives you Unicode, dammit". So name.string is already decoded - it is in unicode. You can encode it back to utf-8 if you want to, but you can't decode it any further.
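
The traceback in the question is consistent with this: on Python 2, calling decode on a unicode object first encodes it implicitly with the default 'ascii' codec, and that hidden step is what raises UnicodeEncodeError. The sketch below spells the hidden step out explicitly so that it behaves the same on Python 3 (where str has no decode method); the variable names are illustrative.

```python
text = u'\u0412\u044b\u043f\u0443\u0441\u043a'  # already unicode, like name.string

# Roughly what text.decode('utf-8') does on Python 2: an implicit ASCII
# encode comes first, and it fails before any UTF-8 decoding happens.
try:
    text.encode('ascii').decode('utf-8')
    error_name = None
except UnicodeError as exc:
    error_name = type(exc).__name__

# Encoding is the valid direction for a unicode string.
utf8_bytes = text.encode('utf-8')
round_trip = utf8_bytes.decode('utf-8')
```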

Daniel Roseman
A: 

You can try:

print name.string.encode('ascii', 'replace')

The output should be accepted whatever the encoding of sys.stdout is (including None).
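
As a small illustration of what the error handler does (a sketch, not tied to the asker's data), 'replace' substitutes a question mark for each character that ASCII cannot represent, while 'backslashreplace' keeps the information as an escape sequence:

```python
text = u'h\xe9risson'

# 'replace' loses the original character but never raises.
safe = text.encode('ascii', 'replace')

# 'backslashreplace' keeps the code point visible as an escape.
escaped = text.encode('ascii', 'backslashreplace')
```

Both results are plain ASCII byte strings, so they can be printed regardless of sys.stdout.encoding. The trade-off is that the output no longer shows the real characters.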

In fact, the file-like object that you are printing to might not accept UTF-8. Here is an example: if you have the apparently benign program

# -*- coding: utf-8 -*-
print u"hérisson"

then running it in a UTF-8 capable terminal works fine:

lebigot@weinberg /tmp % python2.5 test.py 
hérisson

but printing to a standard output connected to a Unix pipe does not:

lebigot@weinberg /tmp % python2.5 test.py | cat
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    print u"hérisson"
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)

because in this case sys.stdout.encoding is None: Python assumes that the program reading from the pipe expects ASCII, so printing non-ASCII characters fails. A solution like the one above solves the problem.

Note: You can check the encoding of your standard output with:

print sys.stdout.encoding

This can help you debug encoding problems.
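
Building on that check, one possible approach is to encode explicitly and fall back to a chosen encoding when the stream reports None. This is only a sketch: encode_for_stream and FakePipe are hypothetical names introduced here, and the 'utf-8' fallback is an assumption that must match whatever actually reads the output.

```python
def encode_for_stream(text, stream, fallback='utf-8'):
    """Encode text with the stream's encoding, or with fallback if unknown."""
    encoding = getattr(stream, 'encoding', None) or fallback
    # 'replace' avoids an exception if the encoding cannot represent a character.
    return text.encode(encoding, 'replace')


class FakePipe(object):
    """Hypothetical stand-in for a stdout whose encoding is None (e.g. a pipe)."""
    encoding = None


piped = encode_for_stream(u'h\xe9risson', FakePipe())
```

In a real script the second argument would be sys.stdout, and the returned byte string is what gets written out.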

EOL