views:

516

answers:

1

I am working through the Django RSS reader project here.

The RSS feed will read something like "OKLAHOMA CITY (AP) — James Harden let". The RSS feed's encoding reads encoding="UTF-8" so I believe I am passing utf-8 to markdown in the code snippet below. The em dash is where it chokes.

I get the Django error of "'ascii' codec can't encode character u'\u2014' in position 109: ordinal not in range(128)" which is an UnicodeEncodeError. In the variables being passed I see "OKLAHOMA CITY (AP) \u2014 James Harden". The code line that is not working is:

content = content.encode(parsed_feed.encoding, "xmlcharrefreplace")

I am using markdown 2.0, django 1.1, and python 2.4.

What is the magic sequence of encoding and decoding that I need to do to make this work?


(In response to Prometheus' request. I agree the formatting helps)

So in views I add a smart_unicode line above the parsed_feed encoding line...

content = smart_unicode(content, encoding='utf-8', strings_only=False, errors='strict')
content = content = content.encode(parsed_feed.encoding, "xmlcharrefreplace") 

This pushes the problem to my models.py for me where I have

def save(self, force_insert=False, force_update=False): 
     if self.excerpt: 
         self.excerpt_html = markdown(self.excerpt) 
         # super save after this 

If I change the save method to have...

def save(self, force_insert=False, force_update=False): 
     if self.excerpt: 
         encoded_excerpt_html = (self.excerpt).encode('utf-8') 
         self.excerpt_html = markdown(encoded_excerpt_html)

I get the error "'ascii' codec can't decode byte 0xe2 in position 141: ordinal not in range(128)" because now it reads "\xe2\x80\x94" where the em dash was

+1  A: 

Read Django's unicode documentation.

from django.utils.encoding import smart_unicode, smart_str

prometheus
Using...content = smart_unicode(content, encoding='utf-8', strings_only=False, errors='strict') content = content = content.encode(parsed_feed.encoding, "xmlcharrefreplace")pushes the problem to my models.py for me where I have def save(self, force_insert=False, force_update=False): if self.excerpt: self.excerpt_html = markdown(self.excerpt) # super save after thisIf I change the save method to have encoded_excerpt_html = (self.excerpt).encode('utf-8') self.excerpt_html = markdown(encoded_excerpt_html)
Part 2: I get the error "'ascii' codec can't decode byte 0xe2 in position 141: ordinal not in range(128)" because now it reads "\xe2\x80\x94" where the em dash was.
Could you please amend your original post with the above? It's very difficult to read without proper formatting.
prometheus