I am working through the Django RSS reader project here.
The RSS feed contains text like "OKLAHOMA CITY (AP) — James Harden let". The feed declares encoding="UTF-8", so I believe I am passing UTF-8 to markdown in the code snippet below. The em dash is where it chokes.
I get the Django error "'ascii' codec can't encode character u'\u2014' in position 109: ordinal not in range(128)", which is a UnicodeEncodeError. In the variables being passed I see "OKLAHOMA CITY (AP) \u2014 James Harden". The line that fails is:
content = content.encode(parsed_feed.encoding, "xmlcharrefreplace")
I am using markdown 2.0, django 1.1, and python 2.4.
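To make the error concrete, here is a minimal sketch of the same failure outside Django and markdown. It is written for a current Python rather than 2.4 (the `except ... as` syntax is an assumption about the interpreter), and the sample string is just the fragment quoted above, so the reported position differs from the 109 in my real traceback.

```python
# Unicode text containing the em dash (U+2014) from the feed excerpt.
text = u"OKLAHOMA CITY (AP) \u2014 James Harden"

# Encoding to ASCII with the default "strict" handler fails on the em dash.
try:
    text.encode("ascii")
    encode_error = None
except UnicodeEncodeError as exc:
    encode_error = exc

# "xmlcharrefreplace" swaps unencodable characters for numeric references.
as_ascii_refs = text.encode("ascii", "xmlcharrefreplace")

# UTF-8 can represent the em dash directly, as three bytes.
as_utf8 = text.encode("utf-8")
```

With the "xmlcharrefreplace" handler the em dash becomes `&#8212;`, and with UTF-8 it becomes the bytes `\xe2\x80\x94`, so neither of those explicit encodes should raise on its own.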
What is the magic sequence of encoding and decoding that I need to do to make this work?
(In response to Prometheus' request. I agree the formatting helps)
So in views I add a smart_unicode line above the parsed_feed encoding line...
content = smart_unicode(content, encoding='utf-8', strings_only=False, errors='strict')
content = content.encode(parsed_feed.encoding, "xmlcharrefreplace")
This moves the problem to my models.py, where I have
def save(self, force_insert=False, force_update=False):
    if self.excerpt:
        self.excerpt_html = markdown(self.excerpt)
    # super save after this
If I change the save method to have...
def save(self, force_insert=False, force_update=False):
    if self.excerpt:
        encoded_excerpt_html = (self.excerpt).encode('utf-8')
        self.excerpt_html = markdown(encoded_excerpt_html)
I get the error "'ascii' codec can't decode byte 0xe2 in position 141: ordinal not in range(128)", because the string now contains "\xe2\x80\x94" where the em dash was.
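The second error can be reproduced in isolation too. This is only a sketch under the same assumptions as before (a current Python, no Django or markdown involved): it shows that the UTF-8 bytes of the em dash cannot be decoded as ASCII, which is what I suspect is happening implicitly somewhere once a byte string is passed along instead of unicode.

```python
# The em dash encoded as UTF-8 is the three bytes \xe2\x80\x94.
utf8_bytes = u"\u2014".encode("utf-8")

# Decoding those bytes as ASCII fails on the first byte, 0xe2.
try:
    utf8_bytes.decode("ascii")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Decoding with the codec that produced the bytes restores the character.
round_tripped = utf8_bytes.decode("utf-8")
```

The failing byte here is 0xe2, the same one named in my traceback, while decoding as UTF-8 gives back the original character.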