How can I replace HTML entities in unicode strings with the proper unicode characters?
u'"HAUS Kleider" - Über das Bekleiden und Entkleiden, das VerhŸllen und Veredeln'
to
u'"HAUS-Kleider" - Über das Bekleiden und Entkleiden, das Verhüllen und Veredeln'
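For the plain entity case, something like the following might be enough. This is only a sketch using the standard library, not BeautifulSoup, and the input string with its entities is made up for illustration because the real source markup isn't shown here.

```python
try:
    from html import unescape  # Python 3.4+
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 fallback
    unescape = HTMLParser().unescape

# Hypothetical input: these entities are invented for the example
escaped = u'&quot;HAUS Kleider&quot; &#220;ber das Verh&uuml;llen'

# Named (&uuml;) and numeric (&#220;) entities both become unicode characters
plain = unescape(escaped)
```

This only helps if the entities in the input are correct to begin with.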
Edit:
Actually, the entities themselves are wrong. It seems like BeautifulSoup messed them up.
So the question is: how do I deal with UTF-8 encoded strings and BeautifulSoup?
from BeautifulSoup import BeautifulSoup

# Read the raw bytes and let BeautifulSoup detect the encoding
f = open('path_to_file', 'r')
soup = BeautifulSoup(f.read())
f.close()

rows = soup.findAll('tr')  # assumption: rows was not defined in my original snippet
allArticles = []
for row in rows:
    l = []
    for r in row.findAll('td'):
        l.append(r.string)  # here things seem to go wrong
    allArticles.append(l)
The result is Ü -> Ÿ instead of Ü, but actually I don't want the encoding to be changed at all.
>>> soup.originalEncoding
'utf-8'
But I can't generate a proper unicode string from it.
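One thing that might be worth trying, assuming the file really is UTF-8 as originalEncoding reports: decode the bytes explicitly and hand BeautifulSoup a unicode string, so it has nothing left to guess. A minimal sketch, where read_unicode is a hypothetical helper of my own and not part of BeautifulSoup:

```python
import io


def read_unicode(path, encoding='utf-8'):
    """Return the contents of the file at path as a unicode string."""
    # io.open decodes while reading, on both Python 2 and Python 3
    with io.open(path, 'r', encoding=encoding) as f:
        return f.read()


# Hypothetical usage with the snippet above:
# soup = BeautifulSoup(read_unicode('path_to_file'))
```

If the file is not actually UTF-8, this should raise a UnicodeDecodeError or produce visibly wrong characters, which would at least show whether the detected encoding is the problem.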