I'm using Beautiful Soup (BS) to scrape data. The BS documentation states that BS should always return Unicode, but I can't seem to get Unicode out of it. Here's a code snippet:
import urllib2
from libs.BeautifulSoup import BeautifulSoup
# Fetch and parse the data
url = 'http://wiki.gnhlug.org/twiki2/bin/view/Www/PastEvents2007?skin=print.pattern'
data = urllib2.urlopen(url).read()
print 'Type of fetched HTML : %s' % type(data)
soup = BeautifulSoup(data)
print 'Encoding of souped up HTML : %s' % soup.originalEncoding
table = soup.table
print type(table.renderContents())
The original data returned from the page is a byte string (str). BS reports the original encoding as ISO-8859-1. I thought that BS automatically converted everything to Unicode, so why is it that when I do this:
table = soup.table
print type(table.renderContents())
...it gives me a str object and not a unicode object?
How can I get Unicode objects from BS?
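To make the distinction concrete, here is a small sketch of what I suspect is going on: the text is held as Unicode internally, but rendering it encodes it back into a byte string. This is only my assumption about the behaviour, not BS's actual code. It uses just the standard library, and the names raw, text, rendered and recovered are made up for illustration.

```python
# Stdlib-only sketch; runs on Python 2.6+ and Python 3. No Beautiful Soup involved.
# All names (raw, text, rendered, recovered) are hypothetical.

raw = b'caf\xe9'                 # bytes as fetched, encoded as ISO-8859-1
text = raw.decode('iso-8859-1')  # decoded text: unicode on Python 2, str on Python 3
rendered = text.encode('utf-8')  # encoded again for output: a byte string once more

# Show the type at each stage: byte string, Unicode text, byte string.
print(type(raw))
print(type(text))
print(type(rendered))

# Decoding the rendered bytes with the output encoding recovers the Unicode text.
recovered = rendered.decode('utf-8')
print(recovered == text)
```

If that is what happens, then the str I get back would just be an encoded copy of the Unicode data, and I would need either to decode it myself or to find a way to ask BS for the unencoded version.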
I'm really, really lost with this. Any help? Thanks in advance.