I'm new to Python and I'm using BeautifulSoup to parse a website and extract data from it. I have the following code:
for line in raw_data:  # raw_data is the parsed HTML separated into smaller blocks
    d = {}
    d['name'] = line.find('div', {'class': 'torrentname'}).find('a')
    print(d['name'])

This prints:

<a href="/ubuntu-9-10-desktop-i386-t3144211.html">
<strong class="red">Ubuntu</strong> 9.10 desktop (i386)</a>
Normally I would be able to extract 'Ubuntu 9.10 desktop (i386)' by writing:
d['name'] = line.find('div', {'class':'torrentname'}).find('a').string
but because of the nested <strong> tag, .string returns None. Is there a way to remove the <strong> tags and then use .string, or is there a better way? I tried BeautifulSoup's extract() method, but I couldn't get it to work.
Edit: I just realized that my solution does not work if there are two sets of <strong> tags, because the space between the words is left out. How can I fix this?
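
To make the goal concrete, here is a rough sketch of the result I'm after: collect every text node inside the link, join them, and tidy up the whitespace. It is only an illustration under some assumptions: it uses Python 3 and the standard library's html.parser as a stand-in for BeautifulSoup so that it runs without extra packages, and the names TextCollector and visible_text are made up for this example. In BeautifulSoup itself the same idea would probably be something like ''.join(tag.findAll(text=True)).

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers every text node and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text between tags, including whitespace.
        self.parts.append(data)


def visible_text(html):
    """Return the text of an HTML fragment with all tags stripped."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()  # flush any text still buffered by the parser
    # Join the text nodes as-is so spaces between tags survive,
    # then collapse runs of whitespace (including newlines) into one space.
    return ' '.join(''.join(collector.parts).split())


snippet = ('<a href="/ubuntu-9-10-desktop-i386-t3144211.html">\n'
           '<strong class="red">Ubuntu</strong> 9.10 desktop (i386)</a>')
print(visible_text(snippet))
```

Because the text nodes are joined before the whitespace is normalized, a space that sits between two <strong> elements is kept instead of being dropped.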