ansaurus

Question

Answer 1

+5 A:

soup2 = BeautifulSoup(str(arr[i]))
arr2 = soup2.findAll('td')

Don't do this: it converts the tag to a string and then parses that string all over again. Just call arr2 = arr[i].findAll('td') instead.
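As a rough illustration of why the re-parse is unnecessary, here is a minimal sketch. It assumes the bs4 package (BeautifulSoup 4), where find_all is the current name for findAll; the sample HTML and the two helper functions are made up for the example and are not from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample table standing in for the page being scraped.
html = """
<table>
  <tr><td>alpha</td><td>beta</td></tr>
  <tr><td>gamma</td><td>delta</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("tr")  # plays the role of arr in the question


def cells_by_reparsing(row):
    # The slow pattern: serialize the tag, then parse that string again.
    return BeautifulSoup(str(row), "html.parser").find_all("td")


def cells_directly(row):
    # The suggested pattern: the row is already a searchable Tag.
    return row.find_all("td")


reparsed = [[td.get_text() for td in cells_by_reparsing(r)] for r in rows]
direct = [[td.get_text() for td in cells_directly(r)] for r in rows]
print(direct)
```

Both approaches find the same cells, but the direct search skips one serialize-and-parse round trip per row.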

This will also be slow:

if str(j).find("<a href=") > 0:
    data.sourceURL = self.getAttributeValue(str(j),'<a href="')

Assuming that getAttributeValue gives you the href attribute, use this instead:

a = j.find('a', href=True)       # first <a> that has an href attribute, or None
if a:
    data.sourceURL = a['href']
else:
    pass  #....

In general, you shouldn't need to convert a BeautifulSoup object back into a string if all you want to do is extract values from it. The find and findAll methods return searchable objects, so you can keep narrowing the search by calling find, findAll, and the related methods on their results.
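A minimal sketch of that chained style, again assuming bs4 and its find_all spelling. The HTML, the example.com URL, and the source_urls list are hypothetical stand-ins for the question's data.sourceURL handling.

```python
from bs4 import BeautifulSoup

# Hypothetical input: one cell with a real link, one without any link,
# and one whose <a> has no href attribute.
html = """
<table>
  <tr><td><a href="http://example.com/a">first</a></td><td>no link</td></tr>
  <tr><td><a name="anchor-only">second</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

source_urls = []
for row in soup.find_all("tr"):          # search the document
    for cell in row.find_all("td"):      # keep searching inside each row
        # href=True matches only <a> tags that carry an href attribute.
        link = cell.find("a", href=True)
        source_urls.append(link["href"] if link is not None else None)

print(source_urls)
```

No step converts a tag to a string: each result of find or find_all is searched directly, and cells without a usable link simply yield None.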

interjay 2010-04-26 10:06:03

Optimizing BeautifulSoup (Python) code
