
Hi

I have this code that fetches some text from a page using BeautifulSoup:

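    from bs4 import BeautifulSoup  # Beautiful Soup 4; older code used `from BeautifulSoup import BeautifulSoup`

    soup = BeautifulSoup(html, 'html.parser')
    body = soup.find('div', {'id': 'body'})
    print(body)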

I would like to turn this into a reusable function that takes some HTML text and the tag and attributes to match, like the following:

    def parse(html, atrs):
        soup = BeautifulSoup(html)
        body = soup.find(atrs)
        return body

But if I make a call like this:

    parse(htmlpage, ('div', {'id': 'body'}))

or like this:

    parse(htmlpage, ['div', {'id': 'body'}])

I only get a div element back; the {'id': 'body'} attribute filter seems to be ignored.

Is there a way to fix this?

A:

I think you just need to add an asterisk here:

    body = soup.find(*atrs)

Without the asterisk you are passing a single argument, which is a tuple:

    body = soup.find(('div', {'id': 'body'}))

With the asterisk, the tuple is unpacked into separate positional arguments, so the statement becomes equivalent to what you want:

    body = soup.find('div', {'id': 'body'})
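To see the difference without BeautifulSoup installed, here is a minimal sketch using a hypothetical stand-in, `fake_find(name=None, attrs=None)`, whose first two parameters are assumed to mirror how `soup.find` is called above. It simply returns what it received, so you can compare passing the tuple as one argument with unpacking it.

```python
def fake_find(name=None, attrs=None):
    """Hypothetical stand-in for soup.find: returns the arguments it received."""
    return name, attrs

atrs = ('div', {'id': 'body'})

# Without *: the whole tuple is bound to `name`, and `attrs` stays None.
print(fake_find(atrs))    # (('div', {'id': 'body'}), None)

# With *: the tuple is unpacked into two positional arguments.
print(fake_find(*atrs))   # ('div', {'id': 'body'})
```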

See this article for more information on using the *args notation, and the related **kwargs.
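A common way to combine the two is to forward whatever positional and keyword arguments the caller passes straight through to the underlying call. The sketch below is only an illustration: it uses a hypothetical `fake_find(name=None, attrs=None, recursive=True)` in place of `soup.find` so it runs without BeautifulSoup, and its `parse` ignores the HTML. In real code you would parse `html` and forward to `soup.find(*args, **kwargs)` instead.

```python
def fake_find(name=None, attrs=None, recursive=True):
    """Hypothetical stand-in for soup.find that echoes its arguments."""
    return {'name': name, 'attrs': attrs, 'recursive': recursive}

def parse(html, *args, **kwargs):
    # In this sketch `html` is not parsed; only the forwarding is shown.
    # *args packs extra positional arguments into a tuple and
    # **kwargs packs keyword arguments into a dict; both are
    # unpacked again when passed on to fake_find.
    return fake_find(*args, **kwargs)

print(parse('<html></html>', 'div', {'id': 'body'}))
print(parse('<html></html>', 'div', attrs={'id': 'body'}, recursive=False))
```

The advantage of this design is that `parse` does not need to know which arguments the underlying search accepts; callers can use positional or keyword style.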

Mark Byers
++, this is a nice alternative.
Eli Bendersky
Thanks for the link, I'm reading it right now. By the way, I had to add asterisks in both places: in the parameter list and in the soup.find call.
scott
A:
    from bs4 import BeautifulSoup

    def parse(html, *atrs):
        soup = BeautifulSoup(html, 'html.parser')
        body = soup.find(*atrs)  # unpack the collected arguments into find()
        return body

And then:

    parse(htmlpage, 'div', {'id': 'body'})
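Here the `*` in the `def` line does the opposite job to the one in the call: it packs any extra positional arguments into a tuple, which `*atrs` in `soup.find(*atrs)` then unpacks again. A small sketch with a hypothetical `collect` function shows what `atrs` holds inside the function:

```python
def collect(html, *atrs):
    # Everything passed after `html` is packed into the tuple `atrs`.
    return atrs

print(collect('<html></html>', 'div', {'id': 'body'}))  # ('div', {'id': 'body'})
print(collect('<html></html>', 'div'))                  # ('div',)
print(collect('<html></html>'))                         # ()
```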
Eli Bendersky
Thanks for your answer, it worked. I didn't know you could unpack lists using *; I thought only dicts could be unpacked like that, using **.
scott
@scott: read the article Mark linked to in his answer
Eli Bendersky
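On scott's point about lists versus dicts: `*` unpacks any iterable (a list, a tuple, even a generator) into positional arguments, while `**` unpacks a mapping into keyword arguments whose keys must match parameter names. A minimal sketch, again using a hypothetical `fake_find(name=None, attrs=None)` stand-in rather than the real `soup.find`:

```python
def fake_find(name=None, attrs=None):
    """Hypothetical stand-in for soup.find that echoes its arguments."""
    return name, attrs

as_list = ['div', {'id': 'body'}]
as_tuple = ('div', {'id': 'body'})
as_dict = {'name': 'div', 'attrs': {'id': 'body'}}

print(fake_find(*as_list))   # ('div', {'id': 'body'})
print(fake_find(*as_tuple))  # ('div', {'id': 'body'})
print(fake_find(**as_dict))  # ('div', {'id': 'body'})
```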