I run the following to store a value in score.

score = soup.find('div', attrs={'class' : 'summarycount'})

Running 'print score' gives the following.

<div class="summarycount">524</div>

I need to extract the number. I tried the re module, but it failed.

m = re.search("[^\d]+(\d+)", score)
TypeError: expected string or buffer

function search in re.py at line 142
return _compile(pattern, flags).search(string)
  • What is the return type of the find function?
  • How do I get the number from the score variable?
  • Is there an easy way to have BeautifulSoup return the value itself (in this case 524)?
A:

find returns a Tag object, not a string, which is why re.search raises the TypeError. You can use the Tag for further searches, or extract its contents with score.contents:

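from BeautifulSoup import BeautifulSoup

# Sample markup; named html so the built-in str is not shadowed.
html = r'''
    <body>
    <div class="summarycount">524</div>
    <div class="foo">111</div>
    </body>
'''

soup = BeautifulSoup(html)
# find returns the first matching tag, or None if nothing matches.
score = soup.find('div', attrs={'class' : 'summarycount'})

print type(score)      # the class of the returned object
print score.contents   # list of the tag's children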

Prints:

<class 'BeautifulSoup.Tag'>
[u'524']
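To get the number itself, one option is to convert the first element of that list. The regex from the question should also work once it is given a string instead of the Tag. Here is a minimal sketch of both; to keep it runnable without BeautifulSoup, the names contents and markup are stand-ins I made up for score.contents and for the text that 'print score' shows, so treat it as an illustration rather than the only way to do it.

```python
import re

# Stand-ins (assumptions for this sketch) so it runs without BeautifulSoup:
# `contents` mirrors score.contents, `markup` mirrors what 'print score' shows.
contents = [u'524']
markup = '<div class="summarycount">524</div>'

# The tag's only child is its text, so convert that element directly.
number = int(contents[0])

# re.search expects a string; with the real object you would pass str(score).
m = re.search(r"[^\d]+(\d+)", markup)
number_from_regex = int(m.group(1))

print(number)
print(number_from_regex)
```

With the real Tag, the equivalent calls would be int(score.contents[0]) and re.search(pattern, str(score)). The first is simpler when the div holds nothing but the number.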

The full documentation with multiple examples is available here.

Eli Bendersky