Hello, I'm using BeautifulSoup to parse some HTML. Here is the content:

<tr> 
<th>Your provider:</th> 
<td> 

<img src="/isp_logos/la-la-la.ico" alt=""/> 
 <a href="/isp/SomeProvider"> 
 Provider name </a> 
 &nbsp;
 <a href="http://*/isp-comparer/?isp=000000"> 
 </a> 
</td> 
</tr>

I need to get the SomeProvider text from the link's href. My code is:

import re
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3.x module name
contentSoup = BeautifulSoup(ThatHtml)
print contentSoup.findAll('a', href=re.compile('/isp/(.*)'))

The result is an empty array. Why? Is there another way to do this?

A: 

With your posted code and input, I'm getting:

[<a href="/isp/SomeProvider">   Provider name </a>]

That is the list the call returns. Are you using the newest 3.1.x version of BeautifulSoup? I had the same problem, and it turned out I had downloaded the 2.x version of BeautifulSoup, thinking that 2.x meant it was compatible with Python 2.x.

Assuming that the first <a> tag is the one containing SomeProvider, you could just use:

contentSoup.a

to extract that tag.
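
Once you have the tag, you still need to pull SomeProvider out of its href. A minimal sketch of that step, using only the standard re module on the href string; the helper name provider_from_href and the tightened pattern are my own assumptions, not part of BeautifulSoup:

```python
import re

# Assumed pattern: like the one in the question, but anchored at the start
# and stopping at the next '/', '?' or '#'.
ISP_HREF = re.compile(r'^/isp/(?P<name>[^/?#]+)')

def provider_from_href(href):
    """Return the provider part of an href like '/isp/SomeProvider', or None."""
    match = ISP_HREF.match(href)
    return match.group('name') if match else None

# With BeautifulSoup the string would come from the tag, e.g. contentSoup.a['href'].
hrefs = ['/isp/SomeProvider', 'http://*/isp-comparer/?isp=000000']
names = [provider_from_href(h) for h in hrefs]
print(names)
```

The second link does not start with /isp/, so it yields None instead of a false match.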

Kevin