A href catching | ansaurus

Q:

A href catching

Hello, I'm using BeautifulSoup to parse some HTML. Here is the content:

<tr> 
<th>Your provider:</th> 
<td> 

<img src="/isp_logos/la-la-la.ico" alt=""/> 
 <a href="/isp/SomeProvider"> 
 Provider name </a> 
 &nbsp;
 <a href="http://*/isp-comparer/?isp=000000"> 
 </a> 
</td> 
</tr>

I need to get the SomeProvider text from the link. My code is:

import re
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3.x import path
contentSoup = BeautifulSoup(ThatHtml)
print contentSoup.findAll('a', href=re.compile('/isp/(.*)'))

The result is an empty list. Why? Is there another way to do this?

A:

With your posted code and input, I get the following list back:

[<a href="/isp/SomeProvider">   Provider name </a>]

Are you using the newest 3.1.x version of BeautifulSoup? I actually had the same problem, but it turned out I had downloaded the 2.x version of BeautifulSoup, thinking that the 2.x meant it was compatible with Python 2.x.

Assuming that the first <a> tag is the one that contains SomeProvider, you could just use:

contentSoup.a

to extract that tag.
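
Since the question also asks for another way, here is a minimal sketch that pulls SomeProvider out of the href using only the Python 3 standard library (html.parser and re) instead of BeautifulSoup. The names IspLinkParser, ISP_HREF and find_providers are made up for illustration, and the regex assumes the provider name is everything after /isp/ in the href. With BeautifulSoup, the same pattern could be applied to tag['href'] for each tag that findAll returns.

```python
import re
from html.parser import HTMLParser

# Sample markup from the question.
HTML = """
<tr>
<th>Your provider:</th>
<td>
<img src="/isp_logos/la-la-la.ico" alt=""/>
 <a href="/isp/SomeProvider">
 Provider name </a>
 &nbsp;
 <a href="http://*/isp-comparer/?isp=000000">
 </a>
</td>
</tr>
"""

# Assumption: the provider name is whatever follows "/isp/" in the href.
ISP_HREF = re.compile(r'^/isp/(.+)$')


class IspLinkParser(HTMLParser):
    """Collects the provider name from every <a href="/isp/..."> tag."""

    def __init__(self):
        super().__init__()
        self.providers = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href') or ''
        match = ISP_HREF.match(href)
        if match:
            # group(1) is the captured part after "/isp/"
            self.providers.append(match.group(1))


def find_providers(html):
    parser = IspLinkParser()
    parser.feed(html)
    parser.close()
    return parser.providers


print(find_providers(HTML))  # ['SomeProvider']
```

The second link is skipped because its href does not start with /isp/.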

Kevin 2010-07-17 09:53:45
