In python, I need to store one element of the source of an html page as a string. How can I do this?

+1 Q:

So far I have managed to write some code that should print the source of the page. The problem is, it doesn't. I tried it with another website and it printed the source fine, so I used wget on the page "http://www.whitepages.com/carrier_lookup?carrier=other&number_0=2165138899&response=1", which should download the page for me. It gave "ERROR 403: Forbidden.", so I'm not really sure how to access the HTML now.
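One common cause of a 403 on a page that opens fine in a browser is that the server rejects requests that don't look like they come from a browser (wget and Python's urllib send their own User-Agent strings). The minimal Python 3 sketch below assumes that is what is happening here, which is not confirmed: it sends a browser-like User-Agent header with `urllib.request` and decodes the response into a single string. `BROWSER_UA`, `build_request` and `fetch_html` are hypothetical names, and the whitepages page may well behave differently today.

```python
import urllib.request

LOOKUP_URL = ("http://www.whitepages.com/carrier_lookup"
              "?carrier=other&number_0=2165138899&response=1")

# Assumption: the 403 is triggered by the default client User-Agent.
# This browser-like string is purely illustrative.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url, user_agent=BROWSER_UA):
    """Wrap the URL in a Request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Download the page and return its source as one str."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        # Use the charset the server declares, falling back to UTF-8
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling `page_source = fetch_html(LOOKUP_URL)` would then leave the whole page in `page_source` as one string. If the server still answers 403, the block is based on something other than the User-Agent, and changing headers will not help.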

The second part of the problem is that once I manage to download the HTML and save it as a string, I need to save the carrier that the search found as a separate string. The carrier name is on the line under the [div class="carrier_result"] line in the page source. (In the previous sentence I replaced the < and > with square brackets because sourceforge would not let me post the HTML.)
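Once the page is in a string, the "line under the div" rule can be applied literally with plain string methods. This sketch assumes the opening tag appears exactly as `<div class="carrier_result">` and that the carrier name is on the next non-blank line; `carrier_from_source` is a hypothetical helper name and the sample markup in the usage line is made up.

```python
def carrier_from_source(html, marker='<div class="carrier_result">'):
    """Return the first non-blank line after the line containing `marker`.

    Returns None if the marker is missing or nothing follows it.
    """
    lines = html.splitlines()
    for i, line in enumerate(lines):
        if marker in line:
            # Assumption: the carrier name sits on the next non-blank line
            for following in lines[i + 1:]:
                if following.strip():
                    return following.strip()
            return None
    return None
```

For example, `carrier_from_source('<div class="carrier_result">\n    Example Carrier\n</div>')` returns `'Example Carrier'`. This is fragile: any change in the site's whitespace or attribute order breaks it, which is why an HTML parser is usually the more robust choice.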

So far the code I have is: http://pastebin.com/u4HUv3Rj

Thanks to anyone who helps me with this.

+2 A:

For an explanation of what a 403 result from HTTP means, and how to deal with it, see here.
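In Python 3, a 403 from `urllib.request.urlopen` surfaces as a `urllib.error.HTTPError` whose `code` attribute is 403, so the script can detect it instead of crashing. A minimal sketch, with `try_fetch` and `describe_http_error` as hypothetical helper names and the User-Agent value as an illustrative assumption:

```python
import urllib.error
import urllib.request


def describe_http_error(err):
    """Turn an HTTPError into a short human-readable message."""
    if err.code == 403:
        # 403: the server understood the request but refuses to authorize it
        return "403 Forbidden: the server refuses to serve this request"
    return f"HTTP {err.code}: {err.reason}"


def try_fetch(url, timeout=10):
    """Return (html, None) on success or (None, message) on an HTTP error."""
    # Illustrative browser-like header; not guaranteed to avoid a 403
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace"), None
    except urllib.error.HTTPError as err:
        return None, describe_http_error(err)
```

Returning a `(html, message)` pair keeps the caller in control: it can retry, log, or give up, rather than relying on an uncaught exception.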

I have no idea what "I need to save as a different string the carrier that the search found" can possibly mean -- I can't even parse it as an English sentence, nor do I know what "the line under the line" means either. Please rephrase (if English isn't your native language, I can try grokking Italian, French, Spanish, German, or Latin -- in decreasing probability and with no guarantee of success, but it can't be worse than with your current phrasing ;-).

Alex Martelli 2010-02-27 03:55:43

Sorry about my ambiguous word choice. I will try to describe what I need to do more clearly. I have a program that needs to find which carrier an entered cellphone number is on. Since I can directly manipulate the URL of the http://www.whitepages.com/carrier_lookup website in Python to look up a specific phone number, I figured there must be some way to read a certain line in the source of the page. Looking through the source of the output page, I discovered that the name of the carrier is on the line following the <div class="carrier_result"> tag.

ErikT 2010-02-27 04:09:17
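Building the lookup URL from the number, rather than editing the string by hand, can be done with `urllib.parse.urlencode`. The parameter names below are copied from the URL in the question; what `carrier` and `response` actually control is an assumption, and `lookup_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE = "http://www.whitepages.com/carrier_lookup"


def lookup_url(number, carrier="other", response="1"):
    """Build the carrier-lookup URL for one phone number."""
    # Parameter names taken from the question's URL; their meaning is assumed
    query = urlencode({"carrier": carrier, "number_0": number, "response": response})
    return f"{BASE}?{query}"
```

`lookup_url("2165138899")` reproduces the URL from the question, and `urlencode` takes care of escaping any characters that are not URL-safe.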

Use BeautifulSoup to locate that specific `div` tag and get its contents, see http://www.crummy.com/software/BeautifulSoup/ .

Alex Martelli 2010-02-27 04:14:41
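With the current bs4 package, the approach suggested above would look roughly like `soup.find("div", class_="carrier_result")` followed by `get_text()` on the tag it returns. If installing a third-party package is not an option, the standard library's `html.parser` module can do the same job; the sketch below swaps that in for BeautifulSoup. `CarrierParser` and `extract_carrier` are hypothetical names, and the sketch assumes the carrier name is the only text inside the div.

```python
from html.parser import HTMLParser


class CarrierParser(HTMLParser):
    """Collect the text inside the first <div class="carrier_result">."""

    def __init__(self):
        super().__init__()
        self._depth = 0       # > 0 while inside the target div (tracks nested divs)
        self._chunks = []
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done or tag != "div":
            return
        if self._depth:
            self._depth += 1  # a div nested inside the target div
            return
        classes = (dict(attrs).get("class") or "").split()
        if "carrier_result" in classes:
            self._depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.done = True  # ignore any later matching divs

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)

    @property
    def text(self):
        # Collapse whitespace so the page's line breaks don't leak into the result
        return " ".join("".join(self._chunks).split())


def extract_carrier(html):
    """Return the carrier text, or None if the div is missing or empty."""
    parser = CarrierParser()
    parser.feed(html)
    parser.close()
    return parser.text or None
```

Unlike a line-based search, this does not care how the page is indented or whether the name is wrapped in extra tags such as `<span>`.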

Thank you. That might be exactly what I was looking for. I'll check it out now.

ErikT 2010-02-27 04:27:19
