ansaurus

Question

How do I print a line following a line containing certain text in a saved file in Python?

Answer 1

+2 A:

You should be using an HTML parser such as BeautifulSoup or lxml instead.

Ignacio Vazquez-Abrams 2010-02-28 05:31:52

Could you explain to me how to do this with either of those?

ErikT 2010-02-28 05:40:22

`soup.find('div', {'class': 'carrier_result'}).text`

Ignacio Vazquez-Abrams 2010-02-28 05:48:33

Thank you for the example

ErikT 2010-02-28 05:52:13

Answer 2

+2 A:

What you really want to be doing is parsing the HTML properly. Use the BeautifulSoup library - it's wonderful at doing so.

Sample code:

import urllib2, BeautifulSoup

opener = urllib2.build_opener()
opener.addheaders[0] = ('User-agent', 'Mozilla/5.1')

response = opener.open('http://www.whitepages.com/carrier_lookup?carrier=other&number_0=1112223333&response=1').read()

bs = BeautifulSoup.BeautifulSoup(response)
print bs.findAll('div', attrs={'class': 'carrier_result'})[0].next.strip()
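For readers who do not have BeautifulSoup installed, roughly the same extraction can be sketched with the standard library's `html.parser` module (Python 3). This is a swapped-in technique, not the BeautifulSoup approach shown above; the class name `CarrierResultParser`, the helper `carrier_results`, and the carrier name in the sample markup are all made up for illustration, and the real page's markup may differ.

```python
from html.parser import HTMLParser


class CarrierResultParser(HTMLParser):
    """Collect the text inside <div class="carrier_result"> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while we are inside a matching div
        self.results = []   # one stripped string per matching div
        self._chunks = []   # text pieces of the div currently being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            # A nested div inside the target: track it so we know
            # which closing tag ends the target div.
            self.depth += 1
        elif "carrier_result" in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._chunks).strip())

    def handle_data(self, data):
        if self.depth:
            self._chunks.append(data)


def carrier_results(html):
    """Return the text of every carrier_result div found in html."""
    parser = CarrierResultParser()
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical markup; the carrier name is invented.
sample = '<div class="carrier_result">\n  Example Wireless\n</div>'
print(carrier_results(sample))
```

Unlike a plain substring search, this does not care whether the text sits on the same line as the opening tag or on the next one.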

MikeyB 2010-02-28 05:32:42

Could you please explain how to do this with BeautifulSoup? I looked at their website and was confused.

ErikT 2010-02-28 05:39:54

Be wary of 'working around' a website's controls - this may draw their ire.

MikeyB 2010-02-28 05:43:30

Thank you, and thanks for the advice too. I'll keep that in mind, though I'll probably have to stick with it this way as I have yet to find another way to find a carrier given the cellphone number.

ErikT 2010-02-28 05:45:06

Answer 3

+2 A:

To get the next line, you can use:

htmlsource = open('carrier.html', 'r')
for line in htmlsource:
    if '<div class="carrier_result">' in line:
        nextline = htmlsource.next()
        print nextline
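The snippet above is Python 2. A minimal Python 3 sketch of the same idea might look like the following; the function name `line_after` and the sample carrier name are assumptions for illustration, and `io.StringIO` stands in for the saved file so the example runs on its own.

```python
import io

MARKER = '<div class="carrier_result">'


def line_after(lines, marker=MARKER):
    """Return the stripped line following the first line that contains
    marker, or None if there is no such line."""
    it = iter(lines)
    for line in it:
        if marker in line:
            # next() with a default avoids StopIteration when the
            # marker happens to be on the last line.
            following = next(it, None)
            return None if following is None else following.strip()
    return None


# Stand-in for the saved file; the carrier name is invented.
sample = io.StringIO(
    '<html>\n<div class="carrier_result">\nExample Wireless\n</div>\n</html>\n'
)
print(line_after(sample))  # prints: Example Wireless
```

With a real file this would be called as `with open('carrier.html') as f: print(line_after(f))`.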

A "better" way is to split on </div> and then pick out the part you want. Sometimes the text you want sits on the same line as the opening tag, in which case using next() will give the wrong result. For example:

data=open("carrier.html").read().split("</div>")
for item in data:
    if '<div class="carrier_result">' in item:
        print item.split('<div class="carrier_result">')[-1].strip()
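To see why splitting on </div> is more robust than reading the next line, here is a small Python 3 sketch of the same approach run against both layouts. The function name `carrier_texts` and the sample strings are assumptions for illustration only.

```python
MARKER = '<div class="carrier_result">'


def carrier_texts(html, marker=MARKER):
    """Return the text that follows marker in each </div>-delimited chunk."""
    results = []
    for chunk in html.split("</div>"):
        if marker in chunk:
            # Everything after the last occurrence of the marker is the payload.
            results.append(chunk.split(marker)[-1].strip())
    return results


# Invented samples: the same content on one line and spread over three.
one_line = '<div class="carrier_result">Example Wireless</div>'
multi_line = '<div class="carrier_result">\n  Example Wireless\n</div>'

print(carrier_texts(one_line))    # prints: ['Example Wireless']
print(carrier_texts(multi_line))  # prints: ['Example Wireless']
```

Note that this still assumes the target div contains no nested div; if it does, the first inner </div> ends the chunk early.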

By the way, if it's possible, try to use Python's own web modules, such as urllib or urllib2, instead of calling an external wget.

ghostdog74 2010-02-28 05:32:51

Thank you. Your answer was the only one that did not use BeautifulSoup, but since so many other answers did, I might try it both ways. I tried using urllib, but it did not work, because the website only allows views from certain browsers (which is why I had to call wget with a specific browser agent). If there is a way to use urllib and fake a browser agent, please tell me, as I would much rather not have to call wget.

ErikT 2010-02-28 05:39:11

Heh, I noticed that too and am about to post a workaround... be careful as this may piss them off.

MikeyB 2010-02-28 05:39:56

If you look at the urllib2 documentation, http://docs.python.org/library/urllib2.html, there are examples near the bottom of the page of adding HTTP headers to your requests. I'm not sure whether it will work for you, but you can give it a try. As for BeautifulSoup and similar libraries, I believe that ideally you should use one, but I also believe that if the problem you are trying to solve is simple enough, there is no need for them. Python's built-ins will do.
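As a rough illustration of adding a header, in Python 3 (where urllib2 became `urllib.request`) a request with a custom User-Agent can be sketched as below. The helper name `build_request`, the user-agent string, and the example.com URL are placeholders, and whether any particular site accepts the request is not something this sketch can promise.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Build a Request object that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("http://example.com/")

# Request stores header names capitalized, hence "User-agent" here.
print(req.get_header("User-agent"))

# The actual fetch is left out so the example runs offline:
# html = urllib.request.urlopen(req).read().decode("utf-8", "replace")
```

The resulting page could then be handed to whichever extraction approach you prefer, without shelling out to wget.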

ghostdog74 2010-02-28 05:42:28

Your answer was good, but MikeyB's is more efficient and makes good use of BeautifulSoup.

ErikT 2010-02-28 05:51:27
