
The site exam.com shows information about the weather:

Tokyo: 25°C

I want to use Django 1.1 and lxml to get information from the website. Specifically, I only want to extract the "25".

The HTML structure of exam.com is as follows:

<p id="resultWeather">
    <b>Weather</b>
    Tokyo:
    <b>25</b>°C
</p>

I'm a student doing a small project with my friends. Please explain it in a way that is easy to understand. Thank you very much!

+5  A: 

BeautifulSoup is more suitable for HTML parsing than lxml.

Something like this can be helpful:

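import urllib
from BeautifulSoup import BeautifulSoup

def get_weather():
    # Download the raw HTML of the page
    data = urllib.urlopen('http://exam.com/').read()
    # Parse it into a navigable tree
    soup = BeautifulSoup(data)
    # Find <p id="resultWeather">, take its last <b> tag and return that tag's text
    return soup.find('p', {'id': 'resultWeather'}).findAll('b')[-1].string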

Get the page contents with urllib, parse them with BeautifulSoup, find the P with id=resultWeather, find the last B inside that P, and get its content.

barbuza
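The same steps (parse, find the P, collect its B tags, pick one) can also be sketched with no third-party package, using Python 3's standard html.parser module. This is a swapped-in technique, not BeautifulSoup's API: the class and function names below are hypothetical, and the sketch assumes the target P contains only plain-text B tags.

```python
from html.parser import HTMLParser


class BoldInParagraphParser(HTMLParser):
    """Hypothetical helper: collect the text of every <b> inside <p id=...>.

    Minimal sketch: assumes each <b> in the target <p> holds plain text only.
    """

    def __init__(self, p_id):
        super().__init__()
        self.p_id = p_id
        self.in_target_p = False  # currently inside the wanted <p>?
        self.in_b = False         # currently inside a <b> of that <p>?
        self.bold_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and dict(attrs).get("id") == self.p_id:
            self.in_target_p = True
        elif tag == "b" and self.in_target_p:
            self.in_b = True
            self.bold_texts.append("")  # start a new entry for this <b>

    def handle_endtag(self, tag):
        if tag == "b":
            self.in_b = False
        elif tag == "p":
            self.in_target_p = False

    def handle_data(self, data):
        if self.in_b:
            self.bold_texts[-1] += data  # text that sits inside the current <b>


def get_bold_texts(html, p_id="resultWeather"):
    """Return the stripped text of each <b> inside <p id=p_id>, in document order."""
    parser = BoldInParagraphParser(p_id)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.bold_texts]


SNIPPET = """<p id="resultWeather">
    <b>Weather</b>
    Tokyo:
    <b>25</b>°C
</p>"""

print(get_bold_texts(SNIPPET))  # → ['Weather', '25']
```

To run it against the live page in Python 3, the HTML could be fetched with urllib.request.urlopen('http://exam.com/').read().decode('utf-8'), assuming the page is served as UTF-8.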
Thanks. I already know how to do that. However, there is still an issue: I want to get the 2nd "b" rather than the last one. Which parameter do I use to do this?
Tran Tuan Anh
Well, I don't know exactly how it works, but from the last line it is obvious that you should supply a different list index.
shylent
Try using `.findAll('b')[1]` instead of `.findAll('b')[-1]`.
Dominic Rodger
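To see why index 1 works, here is a small sketch using Python 3's standard xml.etree.ElementTree on the question's snippet. It is a swap-in for BeautifulSoup, chosen only because it needs no install, and it works here only because this particular snippet happens to be well-formed XML. Python list indices start at 0, so [1] is the second B; XPath-style position predicates, including those in lxml's full XPath support (e.g. //p[@id="resultWeather"]/b[2]), start at 1.

```python
import xml.etree.ElementTree as ET

# The snippet from the question; parsable by ElementTree only because it is well-formed XML.
SNIPPET = """<p id="resultWeather">
    <b>Weather</b>
    Tokyo:
    <b>25</b>°C
</p>"""

p = ET.fromstring(SNIPPET)
bold = p.findall("b")                # all <b> children of <p>, in document order

second_py = bold[1].text             # Python lists are 0-based: index 1 is the 2nd <b>
last_py = bold[-1].text              # index -1 is always the last <b>
second_xpath = p.find("b[2]").text   # position predicates are 1-based: b[2] is the 2nd <b>

print(second_py, last_py, second_xpath)  # → 25 25 25
```

With only two B tags in this snippet, [1] and [-1] select the same element; [1] keeps pointing at the second B if more B tags are added later, whereas [-1] always follows the last one.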
You can also check out pyquery. It has jQuery-like syntax and is built on top of lxml, so it's much faster than BeautifulSoup.
Baresi