I'm scraping a page with Python's pyquery, and I'm kinda confused by the types it returns, and in particular how to iterate over a list of results.

If my HTML looks a bit like this:

<div class="formwrap">blah blah <h3>Something interesting</h3></div>
<div class="formwrap">more rubbish <h3>Something else interesting</h3></div>

How do I get the inside of the <h3> tags, one by one so I can process them? I'm trying:

results_page = pq(response.read())
formwraps = results_page(".formwrap") 
print type(formwraps)
print type([formwraps])
for my_div in [formwraps]:
    print type(my_div)
    print my_div("h3").text() 

This produces:

<class 'pyquery.pyquery.PyQuery'>
<type 'list'>
<class 'pyquery.pyquery.PyQuery'>
Something interesting something else interesting

It looks like there's no actual iteration going on. How can I pull out each element individually?

Extra question from a newbie: what are the square brackets in [formwraps] doing? It looks like they convert a special pyquery object to a list. Is [] a standard Python operator?
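For reference, the square brackets here are ordinary Python list-literal syntax rather than anything pyquery-specific: [x] builds a new one-element list whose only item is x, which would explain why the loop body runs exactly once. A minimal sketch, using a plain tuple as a stand-in for the pyquery result (all names below are illustrative, not from the original code):

```python
# A tuple stands in for the pyquery result; any iterable behaves the same way.
results = ("first div", "second div")

wrapped = [results]        # list literal: ONE element, which is the whole tuple
unpacked = list(results)   # list() iterates over results and copies each item

# Looping over the wrapped list visits the tuple itself, once.
loop_over_wrapped = [item for item in wrapped]
# Looping over the original object visits each item.
loop_over_results = [item for item in results]

print(len(wrapped), len(unpacked))
```

For pyquery itself, iterating over the result object directly should yield the underlying lxml elements, and the library documents an items() method that yields PyQuery objects instead; it is worth checking the docs for the installed version before relying on either.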

------UPDATE--------

I've found an 'each' function in the pyquery docs. However, I don't understand how to use it for what I want. Say I just want to print out the content of the <h3>. This produces a syntax error: why?

formwraps.each(lambda e: print e("h3").text())
A: 

I've never used pyquery, but the source of the syntax error is that lambdas in Python are limited: the body can only be a single expression, so statements such as print (which is a statement in Python 2) aren't allowed. You can work around this limitation by wrapping the statement in a function, e.g.:

def my_print(x):
    print x

formwraps.each(lambda e: my_print(e("h3").text()))
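To show the expression-only rule in isolation, here is a small sketch written for Python 3, where print is a function and can therefore be called from a lambda. Note that for_each is a hypothetical stand-in for an each-style method, not pyquery's own implementation, and is_valid_lambda is just a helper invented for this example.

```python
# Python 3 sketch. for_each and is_valid_lambda are hypothetical helpers,
# not part of pyquery.
def for_each(items, func):
    """Call func on every item, in order."""
    for item in items:
        func(item)


def is_valid_lambda(source):
    """Return True if source compiles as a single expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


headings = ["Something interesting", "Something else interesting"]

# A statement (here, assert) is rejected inside a lambda body.
statement_ok = is_valid_lambda("lambda h: assert h")
# A function call is an expression, so it is accepted.
call_ok = is_valid_lambda("lambda h: print(h)")

# Any expression works as the body, including a method call.
collected = []
for_each(headings, lambda h: collected.append(h.upper()))
```

In Python 2, the same effect needs a named wrapper function like my_print above, or `from __future__ import print_function` at the top of the module.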
diegogs