Hey, can someone help with the following?

I'm trying to scrape a site that has the following information. I need to pull just the number after the </strong> tag.

[<li><strong>ISBN-13:</strong> 9780375853401</li>, <li><strong>Pub. Date: </strong> 05/11/2010</li>]
[<li><strong>UPC:</strong> 490355000372</li>, <li><strong>Catalog No:</strong> 15024/25</li>, <li><strong>Label:</strong> CAMERATA</li>]

Here's a piece of the code I've been using to grab the above data with mechanize and BeautifulSoup. I'm stuck here, as it won't let me use the find() function on a list.

br_results = mechanize.urlopen(br_results)
html = br_results.read()
soup = BeautifulSoup(html)
local_links = soup.findAll("a", {"class" : "down-arrow csa"})
upc_code = soup.findAll("ul", {"class" : "bc-meta3"})
for upc in upc_code:
    upc_text = upc.contents.contents
    print upc_text
A:

I imagine upc_code is the list you're showing us, and local_links has nothing to do with your question, right? You don't mention it again in your code.

I'm not certain what upc_text would be in your loop's body. upc is a ul Tag, so upc.contents is going to be a list of li tags (presumably), and a plain list has no contents attribute, so I don't see how upc.contents.contents can work. What are you seeing as a result of that code? I would have expected an exception!
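To make the point about upc.contents concrete, here is a minimal sketch. It assumes the modern bs4 package on Python 3 rather than the older BeautifulSoup import used in the question, and the HTML string is a hypothetical reconstruction based on the output shown above, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, reconstructed from the printed output in the question.
html = (
    '<ul class="bc-meta3">'
    '<li><strong>ISBN-13:</strong> 9780375853401</li>'
    '<li><strong>Pub. Date: </strong> 05/11/2010</li>'
    '</ul>'
)
soup = BeautifulSoup(html, "html.parser")
ul = soup.find("ul", {"class": "bc-meta3"})

# .contents on a Tag is an ordinary Python list of its children.
children = ul.contents
print(type(children).__name__, len(children))

# A list has no .contents attribute, so chaining it fails.
try:
    children.contents
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
    print("AttributeError:", error_message)
```

The same reasoning explains why find() is unavailable on the result of findAll(): that result is list-like, so you have to loop over it and call find() on each Tag.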

Anyway, the way I'd write the loop would be something like:

for upc in upc_code:
    listitems = upc.findAll('li')
    for anitem in listitems:
        print anitem.contents[1]

since you appear to want the second child of each list item (the first one is the strong tag, the second one the navigable string you want).

If it's not the second child of each list item that you want, please clarify. For example, you could find the strong tag and take its next sibling, if that suits you better. Just change the body of the nested loop to

print anitem.find('strong').nextSibling
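For anyone on a current setup, both approaches can be sketched as follows. This assumes the bs4 package on Python 3, where findAll and nextSibling are spelled find_all and next_sibling; the HTML string is a hypothetical stand-in built from the question's output, and the fields dictionary is only an illustrative extra.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, reconstructed from the printed output in the question.
html = """
<ul class="bc-meta3">
<li><strong>UPC:</strong> 490355000372</li>
<li><strong>Catalog No:</strong> 15024/25</li>
<li><strong>Label:</strong> CAMERATA</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

by_index = []    # values taken as the second child of each <li>
by_sibling = []  # values taken as the node right after <strong>
fields = {}      # label -> value, e.g. "UPC" -> "490355000372"

for ul in soup.find_all("ul", {"class": "bc-meta3"}):
    for li in ul.find_all("li"):
        strong = li.find("strong")
        # The text after </strong> carries a leading space, so strip it.
        by_index.append(li.contents[1].strip())
        value = strong.next_sibling.strip()
        by_sibling.append(value)
        label = strong.get_text().strip().rstrip(":")
        fields[label] = value

print(by_index)
print(fields)
```

The sibling version is a little more robust if the page ever puts something before the strong tag, since it does not depend on the value being at index 1.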
Alex Martelli
You are right, I hadn't changed that when I posted. The upc.contents.contents didn't work. Cheers!
Diego