I have an HTML page that contains a form, and inside the form is a table whose rows are products.

So far I am looping through the table rows, and in each iteration I grab all of the row's table cells.

for tr in t.findAll('tr'):
    td = tr.findAll('td')

Now I want to grab the image's src URL from the first td.

The HTML looks like:

<tr>
  <td ...>
    <a href ... >
       <img ... src="asdf/asdf.jpg" .. >
    </a>
  </td>

  ...
</tr>

How would I go about doing this? I keep thinking in terms of regex.

I tried:

td[0].a.image.src, but that didn't work: it fails with an error saying there is no attribute 'src'.

+1  A: 

Use

td[0].a.img['src']

I imagine the image in place of img in your question was just a transcription error. The important point is that, in BeautifulSoup, you access a tag's HTML attributes with indexing notation (like the ['src'] in my code snippet above), not dot syntax. Dot syntax instead proceeds down the tree to a child tag, which is what the two dots above are doing, one each just before a and img.
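To make the difference concrete, here is a minimal sketch of the whole loop. It assumes the newer bs4 package (where findAll is spelled find_all) and Python's built-in html.parser. The sample markup and the helper name first_cell_image_srcs are made up for illustration. It uses find() and get() rather than chained dots, so a row whose first cell has no image yields None instead of raising an error.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the table in the question.
html = """
<table>
  <tr>
    <td><a href="/p/1"><img src="asdf/asdf.jpg" alt="one"></a></td>
    <td>Product one</td>
  </tr>
  <tr>
    <td>No image here</td>
    <td>Product two</td>
  </tr>
</table>
"""


def first_cell_image_srcs(markup):
    """Return the img src found in the first td of each row (None if absent)."""
    soup = BeautifulSoup(markup, "html.parser")
    srcs = []
    for tr in soup.find_all("tr"):
        td = tr.find_all("td")
        if not td:
            continue  # e.g. a header row that only has th cells
        # find() searches the cell's descendants, so it works whether or not
        # the img is wrapped in an a tag.
        img = td[0].find("img")
        # Attributes are read like dictionary entries: img["src"] raises
        # KeyError when the attribute is missing, img.get("src") returns None.
        srcs.append(img.get("src") if img is not None else None)
    return srcs


image_srcs = first_cell_image_srcs(html)
print(image_srcs)
```

When the structure is known to be exactly td > a > img, td[0].a.img['src'] is the shorter equivalent for a single row.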

Alex Martelli