ansaurus

Question

How do you get all the rows from a particular table using BeautifulSoup?

Answer 1

+5 A:

This should be pretty straight forward if you have a chunk of HTML to parse with BeautifulSoup. The general idea is to navigate to your table using the findChildren method, then you can get the text value inside the cell with the string property.

>>> from BeautifulSoup import BeautifulSoup
>>> 
>>> html = """
... <html>
... <body>
...     <table>
...         <th><td>column 1</td><td>column 2</td></th>
...         <tr><td>value 1</td><td>value 2</td></tr>
...     </table>
... </body>
... </html>
... """
>>>
>>> soup = BeautifulSoup(html)
>>> tables = soup.findChildren('table')
>>>
>>> # This will get the first (and only) table. Your page may have more.
>>> my_table = tables[0]
>>>
>>> # You can find children with multiple tags by passing a list of strings
>>> rows = my_table.findChildren(['th', 'tr'])
>>>
>>> for row in rows:
...     cells = row.findChildren('td')
...     for cell in cells:
...         value = cell.string
...         print "The value in this cell is %s" % value
... 
The value in this cell is column 1
The value in this cell is column 2
The value in this cell is value 1
The value in this cell is value 2
>>>

jgeewax 2010-01-06 02:03:25

That was the trick! Code worked and I should be able to modify it as needed. Many thanks. One last question. I can follow the code except for when you search the table for children th and tr. Is that simply searching my table and returning both the table header and table rows? If I only wanted the table rows, I simply could search for tr only? many thanks again!

Btibert3 2010-01-06 02:19:18

Yes, `.findChildren(['th', 'tr'])` is searching for elements with tag type of `th` or `tr`. If you just want to find `tr` elements you would use `.findChildren('tr')` (note not a list, just the string)

jgeewax 2010-01-08 22:15:51

ansaurus

tags:

views:

answers:

How do you get all the rows from a particular table using BeautifulSoup?

related questions