I have an HTML table and would like to remove a column. What is the easiest way to do this with BeautifulSoup or another Python library?
A:
lxml.html is nicer for manipulating HTML, IMO. Here's some code that will remove the second column of an HTML table.
from lxml import html

text = """
<table>
<tr><th>head 1</th><th>head 2</th><th>head 3</th></tr>
<tr><td>item 1</td><td>item 2</td><td>item 3</td></tr>
</table>
"""
table = html.fragment_fromstring(text)
# remove the middle column: drop the second cell (index 1) of every row
for row in table.iterchildren():
    row.remove(row[1])
# encoding="unicode" makes tostring() return str rather than bytes on Python 3
print(html.tostring(table, pretty_print=True, encoding="unicode"))
Result:
<table>
<tr>
<th>head 1</th>
<th>head 3</th>
</tr>
<tr>
<td>item 1</td>
<td>item 3</td>
</tr>
</table>
Ryan Ginstrom
2010-03-04 02:47:59
Thanks for the response. Unfortunately, the version of lxml I had didn't support fragment_fromstring, and the codespeak servers were down, so I couldn't update. I ended up using BeautifulSoup, because it turned out every cell in that column had a special class on it, so it was easy to delete by class name.
jedberg
2010-03-08 21:28:25
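For anyone landing here later, the class-based removal described in the comment above might look roughly like the sketch below. The comment doesn't give the actual class name or markup, so the class "extra" and the sample table are assumptions made up for illustration. It uses the bs4 package with the built-in "html.parser", so lxml isn't needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class name "extra" is an assumption for
# illustration, not the class from the original poster's table.
html_text = """
<table>
<tr><th>head 1</th><th class="extra">head 2</th><th>head 3</th></tr>
<tr><td>item 1</td><td class="extra">item 2</td><td>item 3</td></tr>
</table>
"""

soup = BeautifulSoup(html_text, "html.parser")

# Remove every header or data cell that carries the marker class.
for cell in soup.find_all(["th", "td"], class_="extra"):
    cell.decompose()

# Collect the text of the cells that are left, to check the result.
remaining = [cell.get_text() for cell in soup.find_all(["th", "td"])]
print(soup.table)
```

This only works when the column's cells share a class (or some other attribute you can select on). Without that, removing by position as in the lxml answer is probably the simpler route.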