ansaurus

Question

Can I segment a document in BeautifulSoup before converting it to text based on my analysis of the document?

Answer 1

Man, I love this stuff! Assume, in a naive case, that I want to delete every table that has any row with more than 3 columns. My answer is:

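# `soup` is the BeautifulSoup object for the parsed document
for table in soup.findAll('table'):
    rows = []
    for row in table.findAll('tr'):
        # count the <td> cells in this row
        columns = 0
        for column in row.findAll('td'):
            columns += 1
        rows.append(columns)
    # remove the whole table (extract() detaches it) if any row has more than 3 columns
    if rows and max(rows) > 3:
        table.extract()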

You can do any processing you want at any level of that loop; the only requirement is to define the test and apply it to the right element.
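Here is a minimal sketch of the same idea with the test pulled out into a function you pass in, followed by the conversion to text. It assumes BeautifulSoup 4 (`bs4`) with the built-in `html.parser`; `max_columns` and `strip_tables` are hypothetical helper names for illustration, not part of the library. `Tag.extract()` detaches each matching table from the tree and returns it, so the removed segments stay available for inspection while `get_text()` only sees what is left.

```python
from bs4 import BeautifulSoup


def max_columns(table):
    """Hypothetical helper: largest number of <td> cells in any row (0 if no rows)."""
    counts = [len(row.find_all("td")) for row in table.find_all("tr")]
    return max(counts, default=0)


def strip_tables(soup, should_remove):
    """Hypothetical helper: detach every <table> for which should_remove(table) is true.

    Returns the removed tables so they can still be examined separately.
    """
    removed = []
    for table in soup.find_all("table"):
        if should_remove(table):
            # extract() removes the tag from the tree and returns it
            removed.append(table.extract())
    return removed


html = """
<p>Intro text.</p>
<table><tr><td>a</td><td>b</td></tr></table>
<table><tr><td>1</td><td>2</td><td>3</td><td>4</td></tr></table>
<p>Closing text.</p>
"""

soup = BeautifulSoup(html, "html.parser")
# the test from the answer: any row with more than 3 columns
removed = strip_tables(soup, lambda t: max_columns(t) > 3)
# convert only the remaining document to text
text = soup.get_text(" ", strip=True)
print(text)
```

Passing the test as a function keeps the loop unchanged when the analysis changes; for example, `lambda t: t.find("th") is None` would instead drop tables without header cells.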

PyNEwbie 2009-05-16 05:00:33
