Hi, I am trying to develop a script to pull data from a large number of HTML tables. One problem is that the number of rows holding the information for the column headings varies from table to table. I have discovered that in the last of the header rows, every cell with a value has a border-bottom style, so I decided to find the cells with border-bottom. As you can see, I initialized a list, and I intended to find the parent of each cell that ends up in borderCells. However, when I run this code, only one cell (the first cell in allCells with border-bottom) is added to borderCells. For your information, allCells has 193 cells and 9 of them have border-bottom, so I was expecting nine members in borderCells. Any help is appreciated.

import re

borderCells = []
for each in allCells:
    # look for a tag with a border-bottom style inside this cell
    if each.find(attrs={"style": re.compile("border-bottom")}):
        borderCells.append(each)
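A minimal reproduction of the symptom, assuming bs4 (BeautifulSoup 4) is installed and using a hypothetical sample modeled on the HTML posted later in this thread. It suggests why only one cell gets through: `find()` called on a cell searches that cell's descendants, never the cell itself. The `html` string and the `allCells` stand-in are illustrative assumptions, not the asker's real data.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample: the style is on an inner DIV in the first cell
# and on the TD itself in the second cell.
html = """
<table><tr>
<td nowrap align="left" valign="bottom"><div style="border-bottom: 1px solid #000000; width: 1%; padding-bottom: 1px"><b>Name</b></div></td>
<td colspan="2" nowrap align="center" valign="bottom" style="border-bottom: 1px solid #000000"><b>Location</b></td>
<td>42</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
allCells = soup.find_all("td")  # stand-in for the asker's allCells

borderCells = []
for each in allCells:
    # Tag.find() looks only at descendants of `each`, so a TD whose own
    # style attribute has border-bottom is never matched here.
    if each.find(attrs={"style": re.compile("border-bottom")}):
        borderCells.append(each)

print([c.get_text(strip=True) for c in borderCells])  # ['Name']
```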
A:

Is there any reason

borderCells = soup.findAll("td", style=re.compile("border-bottom"))

wouldn't work? It's kind of hard to figure out exactly what you're asking for, since your description of the original tables is pretty ambiguous, and it's not really clear what allCells is supposed to be either.
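A quick check of the suggested one-liner, again assuming bs4 and a small hypothetical sample modeled on the HTML the asker posts further down. A keyword filter such as `style=...` tests each TD's own attribute, so this form picks up TDs that carry the style directly and misses a cell whose border is on an inner DIV. `find_all` is the bs4 spelling; the older `findAll` name is still accepted as an alias.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample: DIV-level style in the first cell, TD-level style in the second.
html = """
<table><tr>
<td nowrap align="left" valign="bottom"><div style="border-bottom: 1px solid #000000"><b>Name</b></div></td>
<td colspan="2" nowrap align="center" valign="bottom" style="border-bottom: 1px solid #000000"><b>Location</b></td>
<td>42</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# style=... is matched against each TD's own attribute; TDs without a
# style attribute never match the regex.
borderCells = soup.find_all("td", style=re.compile("border-bottom"))
print([c.get_text(strip=True) for c in borderCells])  # ['Location']
```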

I would suggest giving a representative sample of the HTML you're working with, along with the "correct" results pulled from that table.

pantsgolem
A: 

Well, you know computers are always right. The answer is that the style attributes are on different tags in the HTML. The code I wrote was modeled on some HTML that looked like this:

<TD nowrap align="left" valign="bottom">
<DIV style="border-bottom: 1px solid #000000; width: 1%; padding-bottom: 1px">
<B>Name</B>
</DIV>
</TD>

The other places in the file where style="border-bottom..." appears look like this:

<TD colspan="2" nowrap align="center" valign="bottom" style="border-bottom: 1px solid 00000">
<B>Location</B>
</TD>

So now I have to modify the question: how do I identify the cells where the attribute is at the TD level rather than the DIV level?
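One way to cover both layouts, sketched under the assumptions that bs4 is available and that the bordered cells all sit in the last header row, as the question describes: test each TD's own `style` first, then fall back to searching its descendants. `BORDER`, `has_bottom_border` and the sample table are hypothetical names for illustration. The last lines take the parent `tr` of the first match as the header boundary, which is what the question originally set out to do.

```python
import re
from bs4 import BeautifulSoup

BORDER = re.compile("border-bottom")

def has_bottom_border(cell):
    """Return True if the cell's own style, or any descendant's style, mentions border-bottom."""
    # TD-level case: <td style="border-bottom: ...">
    if BORDER.search(cell.get("style", "")):
        return True
    # DIV-level case: <td><div style="border-bottom: ...">...</div></td>
    return cell.find(style=BORDER) is not None

# Hypothetical table: one header row (both layouts), then one data row.
html = """
<table>
<tr><td nowrap align="left" valign="bottom"><div style="border-bottom: 1px solid #000000"><b>Name</b></div></td>
<td colspan="2" nowrap align="center" valign="bottom" style="border-bottom: 1px solid #000000"><b>Location</b></td></tr>
<tr><td>Smith</td><td colspan="2">Boston</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
borderCells = [td for td in soup.find_all("td") if has_bottom_border(td)]
print([c.get_text(strip=True) for c in borderCells])  # ['Name', 'Location']

# Treat the row holding the bordered cells as the last header row;
# every row after it is treated as data.
rows = soup.find_all("tr")
header_row = borderCells[0].find_parent("tr")
header_index = next(i for i, row in enumerate(rows) if row is header_row)
data_rows = rows[header_index + 1:]
print([td.get_text(strip=True) for td in data_rows[0].find_all("td")])  # ['Smith', 'Boston']
```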

A: 

Someone removed one of their answers, though I had tested it and it worked for me. Thanks for the help. Both answers worked, I learned a little more about how to post questions, and after staring at the code for a while I might learn more about Python and BeautifulSoup.