I had a problem a week or so ago. Since I think the solution was cool, I am sharing it here while I wait for an answer to the question I posted earlier. I need to know the relative position of each column heading in a table so that I can match the heading with the data in the rows below. I found that some of my tables had the following row as the first row in the table:
<!-- Table Width Row -->
<TR style="font-size: 1pt" valign="bottom">
<TD width="60%"> </TD> <!-- colindex=01 type=maindata -->
<TD width="1%"> </TD> <!-- colindex=02 type=gutter -->
<TD width="1%" align="right"> </TD> <!-- colindex=02 type=lead -->
<TD width="9%" align="right"> </TD> <!-- colindex=02 type=body -->
<TD width="1%" align="left"> </TD> <!-- colindex=02 type=hang1 -->
<TD width="3%"> </TD> <!-- colindex=03 type=gutter -->
<TD width="1%" align="right"> </TD> <!-- colindex=03 type=lead -->
<TD width="4%" align="right"> </TD> <!-- colindex=03 type=body -->
<TD width="1%" align="left"> </TD> <!-- colindex=03 type=hang1 -->
<TD width="3%"> </TD> <!-- colindex=04 type=gutter -->
<TD width="1%" align="right"> </TD> <!-- colindex=04 type=lead -->
<TD width="4%" align="right"> </TD> <!-- colindex=04 type=body -->
<TD width="1%" align="left"> </TD> <!-- colindex=04 type=hang1 -->
<TD width="3%"> </TD> <!-- colindex=05 type=gutter -->
<TD width="1%" align="right"> </TD> <!-- colindex=05 type=lead -->
<TD width="5%" align="right"> </TD> <!-- colindex=05 type=body -->
<TD width="1%" align="left"> </TD> <!-- colindex=05 type=hang1 -->
</TR>
I thought, wow, this will be easy, because the data sits in the columns marked type=body. Counting down the cells (starting from zero), I knew that in the data rows I would need the values in columns [3, 7, 11, 15]. So I set out to get those positions using this code:
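import re
# collist and tdlist are defined earlier in the script (not shown here).
# Find the comments marked type=body, then the row that contains the first one.
indexComment = souptoGetColIndex.findAll(text=re.compile("type=body"))
indexRow = indexComment[0].findParent()
indexCells = indexRow.findAll(text=re.compile("type=body"))
for each in indexCells:
    # Two siblings back from the comment: the whitespace, then the cell itself.
    collist.append(tdlist.index(each.previousSibling.previousSibling))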
What I got back was collist = [0, 3, 7, 7, 15].
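The repeated 7 is consistent with how list.index works: it returns the position of the first element that compares equal to its argument. Here is a minimal sketch of that behaviour, using plain strings as hypothetical stand-ins for the parsed cells (this is not the original tdlist, and it assumes two cells with identical markup compare equal):

```python
# Hypothetical stand-ins for the parsed cells; positions 1 and 3 are identical.
cells = [
    '<TD width="9%" align="right"> </TD>',
    '<TD width="4%" align="right"> </TD>',
    '<TD width="1%" align="left"> </TD>',
    '<TD width="4%" align="right"> </TD>',
]

# Look up the two identical cells by value, the way tdlist.index(...) does.
wanted = [cells[1], cells[3]]
found = [cells.index(cell) for cell in wanted]

print(found)  # [1, 1] -- the second lookup stops at the first equal element
```

Both lookups land on position 1, even though the second cell actually sits at position 3.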
It turns out, I think, that because the cells in the 7th and 11th positions look exactly alike, the same index position was returned for both. Clearly I had to make them look different. So I first used readlines to read in each line of the file and changed the blank space in each cell to a random integer.
import random
newlt = []
for each in toGetColIndex:
    newlt.append(each.replace(r" ", str(random.randint(1, 14567))))
A friend pointed out that I could lower the overhead by using this instead:
for each in toGetColIndex:
    # Use the position of the line in the list as its marker.
    newlt.append(each.replace(r" ", str(toGetColIndex.index(each))))
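A possible variation on the same idea, offered only as a sketch: enumerate hands back each line's number directly, so there is no repeated list.index scan, and two identical lines still receive different markers (random integers, by contrast, can occasionally collide). The list to_get_col_index below is a hypothetical stand-in for the lines returned by readlines, with the non-breaking-space entity written out in full.

```python
# Hypothetical stand-in for the lines returned by readlines().
to_get_col_index = [
    '<TD width="4%" align="right">&nbsp;</TD>\n',
    '<TD width="1%" align="left">&nbsp;</TD>\n',
    '<TD width="4%" align="right">&nbsp;</TD>\n',  # identical to the first line
]

# enumerate supplies the line number, so identical lines get distinct markers.
new_lines = [
    line.replace("&nbsp;", str(line_no))
    for line_no, line in enumerate(to_get_col_index)
]

print(new_lines)
```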
Nonetheless, each of these approaches gets me a list with the column index for each of my headers, which I can then use on the data rows. Note that the replace calls above look as if they are replacing a blank space; I guess the HTML is causing the real search string to disappear. The actual code uses r"&.n.b.s.p;" without the periods.
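For comparison, here is a rough sketch of a different route that avoids looking cells up by value at all: walk the cells of the width row in order and keep the position of every cell whose trailing comment says type=body. It uses only the standard re module rather than BeautifulSoup, the function name body_column_positions is made up for illustration, and it assumes every cell is immediately followed by its comment, as in the sample row above.

```python
import re

# The width row from the post, with the non-breaking spaces restored.
WIDTH_ROW = """\
<TR style="font-size: 1pt" valign="bottom">
<TD width="60%">&nbsp;</TD> <!-- colindex=01 type=maindata -->
<TD width="1%">&nbsp;</TD> <!-- colindex=02 type=gutter -->
<TD width="1%" align="right">&nbsp;</TD> <!-- colindex=02 type=lead -->
<TD width="9%" align="right">&nbsp;</TD> <!-- colindex=02 type=body -->
<TD width="1%" align="left">&nbsp;</TD> <!-- colindex=02 type=hang1 -->
<TD width="3%">&nbsp;</TD> <!-- colindex=03 type=gutter -->
<TD width="1%" align="right">&nbsp;</TD> <!-- colindex=03 type=lead -->
<TD width="4%" align="right">&nbsp;</TD> <!-- colindex=03 type=body -->
<TD width="1%" align="left">&nbsp;</TD> <!-- colindex=03 type=hang1 -->
<TD width="3%">&nbsp;</TD> <!-- colindex=04 type=gutter -->
<TD width="1%" align="right">&nbsp;</TD> <!-- colindex=04 type=lead -->
<TD width="4%" align="right">&nbsp;</TD> <!-- colindex=04 type=body -->
<TD width="1%" align="left">&nbsp;</TD> <!-- colindex=04 type=hang1 -->
<TD width="3%">&nbsp;</TD> <!-- colindex=05 type=gutter -->
<TD width="1%" align="right">&nbsp;</TD> <!-- colindex=05 type=lead -->
<TD width="5%" align="right">&nbsp;</TD> <!-- colindex=05 type=body -->
<TD width="1%" align="left">&nbsp;</TD> <!-- colindex=05 type=hang1 -->
</TR>"""


def body_column_positions(row_html):
    """Return 0-based positions of cells whose trailing comment has type=body."""
    # Each match is one <TD>...</TD> cell plus the comment right after it;
    # the single capture group keeps only the comment text.
    comments = re.findall(
        r"<TD\b[^>]*>.*?</TD>\s*<!--(.*?)-->", row_html, flags=re.S | re.I
    )
    # The position comes from enumerate, so identical cells cannot be confused.
    return [pos for pos, comment in enumerate(comments) if "type=body" in comment]


print(body_column_positions(WIDTH_ROW))  # [3, 7, 11, 15]
```

Because the position is counted while walking the row, the cells never need to be made unique first. The trade-off is that a regular expression like this is brittle and would need adjusting for rows laid out differently.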