I am using BeautifulSoup in Python to parse some HTML. One of the problems I am dealing with is that I have situations where the colspans are different across header rows. (Header rows are the rows that need to be combined to get the column headings in my jargon) That is one column may span a number of columns above or below it and the words need to be appended or prepended based on the spanning. Below is a routine to do this. I use BeautifulSoup to pull the colspans and to pull the contents of each cell in each row. longHeader is the contents of the header row with the most items, spanLong is a list with the colspans of each item in the row. This works but it is not looking very Pythonic.
Alos-it is not going to work if the diff is <0, I can fix that with the same approach I used to get this to work. But before I do I wonder if anyone can quickly look at this and suggest a more Pythonic approach. I am a long time SAS programmer and so I struggle to break the mold-well I will write code as if I am writing a SAS macro.
longHeader=['','','bananas','','','','','','','','','','trains','','planes','','','','']
shortHeader=['','','bunches','','cars','','trucks','','freight','','cargo','','all other','','']
spanShort=[1,1,3,1,3,1,3,1,3,1,3,1,3,1,3]
spanLong=[1,1,3,1,1,1,1,1,1,1,1,1,3,1,3,1,3,1,3]
combinedHeader=[]
sumSpanLong=0
sumSpanShort=0
spanDiff=0
longHeaderCount=0
for each in range(len(shortHeader)):
sumSpanLong=sumSpanLong+spanLong[longHeaderCount]
sumSpanShort=sumSpanShort+spanShort[each]
spanDiff=sumSpanShort-sumSpanLong
if spanDiff==0:
combinedHeader.append([longHeader[longHeaderCount]+' '+shortHeader[each]])
longHeaderCount=longHeaderCount+1
continue
for i in range(0,spanDiff):
combinedHeader.append([longHeader[longHeaderCount]+' '+shortHeader[each]])
longHeaderCount=longHeaderCount+1
sumSpanLong=sumSpanLong+spanLong[longHeaderCount]
spanDiff=sumSpanShort-sumSpanLong
if spanDiff==0:
combinedHeader.append([longHeader[longHeaderCount]+' '+shortHeader[each]])
longHeaderCount=longHeaderCount+1
break
print combinedHeader