(0) You asked "Why is win32com so much slower than xlrd?" This question is a bit like "Have you stopped beating your wife?": it rests on a presupposition that may not be true. win32com was written in C by a brilliant programmer, whereas xlrd was written in pure Python by an average programmer. The real difference is that win32com has to call COM, which involves inter-process communication and was written by you-know-who, whereas xlrd reads the Excel file directly. Moreover, there's a fourth party in the scenario: YOU. Please read on.
(1) You don't show us the source of the find_last_col() function that you call repeatedly in the COM code. In the xlrd code, you are happy to use the same value (ws.ncols) all the time. So in the COM code, you should call find_last_col(ws) ONCE and thereafter use the returned result. Update: see the answer to your separate question on how to get the equivalent of xlrd's Sheet.ncols from COM.
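A minimal sketch of the "call it ONCE" advice. Your real find_last_col() is unknown to us, so the version below is a made-up stand-in that merely counts its own calls, and the list `ws` is a stand-in for a worksheet, not a COM object.

```python
calls = {"n": 0}

def find_last_col(ws):
    # Hypothetical stand-in for the OP's function (its source was not shown);
    # it only counts how often it is called and returns the row length.
    calls["n"] += 1
    return len(ws[0])

ws = [["A", "", "C", "", ""]]    # stand-in worksheet: a single header row

last_col = find_last_col(ws)     # call ONCE, before any loop
pairs = []
for cnum in range(last_col):
    for cend in range(cnum + 1, last_col):   # reuse the saved value
        pairs.append((cnum, cend))

print(calls["n"])
```

However many times the loops run, the counter stays at one call; putting find_last_col(ws) inside either range() would instead repeat the (possibly expensive) COM work on every pass.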
(2) Accessing each cell value TWICE is slowing down both versions of your code. Instead of:
    if ws.cell_value(6, cnum):
        wsHeaders[str(ws.cell_value(6, cnum))] = (cnum, ws.ncols)
try:
    value = ws.cell_value(6, cnum)
    if value:
        wsHeaders[str(value)] = (cnum, ws.ncols)
Note: there are 2 cases of this in each code snippet.
(3) It is not at all apparent what the purpose of your nested loops is, but there does seem to be some redundant computation, involving redundant fetches from COM. If you care to tell us what you are trying to achieve, with examples, we may be able to help you make it run much faster. At the very least, extracting the values from COM once and then processing them in nested loops in Python should be faster. How many columns are there?
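To illustrate "extract once, then loop in Python", here is a rough sketch that counts simulated fetches. FakeSheet and both function names are invented for this example; with real COM, the bulk read would typically be a single read of a multi-cell range's value rather than the row_values() method shown here.

```python
class FakeSheet(object):
    """Hypothetical stand-in for a COM worksheet; counts simulated round trips."""

    def __init__(self, row):
        self._row = list(row)
        self.fetches = 0

    def cell_value(self, cnum):
        self.fetches += 1          # one round trip per cell
        return self._row[cnum]

    def row_values(self):
        self.fetches += 1          # one round trip for the whole row
        return tuple(self._row)


def next_header_per_cell(ws, ncols):
    """For each header column, find the next header column; fetches cell by cell."""
    result = {}
    for cnum in range(ncols):
        if ws.cell_value(cnum):
            for cend in range(cnum + 1, ncols):
                if ws.cell_value(cend):
                    result[cnum] = cend
                    break
    return result


def next_header_bulk(ws, ncols):
    """Same result, but the row is fetched once and the loops run on a tuple."""
    row = ws.row_values()
    result = {}
    for cnum in range(ncols):
        if row[cnum]:
            for cend in range(cnum + 1, ncols):
                if row[cend]:
                    result[cnum] = cend
                    break
    return result


slow = FakeSheet(["A", "", "C", "", "E"])
fast = FakeSheet(["A", "", "C", "", "E"])
slow_result = next_header_per_cell(slow, 5)
fast_result = next_header_bulk(fast, 5)
print("%d fetches vs %d fetch" % (slow.fetches, fast.fetches))
```

Both functions return the same mapping; only the number of simulated round trips differs, and that gap widens as the number of columns grows.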
Update 2: Meanwhile, the little elves took to your code with the proctoscope, and came up with the following script:
tests = [
    "A/B/C/D",
    "A//C//",
    "A//C//E",
    "A///D",
    "///D",
    ]
for test in tests:
    print "\nTest:", test
    row = test.split("/")
    ncols = len(row)
    # modelling the OP's code
    # (using xlrd-style 0-relative column indexes)
    d = {}
    for cnum in xrange(ncols):
        if row[cnum]:
            k = row[cnum]
            v = (cnum, ncols) #### BUG; should be ncols - 1 ("inclusive")
            print "outer", cnum, k, '=>', v
            d[k] = v
            for cend in xrange(cnum + 1, ncols):
                if row[cend]:
                    k = row[cnum]
                    v = (cnum, cend - 1)
                    print "inner", cnum, cend, k, '=>', v
                    d[k] = v
                    break
    print d
    # modelling a slightly better algorithm
    d = {}
    prev = None
    for cnum in xrange(ncols):
        key = row[cnum]
        if key:
            d[key] = [cnum, cnum]
            prev = key
        elif prev:
            d[prev][1] = cnum
    print d
    # if tuples are really needed (can't imagine why)
    for k in d:
        d[k] = tuple(d[k])
    print d
which outputs this:
Test: A/B/C/D
outer 0 A => (0, 4)
inner 0 1 A => (0, 0)
outer 1 B => (1, 4)
inner 1 2 B => (1, 1)
outer 2 C => (2, 4)
inner 2 3 C => (2, 2)
outer 3 D => (3, 4)
{'A': (0, 0), 'C': (2, 2), 'B': (1, 1), 'D': (3, 4)}
{'A': [0, 0], 'C': [2, 2], 'B': [1, 1], 'D': [3, 3]}
{'A': (0, 0), 'C': (2, 2), 'B': (1, 1), 'D': (3, 3)}
Test: A//C//
outer 0 A => (0, 5)
inner 0 2 A => (0, 1)
outer 2 C => (2, 5)
{'A': (0, 1), 'C': (2, 5)}
{'A': [0, 1], 'C': [2, 4]}
{'A': (0, 1), 'C': (2, 4)}
Test: A//C//E
outer 0 A => (0, 5)
inner 0 2 A => (0, 1)
outer 2 C => (2, 5)
inner 2 4 C => (2, 3)
outer 4 E => (4, 5)
{'A': (0, 1), 'C': (2, 3), 'E': (4, 5)}
{'A': [0, 1], 'C': [2, 3], 'E': [4, 4]}
{'A': (0, 1), 'C': (2, 3), 'E': (4, 4)}
Test: A///D
outer 0 A => (0, 4)
inner 0 3 A => (0, 2)
outer 3 D => (3, 4)
{'A': (0, 2), 'D': (3, 4)}
{'A': [0, 2], 'D': [3, 3]}
{'A': (0, 2), 'D': (3, 3)}
Test: ///D
outer 3 D => (3, 4)
{'D': (3, 4)}
{'D': [3, 3]}
{'D': (3, 3)}
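If it helps, the "slightly better algorithm" could be packaged as a function and checked against the five test rows. This is only a sketch: the name header_spans is made up here, and it is written to run on Python 3 as well as Python 2. The expected values are the final dictionaries from the output above.

```python
def header_spans(row):
    """Return {header: (first_col, last_col)}, 0-relative and inclusive."""
    spans = {}
    prev = None
    for cnum, key in enumerate(row):
        if key:
            spans[key] = [cnum, cnum]    # a new header starts here
            prev = key
        elif prev:
            spans[prev][1] = cnum        # an empty cell extends the previous header
    return dict((k, tuple(v)) for k, v in spans.items())


# Expected results copied from the third dictionary printed for each test.
expected = {
    "A/B/C/D": {'A': (0, 0), 'B': (1, 1), 'C': (2, 2), 'D': (3, 3)},
    "A//C//": {'A': (0, 1), 'C': (2, 4)},
    "A//C//E": {'A': (0, 1), 'C': (2, 3), 'E': (4, 4)},
    "A///D": {'A': (0, 2), 'D': (3, 3)},
    "///D": {'D': (3, 3)},
}
for test, want in expected.items():
    assert header_spans(test.split("/")) == want
```

It makes one pass over the row with no inner loop, so the work grows linearly with the number of columns.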