ansaurus

Question

Why is win32com so much slower than xlrd?

Answer 1

A:

COM requires talking to another process which actually handles the requests. xlrd works in-process on the data structures themselves.

Ignacio Vazquez-Abrams 2010-06-03 19:48:17

so is it impossible to do this in a reasonable amount of time, with win32com?

Josh 2010-06-03 22:26:48

Answer 2

+8 A:

(0) You asked "Why is win32com so much slower than xlrd?" ... this question is a bit like "Have you stopped beating your wife?" --- it is based on a presupposition that may not be true; win32com was written in C by a brilliant programmer, but xlrd was written in pure Python by an average programmer. The real difference is that win32com has to call COM which involves inter-process communication and was written by you-know-who, whereas xlrd is reading the Excel file directly. Moreover, there's a fourth party in the scenario: YOU. Please read on.

(1) You don't show us the source of the find_last_col() function that you use repetitively in the COM code. In the xlrd code, you are happy to use the same value (ws.ncols) all the time. So in the COM code, you should call find_last_col(ws) ONCE and thereafter used the returned result. Update See answer to your separate question on how to get the equivalent of xlrd's Sheet.ncols from COM.

(2) Accessing each cell value TWICE is slowing down both codes. Instead of

if ws.cell_value(6, cnum):
    wsHeaders[str(ws.cell_value(6, cnum))] = (cnum, ws.ncols)

try

value = ws.cell_value(6, cnum)
if value:
    wsHeaders[str(value)] = (cnum, ws.ncols)

Note: there are 2 cases of this in each code snippet.

(3) It is not at all apparent what the purpose of your nested loops are, but there does seem to be some redundant computation, involving redundant fetches from COM. If you care to tell us what you are trying to achieve, with examples, we could be able to help you make it run much faster. At the very least, extracting the values from COM once then processing them in nested loops in Python should be faster. How many columns are there?

Update 2 Meanwhile the little elves took to your code with the proctoscope, and came up with the following script:

tests= [
    "A/B/C/D",
    "A//C//",
    "A//C//E",
    "A///D",
    "///D",
    ]
for test in tests:
    print "\nTest:", test
    row = test.split("/")
    ncols = len(row)
    # modelling the OP's code
    # (using xlrd-style 0-relative column indexes)
    d = {}
    for cnum in xrange(ncols):
        if row[cnum]:
            k = row[cnum]
            v = (cnum, ncols) #### BUG; should be ncols - 1 ("inclusive")
            print "outer", cnum, k, '=>', v
            d[k] = v
            for cend in xrange(cnum + 1, ncols):
                if row[cend]:
                    k = row[cnum]
                    v = (cnum, cend - 1)
                    print "inner", cnum, cend, k, '=>', v
                    d[k] = v
                    break
    print d
    # modelling a slightly better algorithm
    d = {}
    prev = None
    for cnum in xrange(ncols):
        key = row[cnum]
        if key:
            d[key] = [cnum, cnum]
            prev = key
        elif prev:
            d[prev][1] = cnum
    print d
    # if tuples are really needed (can't imagine why)
    for k in d:
        d[k] = tuple(d[k])
    print d

which outputs this:

Test: A/B/C/D
outer 0 A => (0, 4)
inner 0 1 A => (0, 0)
outer 1 B => (1, 4)
inner 1 2 B => (1, 1)
outer 2 C => (2, 4)
inner 2 3 C => (2, 2)
outer 3 D => (3, 4)
{'A': (0, 0), 'C': (2, 2), 'B': (1, 1), 'D': (3, 4)}
{'A': [0, 0], 'C': [2, 2], 'B': [1, 1], 'D': [3, 3]}
{'A': (0, 0), 'C': (2, 2), 'B': (1, 1), 'D': (3, 3)}

Test: A//C//
outer 0 A => (0, 5)
inner 0 2 A => (0, 1)
outer 2 C => (2, 5)
{'A': (0, 1), 'C': (2, 5)}
{'A': [0, 1], 'C': [2, 4]}
{'A': (0, 1), 'C': (2, 4)}

Test: A//C//E
outer 0 A => (0, 5)
inner 0 2 A => (0, 1)
outer 2 C => (2, 5)
inner 2 4 C => (2, 3)
outer 4 E => (4, 5)
{'A': (0, 1), 'C': (2, 3), 'E': (4, 5)}
{'A': [0, 1], 'C': [2, 3], 'E': [4, 4]}
{'A': (0, 1), 'C': (2, 3), 'E': (4, 4)}

Test: A///D
outer 0 A => (0, 4)
inner 0 3 A => (0, 2)
outer 3 D => (3, 4)
{'A': (0, 2), 'D': (3, 4)}
{'A': [0, 2], 'D': [3, 3]}
{'A': (0, 2), 'D': (3, 3)}

Test: ///D
outer 3 D => (3, 4)
{'D': (3, 4)}
{'D': [3, 3]}
{'D': (3, 3)}

John Machin 2010-06-03 23:46:22

+1 I agree - the slowdown is not because of COM but because of the OP's useage of it that makes it seem more slow.

Cam 2010-06-04 01:55:36

There are also ways to read and write multiple values to/from Excel using COM via safearrays, precisely because of the IPC overhead. You can do this transparently via win32com, by specifying ranges instead of individual cells.

Ryan Ginstrom 2010-06-04 03:38:46

Answer 3

A:

Thought about it as I was going to bed last night, and ended up using this. A far superior version to my original:

def makeDict(ws):
"""makes dict with key as header name, 
   value as tuple of column begin and column end (inclusive)"""
wsHeaders = {} # key is header name, value is column begin and end inclusive
last_col = find_last_col(ws)

for cnum in xrange(9, last_col):
    if ws.Cells(7, cnum).Value:
        value = ws.Cells(7, cnum).Value
        cstart = cnum
    if ws.Cells(7, cnum + 1).Value:
        wsHeaders[str(value)] = (cstart, cnum) #cnum is last in range
return wsHeaders

Josh 2010-06-04 14:48:34

(1) You are still accessing the same cell twice (2) Previous code hardcoded 8th column and 6th row; this one uses 9 and 7 respectively; why?

John Machin 2010-06-04 21:51:22

ansaurus

tags:

views:

answers:

Why is win32com so much slower than xlrd?

related questions