I've got a problem using pyExcelerator to read some XLS files.

I wrote some Python scripts that use this library to parse XLS files and populate a database with their contents.

The templates for the files these scripts parse vary, so I sometimes reconfigure the scripts to handle them. With one of the templates I ran into a problem: pyExcelerator just raises an exception:

Traceback (most recent call last):
  File "/home/* * */parsexls.py", line 64, in handle_label
    parser.parse()
  File "/home/* * */parsers.py", line 335, in parse
    self.contents = pyExcelerator.parse_xls(self.file_record.file, self.encoding)
  File "/usr/local/lib/python2.6/dist-packages/pyExcelerator/ImportXLS.py", line 327, in parse_xls
    ole_streams = CompoundDoc.Reader(filename).STREAMS
  File "/usr/local/lib/python2.6/dist-packages/pyExcelerator/CompoundDoc.py", line 67, in __init__
    self.__build_short_sectors_data()
  File "/usr/local/lib/python2.6/dist-packages/pyExcelerator/CompoundDoc.py", line 256, in __build_short_sectors_data
    dentry_start_sid, stream_size) = self.dir_entry_list[0]
IndexError: list index out of range

Some of the problem XLS files contained empty sheets, and removing those sheets helped, but many of the files can't be handled even without empty sheets. There's nothing extraordinary in these files: they contain no formulas or pictures, just strings, numbers and dates.

As far as I can see, pyExcelerator has been abandoned by its author :(

Any suggestions on fixing this issue are much appreciated.

+1  A: 

You might wish to give xlrd a try... it started (I believe) as a fork of pyExcelerator, so incorporating it requires few code changes, but it is actively maintained:

http://pypi.python.org/pypi/xlrd

Project website

General info, release notes and history from the documentation

Jarret Hardie
Thank you, I'll try that out.
DataGreed
+2  A: 

Hi, I'm the author of xlrd. It rea**d**s XLS files and is not a fork of anything. I maintain a package called xlwt which wri**t**es XLS files and is a fork of pyExcelerator. The parse_xls functionality in pyExcelerator was deprecated to the point of removal from xlwt. Use xlrd instead.
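For anyone making the same switch, here is a minimal sketch of reading every row of every sheet with xlrd. The helper names `iter_sheet_rows` and `read_xls_rows` are made up for illustration and are not part of xlrd; the calls it relies on are `xlrd.open_workbook`, `Book.sheets()`, `Sheet.name`, `Sheet.nrows` and `Sheet.row_values`. It assumes xlrd is installed and that flat rows are good enough for the database import; it does not reproduce the structure that pyExcelerator's `parse_xls` returns.

```python
def iter_sheet_rows(book):
    """Yield (sheet_name, row_index, values) for each row of each sheet.

    `book` is expected to behave like an xlrd Book object.
    A sheet with no rows simply contributes nothing.
    """
    for sheet in book.sheets():
        for row_index in range(sheet.nrows):
            yield sheet.name, row_index, sheet.row_values(row_index)


def read_xls_rows(path):
    """Open an .xls file with xlrd and return all of its rows as a list."""
    # xlrd is a third-party package; it is imported here so that
    # iter_sheet_rows stays usable (and testable) without it.
    import xlrd

    book = xlrd.open_workbook(path)
    return list(iter_sheet_rows(book))
```

Calling `read_xls_rows("some_file.xls")` (a placeholder file name) would return a list of tuples, one per row, which the existing script could then map onto database records.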

Given the traceback that you reproduced, it looks like the file may be corrupted. What it is doing there happens well before the sheet data is parsed. What software produces these files? Can you open them with Excel or OpenOffice.org's Calc or Gnumeric? xlrd may give you a more meaningful error message. You may like to send me (insert_punctuation('sjmachin', 'lexicon', 'net')) copies of your failing file(s); please include some with and some without empty sheets. By the way, what are you using to remove empty sheets? What error message do you get from pyExcelerator when processing files with empty sheets?

John Machin
Wow, thank you for your response, John. I've already found that xlrd seems to read these files normally without any problems. The files are generated via Microsoft Office 2003 (XP?), but since xlrd seems to read them, I suppose, there is nothing to worry about now. Thank you. And thank you for the great libraries.
DataGreed
Thanks for clarifying my mistaken assumption about xlrd vs xlwt, and thank you for the project as well... it's a very useful library.
Jarret Hardie
@DataGreed: I would have expected pyExcelerator to fall over at the OLE Compound Document level only if presented with buggy files created by 3rd party software. The error message was indicative of an empty internal directory i.e. a broken file. I don't expect broken files from Excel 2003. Notwithstanding the fact that xlrd seems to read them, I would very much like to have a copy of such a file, so that I can find out the nature of the problem, and ensure that xlrd has a principled fix and is not reading the file OK (seemingly) by accident.
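As a rough first check at the OLE Compound Document level, one can at least confirm that a file starts with the 8-byte compound document signature. This is only a sketch: the function name `looks_like_ole2` is made up for illustration, and passing this check does not show that the internal directory is intact, so it cannot by itself detect the empty-directory condition described above. It only rules out files that are not compound documents at all (for example HTML or CSV saved with an .xls extension).

```python
# Signature found at the start of every OLE2 compound document.
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def looks_like_ole2(path):
    """Return True if the file at `path` begins with the OLE2 signature."""
    with open(path, "rb") as handle:
        return handle.read(len(OLE2_SIGNATURE)) == OLE2_SIGNATURE
```

Running this over the failing files before handing them to a parser would separate "not an XLS container at all" from "a container whose contents the parser cannot handle".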
John Machin
@DataGreed: Hello, hello, I'd really like to investigate this problem. Please contact me by private e-mail.
John Machin