tags:

views:

4717

answers:

6

My issue is below but would be interested comments from anyone with experience with xlrd.

I just found xlrd and it looks like the perfect solution but I'm having a little problem getting started. I am attempting to extract data programatically from an Excel file I pulled from Dow Jones with current components of the Dow Jones Industrial Average (link: http://www.djindexes.com/mdsidx/?event=showAverages)

When I open the file unmodified I get a nasty BIFF error (binary format not recognized)

However you can see in this screenshot that Excel 2008 for Mac thinks it is in 'Excel 1997-2004' format (screenshot: http://skitch.com/alok/ssa3/componentreport-dji.xls-properties)

If I instead open it in Excel manually and save as 'Excel 1997-2004' format explicitly, then open in python usig xlrd, everything is wonderful. Remember, Office thinks the file is already in 'Excel 1997-2004' format. All files are .xls

Here is a pastebin of an ipython session replicating the issue: http://pastie.textmate.org/private/jbawdtrvlrruh88mzueqdq

Any thoughts on: How to trick xlrd into recognizing the file so I can extract data? How to use python to automate the explicit 'save as' format to one that xlrd will accept? Plan B?

A: 

Well here is some code that I did: (look down the bottom): here

Not sure about the newer formats - if xlrd can't read it, xlrd needs to have a new version released !

Michael Neale
A: 

Do you have to use xlrd? I just downloaded 'UPDATED - Dow Jones Industrial Average Movers - 2008' from that website and had no trouble reading it with pyExcelerator.

import pyExcelerator
book = pyExcelerator.parse_xls('DJIAMovers.xls')
John Fouhy
nope not married to xlrd. going to check out pyExcelerator now ...
There doesn't seem to be much documentation available for pyExcelerator. Could you maybe show me the basics of opening a file and extracting data?
+2  A: 

More info on pyExcelerator: To read a file, do this:

import pyExcelerator
book = pyExcelerator.parse_xls(filename)

where filename is a string that is the filename to read (not a file-like object). This will give you a data structure representing the workbook: a list of pairs, where the first element of the pair is the worksheet name and the second element is the worksheet data.

The worksheet data is a dictionary, where the keys are (row, col) pairs (starting with 0) and the values are the cell contents -- generally int, float, or string. So, for instance, in the simple case of all the data being on the first worksheet:

data = book[0][1]
print 'Cell A1 of worksheet %s is: %s' % (book[0][0], repr(data[(0, 0)]))

If the cell is empty, you'll get a KeyError. If you're dealing with dates, they may (I forget) come through as integers or floats; if this is the case, you'll need to convert. Basically the rule is: datetime.datetime(1899, 12, 31) + datetime.timedelta(days=n) but that might be off by 1 or 2 (because Excel treats 1900 as a leap-year for compatibility with Lotus, and because I can't remember if 1900-1-1 is 0 or 1), so do some trial-and-error to check. Datetimes are stored as floats, I think (days and fractions of a day).

I think there is partial support for forumulas, but I wouldn't guarantee anything.

John Fouhy
pyExcelerator has a number of known bugs - the 'xlrd' fork is compatible and maintained.
nailer
+2  A: 

xlrd support for Office 2007/2008 (OpenXML) format is in alpha test - see the following post in the python-excel newsgroup: http://groups.google.com/group/python-excel/msg/0c5f15ad122bf24b?hl=en

msanders
+15  A: 
John Machin
A: 

I have a problem to read unicode text in the Excel files. Is there version of xlrd for Python 3.1 (I work with unicode text in 31 wthout any problems.

Goki
@Goki: No there's not a version of xlrd for Python 3.1 yet. Don't hijack somebody else's question. I suggest that you join the group at http://groups.google.com/group/python-excel and ask there, with a bit more detail on what "problem to read unicode text in the Excel files" means, with a small sample file.
John Machin