Today I needed to parse some data out from an xlsx file (Office open XML Spreadsheet). I could have just opened the files in openoffice and exported to csv. However I will need to reimport data from this spreadsheet later, and I wanted to eliminate the manual operation.
I searched on the net for xlsx parser, and all I found was a stackoverflow question asking the same thing: http://stackoverflow.com/questions/173246/parsing-and-generating-microsoft-office-2007-files-docx-xlsx-pptx
So I rolled my own.
It's 134 lines of code for the parsing and accessing off a spreadsheet, and 54 lines of code of unit tests. This of course is only tested on the 1 file I needed it, and aside from how it's used in the unit tests there are is no documentation as off now. It uses zipfile, minidom, re and unittest, so perfectly portable and platform independent.
Since I don't blog, and I don't have any desire to turn this into a python library for OfficeOpen XML, I am stuck wondering where I should post this code. I have solved a problem that I am sure others will get in the future. So I want to post my code under public domain somewhere for anyone to copy and paste into their app and adjust to fix their problem.
The implementation is simple, and here is a quick overview off the features:
workbook = Workbook(filename) # open a file
for sheet in workbook: pass # iterate over the worksheets
workbook["sheetname"] # access a sheet by name, also possible to do by index from 0
sheet["A1"] # Access cell
sheet["A"] # Access column
sheet["1"] # Access row
cell.value # Cell value - only tested with ints and strings.
Thanks for all the replies. I was going to hoste it on activestate, but the page kept crashing when sending me the activation mail. So I can't activate my code to post it.
My second choice was codeproject, and I wrote up a nice article about the file. Sadly that page crashes when I try to submit my post.
So I put it on github for any to see and branch off: http://github.com/staale/python-xlsx/tree/master
I don't want to do all the work for the python project hosting, so that's out.
Accepting the git answer, as that was the only thing that worked for me. And git rocks.
Edit: Gah, lost my entire post at codeproject, and I did such a nice writeup. Screw it, I have spent more time trying to share this than it took coding it. So I am calling it done for my part as off now. Unless I decide to tweak it more later.