views:

187

answers:

4

I need to create a script that extracts some data from a complex Excel 2003 file (with multiple sheets and different tables inside a single sheet) and produces different XML files that need to be validated against a given XSD file.

My preferred language is Python; to create and validate XML files i would go with lxml.
What do you suggest for parsing XLS files?
Is xlrd the right tool to use for complex Excel files?
Or do i need to convert all the sheets in CSV manually, and read files line by line, splitting and getting data?

I accept C#, VB6, VBA suggestions too.

+2  A: 

Xlrd is OK. We use it extensively to import XLS files full of references and formulas with multiple sheets and data presented in custom (not Latin-1) encoding.

nailxx
A: 

I cant answer whether xlrd/python is the right tool for the job - as I don't know python well enough.

But there are many ways to access the excel data...in the main you have VBA built directly in to Excel.

Then you have Ado.net See David Hayden's article here which allows you to access the data via any DotNet language...even IronPython

Adrian
Sheets have more than one tables (headers on different rows), does ADO work also in this case?
systempuntoout
+2  A: 

[disclaimer: I'm the author of xlrd]

xlrd is quite suited for this kind of job. Get the latest version from PyPI. Get the flavour from the tutorial found here. XLSX support is in alpha test; e-mail me if you need it. The awkwardness and lossiness of the save-as-CSV approach was one of the things that prompted me to write xlrd.

John Machin
+1  A: 

I am convinced the most simple solution for this task is using Excel VBA together with MSXML parser. Look here for some links how to use the MSXML parser in VBA for reading XML files; you can adopt this easily for writing XML files, I think.

Doc Brown