I am trying to extract data from thousands of identically structured Excel 2007/2010 files. I would prefer to do this using scraping techniques. Is it possible to scrape an Excel file? As far as I know, the file is basically some sort of XML format.

So, is it possible to convert an Excel file to XML or some other markup format?

A: 

Excel 2010 files are XML-based by default. What file format are your Excel files currently in (i.e., what extension do they have)? Your question is somewhat ambiguous on this point. If they are already XML, you can use XSLT to extract the data.

Michael Goldshteyn
They are in XLSX, so I am asking how I would convert them from the common worksheet format to the XML markup. A few years ago, I remember clicking a button in Excel that let me see the markup instead of the regular interface.
ooutwire
A: 

An XLSX file is actually a ZIP archive with a different extension. If you unzip it using your favorite zip program, you'll find the worksheet data inside xl/worksheets. Each worksheet is saved as a separate XML document. Note that cell text is usually stored by index in xl/sharedStrings.xml rather than in the worksheet itself. You should be able to use XSLT, as Michael suggested, to extract the data you require.
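
For thousands of files, unzipping by hand is impractical, so here is a minimal Python sketch of the same idea using only the standard library instead of XSLT. It assumes the sheet of interest is stored as xl/worksheets/sheet1.xml (the usual name for the first sheet, but not guaranteed). The function names `read_shared_strings` and `read_sheet_rows` are made up for illustration. It does not handle inline strings, dates, or formulas beyond their cached values.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet and shared-string parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def read_shared_strings(archive):
    """Return the shared-string table as a list (empty if the part is absent)."""
    try:
        data = archive.read("xl/sharedStrings.xml")
    except KeyError:
        return []
    root = ET.fromstring(data)
    # An <si> entry may contain several <t> runs (rich text); join them.
    return [
        "".join(t.text or "" for t in si.iter("{%s}t" % NS["m"]))
        for si in root.findall("m:si", NS)
    ]


def read_sheet_rows(xlsx_path, member="xl/worksheets/sheet1.xml"):
    """Return one dict per row, mapping cell references (e.g. 'A1') to text.

    `member` is an assumption: the path of the worksheet inside the archive.
    """
    with zipfile.ZipFile(xlsx_path) as archive:
        shared = read_shared_strings(archive)
        root = ET.fromstring(archive.read(member))

    rows = []
    for row in root.iterfind("m:sheetData/m:row", NS):
        cells = {}
        for cell in row.findall("m:c", NS):
            value = cell.find("m:v", NS)
            text = value.text if value is not None else None
            # t="s" means the value is an index into the shared-string table.
            if cell.get("t") == "s" and text is not None:
                text = shared[int(text)]
            cells[cell.get("r")] = text
        rows.append(cells)
    return rows
```

Calling `read_sheet_rows` on each path in a loop (for example over the results of `glob.glob("*.xlsx")`) would cover a whole directory of identically structured workbooks. Numeric cells come back as strings and would need converting.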

Colin O'Dell
perfect! that solved my problem exactly
ooutwire