views:

1123

answers:

8

Is there a simple way, using some common Unix scripting language (Perl/Python/Ruby) or command line utility, to convert an Excel Spreadsheet file to CSV? Specifically, this one:

http://www.econ.yale.edu/~shiller/data/ie_data.xls

And specifically the third sheet of that spreadsheet (the first two being charts).

+1  A: 

I may have found an acceptable answer already:

xls2csv

But interested to hear what other options there are, or about tools in other languages.

dreeves
+6  A: 

Maybe xlrd will do the Job (in Python)

edit: I should really learn to read questions. But writing csv shouldn't be a huge problem so maybe you can actually use it.

HerdplattenToni
+1: xlrd works wonderfully.
S.Lott
Right, outputting as CSV is no big deal. Thanks!
dreeves
+1  A: 

For python, there are a number of options, see here, here and here. Note that the last option will only work on Windows with Excel installed.

Al
pyXLreader appears not to be have been maintained since 2005, has an astonishing outstanding bug in the sourceforge tracker (omits fractions, all numbers presented as integers) plus a few more problems I found in a few minutes of playing with it (crashes on large files, sometimes prints "FCUKDATE" (no kidding!) when there are dates in the data, time taken is at least 6 times that of xlrd and is possibly O(N**2), memory used is at least 7 times that of xlrd).Do you use it very much yourself?
John Machin
No. I have used xlrd briefly, but I tend to store data as a CSV and work from that. I had a bookmarked review of those three websites, so I thought it might be useful.
Al
A review? I could't find such a thing using google. Please post the link.
John Machin
+11  A: 

There is a really good Perl library for xls reading: Spreadsheet::ParseExcel.

depesz
+1 Used this myself several years ago; it definitely got the job done.
BlairHippo
+1  A: 

Options exist for all three languages. The question is - which one are you most familiar with. This is the language you should use, for sure. And if you're not familiar with either, this application is not really a great example of picking between the languages.

Opinionated P.S: if you don't know any of the languages, just learn Python and use xlrd.

Eli Bendersky
Oh, I didn't mean it as a Language Wars question or anything. Just needed to get that task done and was happy to use whatever language made it easy.
dreeves
+3  A: 

You can use pyexcelerator in python.

This code (included in the examples folder of pyexcelerator as xls2csv.py) extracts all sheets from the spreadsheets and outputs them to stdout as CSV.

You can easily change the code to do what you want.

The cool thing about pyexcelerator is that you can also use it to write/create excel xls files, without having excel installed.

#!/usr/bin/env python
# -*- coding: windows-1251 -*-
# Copyright (C) 2005 Kiseliov Roman

__rev_id__ = """$Id: xls2csv.py,v 1.1 2005/05/19 09:27:42 rvk Exp $"""


from pyExcelerator import *
import sys

me, args = sys.argv[0], sys.argv[1:]


if args:
    for arg in args:
        print >>sys.stderr, 'extracting data from', arg
        for sheet_name, values in parse_xls(arg, 'cp1251'): # parse_xls(arg) -- default encoding
            matrix = [[]]
            print 'Sheet = "%s"' % sheet_name.encode('cp866', 'backslashreplace')
            print '----------------'
            for row_idx, col_idx in sorted(values.keys()):
                v = values[(row_idx, col_idx)]
                if isinstance(v, unicode):
                    v = v.encode('cp866', 'backslashreplace')
                else:
                    v = str(v)
                last_row, last_col = len(matrix), len(matrix[-1])
                while last_row < row_idx:
                    matrix.extend([[]])
                    last_row = len(matrix)

                while last_col < col_idx:
                    matrix[-1].extend([''])
                    last_col = len(matrix[-1])

                matrix[-1].extend([v])

            for row in matrix:
                csv_row = ','.join(row)
                print csv_row

else:
    print 'usage: %s (inputfile)+' % me
nosklo
What are those 'cp1251' and 'cp866' doing in there? What will happen if there are dates in the data? Why does it use alist.extend([something]) instead of alist.append(something)? Why does it use ','.join(row) instead of using the csv module?? Are you serious???
John Machin
@John: I just copied the example of pyexcelerator. I think the point of the example is reading the xls file, not the csv generation part. It works. You can modify and improve it as needed.
nosklo
@nosklo: It **doesn't work** if you have dates in your data. In general using pyExcelerator to write xls files is rather **uncool**. It is not being maintained regularly. Releases prior to the brief maintenance resurrection around Easter 2009 have bugs. Use xlwt (a maintained fork) to write xls files. Use xlrd to read xls files. See `http://www.python-excel.org`
John Machin
+2  A: 

In Ruby, here is the code I use: (requires the excellent parseexcel gem) require 'parseexcel'

def excelGetSheet(worksheet)
 sheet=Array.new
 worksheet.each { |row|
   if row != nil   # empty row?
     cells=Array.new
     j=0
     row.each { |cell|
       cells << cell.to_s('latin1')  unless cell == nil
       j=j+1
     }
     sheet << cells
   end
 }
 return sheet
end

workbook = Spreadsheet::ParseExcel.parse("MyExcelFile.xls")
sheet1 = excelGetSheet(workbook.worksheet(0))

puts sheet1.inspect
Jasim
more explanations is also found here: http://www.codeweblog.com/parseexcel-ruby-used-to-read-excel/
poseid
+2  A: 

for ruby, the spreadsheet gem is excellent to read write modify, ...excell files

http://spreadsheet.rubyforge.org

Alexis Perrier