views:

5068

answers:

11

Is there a simple way to translate an XLS to a CSV formatted file without starting the Excel windowed application?

I need to process some Excel XLS workbooks with scripts. For this i need to convert the xls file into a csv file. This can be done with a save-as from the Excel application. But, i would like to automate this (so, not open the Excel application window).

It will suffice if the first sheet from the workbook gets translated to the CSV format. I need to just process data in that sheet.

I have Cygwin and Excel installed on my system -- if that helps.

Edit: Ok, i have a working solution with Perl. Updating for future use by others.

I installed the Spreadsheet::ParseExcel module. and then used read-excel.pl sample.

My code is a slight variation of this sample code, as below.

#!/usr/bin/perl -w
# For each tab (worksheet) in a file (workbook),
# spit out columns separated by ",",
# and rows separated by c/r.

use Spreadsheet::ParseExcel;
use strict;

my $filename = shift || "Book1.xls";
my $e = new Spreadsheet::ParseExcel;
my $eBook = $e->Parse($filename);
my $sheets = $eBook->{SheetCount};
my ($eSheet, $sheetName);

foreach my $sheet (0 .. $sheets - 1) {
    $eSheet = $eBook->{Worksheet}[$sheet];
    $sheetName = $eSheet->{Name};
    print "#Worksheet $sheet: $sheetName\n";
    next unless (exists ($eSheet->{MaxRow}) and (exists ($eSheet->{MaxCol})));
    foreach my $row ($eSheet->{MinRow} .. $eSheet->{MaxRow}) {
        foreach my $column ($eSheet->{MinCol} .. $eSheet->{MaxCol}) {
            if (defined $eSheet->{Cells}[$row][$column])
            {
                print $eSheet->{Cells}[$row][$column]->Value . ",";
            } else {
                print ",";
            }
        }
        print "\n";
    }
}

Update: Here is a Powershell script that might also be easy to work with; as-is from this MSDN blog and, SO Reference.

$excel = New-Object -comobject Excel.Application
$workbooks = $excel.Workbooks.Open("C:\test.xlsx")
$worksheets = $workbooks.Worksheets
$worksheet = $worksheets.Item(1)
$range = $worksheet.UsedRange
foreach($row in $range.Rows)
{
    foreach($col in $row.Columns)
    {
        echo $col.Text
    }
}

Update: I recently came across a Windows tool CSVed at this Superuser answer which might be useful to some people.

A: 

Recommend Convert XLS: http://www.softinterface.com/Convert-XLS%5CConvert-XLS.htm

‘ Convert XLS ’ is a simple to use, yet sophisticated Excel converter utility specifically designed for Excel, text and CSV (character/comma delimited) files. If you need to convert/manipulate one or thousands of files, located in one or many folders this is the tool! Conversions can be done 10-15 times faster without MS Excel.

Automate your Excel Conversion tasks by using:

  • Built in scheduler
  • The command line
  • interface COM interface
William Leara
+1  A: 

Use one of portable [Python] libraries:

pyxlreader.sourceforge.net/

sourceforge.net/projects/pyexcelerator

and make extra script layer on top of it.

Denis Barmenkov
This is probably a good way to go too. I got the Perl solution working for me fast so i stopped on this.
nik
Congratulations! :)
Denis Barmenkov
+2  A: 

Hi,

Excel can be used as datasource and there are drivers available to access EXCEL as database.

1.) Create and Open a connection to EXCEL file, which you want to convert into CSV.

2.) Fire a query like "SELECT * From Sheet1", which will load all the data of Sheet1 into recordset or datatable.

3.) Since I'm using .net, I can hold those records on datatable and convert into CSV using following extension method.

        public static string ToCSV(this DataTable _dataTable)
        {
            StringBuilder csv = new StringBuilder();
            StringWriter sw = new StringWriter(csv);
            int icolcount = _dataTable.Columns.Count;
            for (int i = 0; i < icolcount; i++)
            {
                sw.Write(_dataTable.Columns[i]);
                if (i < icolcount - 1)
                {
                    sw.Write(",");
                }
            }
            sw.Write(sw.NewLine);
            foreach (DataRow drow in _dataTable.Rows)
            {
                for (int i = 0; i < icolcount; i++)
                {
                    if (!Convert.IsDBNull(drow[i]))
                    {
                        sw.Write(drow[i].ToString());
                    }
                    if (i < icolcount - 1)
                    {
                        sw.Write(",");
                    }
                }
                sw.Write(sw.NewLine);
            }
            sw.Close();
            return csv.ToString();
        }

You can apply this approach on the platform you're working on.

Thanks.

this. __curious_geek
+2  A: 

Use a perl script. Using the Spreadsheet::ParseExcel perl module from CPAN to parse the xls file followed by output as csv should work fine.

http://search.cpan.org/dist/Spreadsheet-ParseExcel

You could also try using VBScript.

PatrickBateman
+2  A: 

In Java world you can use apache poi. You could start from the following Groovy snippet.

FileInputStream fis = new FileInputStream(filename);
Workbook wb = new HSSFWorkbook(fis); 
Sheet sheet = wb.getSheetAt(0);
for (Row row : sheet) {
  for (Cell cell : row) {
    doSomething(cell.toString())
  }

}
jens
A: 

If You have Cygwin, most probablly You will have Python. If not - install python and use this script. It is much more than You need, but will convert easilly and fast.

praavDa
+2  A: 

@unknown (google):

Disclaimer: I'm the author of xlrd (http://pypi.python.org/pypi/xlrd).

Thanks for bringing pyXLreader to my attention; I'd never heard of it before.

I may be gravely mistaken, but it appears not to be have been maintained since 2005, has an astonishing outstanding bug in the sourceforge tracker (omits fractions, all numbers presented as integers) plus a few more problems I found in a few minutes of playing with it (crashes on large files, sometimes prints "FCUKDATE" (no kidding!) when there are dates in the data, time taken is at least 6 times that of xlrd and is possibly O(N**2), memory used is at least 7 times that of xlrd).

Do you use it very much yourself?

John Machin
@John, i am trying to figure out how to deliver your message to our friend "unknown (google)". Thanks for your note.
nik
A: 

@ John Machin: I cant add coment as I newbee for this forum :)

I didn't use old package pyXLreader but make my post with xlrd in mind ;)

I saw it month ago but not used in projects.

WBR

Denis Barmenkov
A: 

iam pretty new to these forums, and it would be nice if they put the year on the time stamps(pet peeve) so i know how old the posts are. iam going to assume they are from 2009.

but a good solution in python is to use xlrd to read in your xls files. here is a pretty simple intro tutorial: http://scienceoss.com/read-excel-files-from-python/ it isn't mine.

the only problem i had was excel dates. here is a quick fix for them:

date = xlrd.xldate_as_tuple( int( sheet.cell( rowNum,colNum ).value ),workBookName.datemode )

than create a csv file with the inbuilt csv module,as iam a new user i can only post one hyper link. but google the csv module api.

hope that was helpfull

You are right about the year. Thanks for the reference.
nik
Python xlrd was pointed out by praavDa on the day I posted this question and some days later the author John himself also added a post.
nik
+3  A: 

You can use xls2csv from the catdoc package if you're on Debian/Ubuntu

Rory
That sounds interesting, I'll check on Ubuntu.
nik
+1, this looks good.
nik
+1 Simplest answer thus far, and known to work.
Benson
A: 

You can use "cz xls2csv" to convert excel files to csv files, and this tool supports command line interface, so you can call it from your application, you can get more information from http://www.convertzone.com/xls2csv/help.htm

If you need batch convert excel files to csv files, "cz-xls2csv" will help you, it can convert many excel xls files at one time.

regards
flyaga

flyaga
You missed stating that this is a tool to be purchased. The demo version can "only convert first two files at one time, run 30 times". This would have been something to look towards if there were no other solutions. Please explain if there are any features of cz-xls2csv that make it a better solution to the existing free ones.
nik
Yes, you are right, it is not free tool, but the folder watch function maybe can help you, this tool can watch source file folder and convert new uploading ms excel documents to text files automatically. While your user upload word documents to your website, then this folder watcher can automatically convert them into csv format.
flyaga