views:

848

answers:

7

I'm trying to evaluate various techniques and libraries in an attempt to find a way to read Excel spreadsheets when running as a service on an NT server. So basically I'm looking for any methods/libraries that people know of that I can use.

I need to be able to read .xls .xlsx .xlsm and .xlsb files.

If anyone knows of anything else that might fit the bill, please share :)

Thanks

Microsoft on Office Automation

+1  A: 

I am not sure why the ad on SO is not appearing in relevance to your question.

There is something similar called Excel Services from microsoft. I suppose, this lets you use Excel in a webservice style. I think SpreadSheetGear is what microsoft used, as the ad used to display. :)

shahkalpesh
+1  A: 

Excel.

After of years of trying to stop folks from using Excel on the server, they've given up/embraced the market need and have started to support this.

2007 has some impovement for this, and Excel 2010 is supposed to have even more.

Scott Weinstein
Scott, we have to support Excel 2003 at least (possibly 2000). At least one of the versions we need to support does not run on a Vista server. So doesn't really help at all, if we could use Excel we would.
Ian
+3  A: 

I have used ADO.NET get data out of an xls before. I'm not sure all the Excel Doc types this would support but check out Reading and Writing Excel Spreadsheets Using ADO.NET C# DbProviderFactory

Here is some code from a SO question as well.

I was able to read and write to excel without having to install Office or any 3rd party tools.

Jeff Alexander
ADO.NET actually works REALLY well for this with no need to have Excel installed at all...
consultutah
Unfortunately ADO won't work for what we need. I'm not sure on the details but another dev hasn't been able to get it working exactly how we want so ADO is a no go. Thanks for the post though.
Ian
If you can get more info, we can probably help you through the issues you are seeing with ADO.NET.
consultutah
The OLEDB/ODBC interface for Excel doesn't support SpreadsheetML or OpenXML formats, has problems with sheets and named ranges not matching certain naming conventions, and can result in truncation if the first few rows of a column do *not* include data whose length is greater than 255 characters, follows by rows that *do* have longer strings in those columns. Those are just a few of the problems I remember from my days of doing this. Now I export and import only SpreadsheetML--slight training issue for users, but huge benefits in application stability.
richardtallent
+1  A: 

Here is something that can read .xls and xlsx files. It is open source. http://code.google.com/p/excelmapper/.

It is basically an ORM, so you have to just deal with objects instead of OleDB.

I discovered that actually after posting this. Haven't had a chance to look into it yet but thanks.
Ian
Unfortunately I don't think that's the sort of thing I want to do. Looking at the Excel mapper it seems that it only imports into objects with a given structure, therefore there are lots of strongly typed fields to population. My data is of an unknown type at the start, I then need to read Excel and determine types from that, so I don't think this is suited to the task.
Ian
Ian, ExcelMapper download includes an exe (BuildDTOsFromExcel) that you can run on the excel files to generate the strongly typed objects. It will figure out the data types based on the column data.As you mentioned, the problem here is that you will have to run this every time you change the data type, so in a way it has to know the data type before hand.
That is fair enough, however we already have a generic system in place for doing that, and need to read the RAW data which I maybe should have suggested in my original post.
Ian
+1  A: 

Had you tried SmartXLS for .net?

or ActiveXLS?

liya
No I hadn't. Thanks for the suggestion.
Ian
+7  A: 

These are the current possible solutions from this post so far and a few others I found for anyone else looking:

SpreadsheetGear.Net
(Didn't evaluate due to high purchase cost)

Aspose.Cells
(Evaluated by a collegue. Appeared to be fairly simple to implement, performance comparable to Excel Interop).

GemBox
(Didn't evaluate)

Excel Services
(Seems only to be included in SharePoint 2007)

Excel Mapper (Didn't evaluate because it requires strongly typed objects to import into which didn't fit my requirement).

SmartXls
(Didn't evaluate because it requires strongly typed objects to import into which didn't fit my requirement).

ActiveXls
(Fairly easy to use, lack of Properties raises questions, they have a preference of Methods for trivial actions. Despite it's claim of 1M records a second was out performed by cheaper FlexCel. Have decided that the help/API manual is almost useless.)

Koogra
(Didn't evaluate due to finding no documentations/information)

FileHelpers
(Didn't evaluate)

Flexcel
(Lowest cost solution found, good performance and was simple to implement with a close proximity to Excel Interop structure. Also received quick response to technical question from support. Probably my pick of the bunch.)

SyncFusion BackOffice
(Medium cost and had a reasonable structure. Unfortunately had more difficulty implementing and inconsistent results when running unit tests. Also received a number of 'Attempted to read protected memory' errors, which didn't encourage me with purely managed library.)

Ian
Excellent work Ian. You may want to highlight Flexcel since it's your chosen option. It looks very decent.
Mark Nold
+1  A: 

To read Excel files with C# without installing Excel you could use Apache's POI.

It's a java library for reading and writing MS Office formats. Since it's java and you're using C# you'll need IKVM and the java classes from the POI Project.

However, the easiest way is to just download Jon Iles excelent MPXJ project and you've got it all. Just set a reference to IKVM.OpendJDK.ClassLibrary.dll, IKVM.Runtime.dll, poi-3.2-FINAL-20081019.dll

I've hacked together a quick console app to show you an simple way to read an Excel .xls sheet. It only reads the first sheet and doesn't use the row or cell iterators, but it does the job well.

NB: You'll need to look at org.apache.poi.ss to make sure you can read .xlsx etc files.

//C# code for using the Apache POI libraries
using System;
using System.Collections.Generic;
using System.Text;


// poi for xls
using org.apache.poi;
using org.apache.poi.poifs;
using org.apache.poi.poifs.filesystem;
using org.apache.poi.hssf;
using org.apache.poi.hssf.usermodel;
using org.apache.poi.ss;

namespace ConsoleApplication1
{
    class Test
    {
        static void Main(string[] args)
        {

            if (args.Length != 1)
            {
                Console.Out.WriteLine("Usage: XLSReadTest <xls file>");
            }
            else
            {
                XLSRead x = new XLSRead();
                x.Process(args[0]);
                //x.Process("c:\\temp\\testfile.xls");
            }




        }
    }


    class XLSRead
    {
        public void Process(string inputFile)
        {


            int r = 0;


            Console.Out.WriteLine("Reading input file started.");
            DateTime start = DateTime.Now;

            java.io.InputStream inputStream = new java.io.FileInputStream(inputFile);
            POIFSFileSystem fs = new POIFSFileSystem(inputStream);

            HSSFWorkbook wb = new HSSFWorkbook(fs);
            HSSFSheet sh = wb.getSheetAt(0);


            r = sh.getFirstRowNum();
            while (r <= sh.getLastRowNum())
            {
                HSSFRow row = sh.getRow(r);
                int c = row.getFirstCellNum();
                string val = "";

                while (c < row.getLastCellNum())
                {
                    HSSFCell cell = row.getCell(c);
                    switch(cell.getCellType())
                    {
                      case HSSFCell.CELL_TYPE_NUMERIC:
                          val = cell.getNumericCellValue().ToString();
                          break;
                      case HSSFCell.CELL_TYPE_STRING:
                          val = cell.getStringCellValue();
                          break;
                    }
                    Console.Out.WriteLine("Row: " + r + ", Cell: " + c + " = " + val);
                    c++;
                }
                r++;
            }

            long elapsed = DateTime.Now.Ticks - start.Ticks;
            String seconds = String.Format("{0:n}", elapsed / 1000000);
            Console.Out.WriteLine("\r\n\r\nReading input file completed in " + seconds + "s." + "\r\n");



        }
    }
}
Mark Nold
Mark, I'll add to the list of readers. Honestly for the project I'm working on we probably wouldn't want to adopt that route because there are more libraries involved which leads to more maintenance etc. We need something simple to implement/maintain at a reasonable cost that is royalty free. These are some of the things I've been assessing possibilities on and I think this might be a little too involved.
Ian