views:

30581

answers:

27

Is there a free or open source library to read Excel files (.xls) directly from a C# program?

It does not need to be too fancy, just to select a worksheet and read the data as strings. So far, I've been using Export to Unicode text function of Excel, and parsing the resulting (tab-delimited) file, but I'd like to eliminate the manual step.

A: 

you could write an excel spreadsheet that loads a given excel spreadsheet and saves it as csv (rather than doing it manually).

then you could automate that from c#.

and once its in csv, the c# program can grok that.

(also, if someone asks you to program in excel, it's best to pretend you don't know how)

(edit: ah yes, rob and ryan are both right)

Leon Bambrick
A: 

I know that people have been making an Excel "extension" for this purpose.
You more or less make a button in Excel that says "Export to Program X", and then export and send off the data in a format the program can read.

http://msdn.microsoft.com/en-us/library/ms186213.aspx should be a good place to start.

Good luck

Lars Mæhlum
+2  A: 

Forgive me if I am off-base here, but isn't this what the Office PIA's are for?

Rob Cooper
Yes, but that would involve creating an Excel.Application instance, loading the xls file, etc. If the requirement is purely to read some data from the file then it's much easier and far more lightweight to use one of the ADO.NET methods described in the other answers.
AdamRalph
+37  A: 

If it is just simple data contained in the Excel file you can read the data via ADO.NET. See the connection strings listed here:

http://www.connectionstrings.com/?carrier=excel2007 or http://www.connectionstrings.com/?carrier=excel

-Ryan

Update: then you can just read the worksheet via something like select * from [Sheet1$]

Ryan Farley
This way is by far the fastest.
StingyJack
Of course that's not true, Stingy. You have to sift through all the data and write crappy DB code (hand craft your models, map columns to properties, yadda yadda). The quickest way is to let *some other poor SOB do this for you*. That's why people use frameworks instead of writing everything from the bottom up.
Will
Besides that I have had times where it didn't give me the right results due to localization problems... the neverending fight of seperators
cyberzed
Worthless method! Truncates text columns to 255 characters when read. Beware! See: http://stackoverflow.com/questions/1519288/jet-engine-255-character-truncation ACE engine does same thing!
Triynko
Triynko, it has been a super long time since I used this method, but IIRC you can get around the 255 char limit by defining an ODBC DSN for the spreadsheet and then define the columns as longer in length and then use the DSN to connect to the spreadsheet. It's a pain to do that, but I believe that gets around that.
Ryan Farley
+1  A: 

Not free, but with the latest Office there's a very nice automation .Net API. (there has been an API for a long while but was nasty COM) You can do everything you want / need in code all while the Office app remains a hidden background process.

xanadont
A: 

Just did a quick demo project that required managing some excel files. The .NET component from GemBox software was adequate for my needs. It has a free version with a few limitations.

http://www.gemboxsoftware.com/GBSpreadsheet.htm

Christian Hagelid
+7  A: 

Here's some code I wrote in C# using .NET 1.1 a few years ago. Not sure if this would be exactly what you need (and may not be my best code :)).

using System;
using System.Data;
using System.Data.OleDb;

namespace ExportExcelToAccess
{
    /// <summary>
    /// Summary description for ExcelHelper.
    /// </summary>
    public sealed class ExcelHelper
    {
     private const string CONNECTION_STRING = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=<FILENAME>;Extended Properties=\"Excel 8.0;HDR=Yes;\";";

     public static DataTable GetDataTableFromExcelFile(string fullFileName, ref string sheetName)
     {
      OleDbConnection objConnection = new OleDbConnection();
      objConnection = new OleDbConnection(CONNECTION_STRING.Replace("<FILENAME>", fullFileName));
      DataSet dsImport = new DataSet();

      try
      {
       objConnection.Open();

       DataTable dtSchema = objConnection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);

       if( (null == dtSchema) || ( dtSchema.Rows.Count <= 0 ) )
       {
        //raise exception if needed
       }

       if( (null != sheetName) && (0 != sheetName.Length))
       {
        if( !CheckIfSheetNameExists(sheetName, dtSchema) )
        {
         //raise exception if needed
        }
       }
       else
       {
        //Reading the first sheet name from the Excel file.
        sheetName = dtSchema.Rows[0]["TABLE_NAME"].ToString();
       }

       new OleDbDataAdapter("SELECT * FROM [" + sheetName + "]", objConnection ).Fill(dsImport);
      }
      catch (Exception)
      {
       //raise exception if needed
      }
      finally
      {
       // Clean up.
       if(objConnection != null)
       {
        objConnection.Close();
        objConnection.Dispose();
       }
      }


      return dsImport.Tables[0];
      #region Commented code for importing data from CSV file.
      //    string strConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;" +"Data Source=" + System.IO.Path.GetDirectoryName(fullFileName) +";" +"Extended Properties=\"Text;HDR=YES;FMT=Delimited\"";
      //
      //    System.Data.OleDb.OleDbConnection conText = new System.Data.OleDb.OleDbConnection(strConnectionString);
      //    new System.Data.OleDb.OleDbDataAdapter("SELECT * FROM " + System.IO.Path.GetFileName(fullFileName).Replace(".", "#"), conText).Fill(dsImport);
      //    return dsImport.Tables[0];

      #endregion
     }

     /// <summary>
     /// This method checks if the user entered sheetName exists in the Schema Table
     /// </summary>
     /// <param name="sheetName">Sheet name to be verified</param>
     /// <param name="dtSchema">schema table </param>
     private static bool CheckIfSheetNameExists(string sheetName, DataTable dtSchema)
     {
      foreach(DataRow dataRow in dtSchema.Rows)
      {
       if( sheetName == dataRow["TABLE_NAME"].ToString() )
       {
        return true;
       } 
      }
      return false;
     }
    }
}
hitec
This code needs some Resharper love
Cherian
Couldn't agree more Cherian. This code is many years old... before I even was proficient with Resharper :)
hitec
The code is ugly, but it shows how to get the sheet names, great!
Sam
+38  A: 
var fileName = string.Format("{0}\\fileNameHere", Directory.GetCurrentDirectory());
var connectionString = string.Format("Provider=Microsoft.Jet.OLEDB.4.0; data source={0}; Extended Properties=Excel 8.0;", fileName);

var adapter = new OleDbDataAdapter("SELECT * FROM [workSheetNameHere$]", connectionString);
var ds = new DataSet();

adapter.Fill(ds, "anyNameHere");

DataTable data = ds.Tables["anyNameHere"];

This is what I usually use. It is a little different because I usually stick a AsEnumerable() at the edit of the tables:

var data = ds.Tables["anyNameHere"].AsEnumerable();

as this lets me use LINQ to search and build structs from the fields.

var query = data.Where(x => x.Field<string>("phoneNumber") != string.Empty).Select(x =>
       new MyContact
        {
         firstName= x.Field<string>("First Name"),
         lastName = x.Field<string>("Last Name"),
         phoneNumber =x.Field<string>("Phone Number"),
        });
Robin Robinson
If seems like the Select in this approach tries to guess the data type of the column and force upon that guessed data type. For example, if you have a column with mostly double values, it won't like you passing x.Field<string>, but expects x.Field<double>. IS this true?
Khnle
Just looked it up on MSDN. Looks like the <T> is just used to attempt to cast the contents in the column to a type. In this example and just casting the data in the columns to strings. If you wanted a double you would need to call double.Parse(x.Field<string>("Cost") or something like that.Field is an extension method for DataRow and it looks like there aren't an non generic versions.
Robin Robinson
+13  A: 

The ADO.NET approach is quick and easy, but it has a few quirks which you should be aware of, especially regarding how DataTypes are handled.

This excellent article will help you avoid some common pitfalls: http://blog.lab49.com/archives/196

Ian Nelson
@Ian Nelson - You answered my question (in the form of a comment above).
Khnle
+3  A: 

I did a lot of reading from Excel files in C# a while ago, and we used two approaches:

  • The COM API, where you access Excel's objects directly and manipulate them through methods and properties
  • The ODBC driver that allows to use Excel like a database.

The latter approach was much faster: reading a big table with 20 columns and 200 lines would take 30 seconds via COM, and half a second via ODBC. So I would recommend the database approach if all you need is the data.

Cheers,

Carl

Carl Seleborg
+5  A: 

While you did specifically ask for .xls, implying the older file formats, for the OpenXML formats (e.g. xlsx) I highly recommend the OpenXML SDK (http://msdn.microsoft.com/en-us/library/bb448854.aspx)

Hafthor
A++++ would let some other poor sombitch write it for me again.
Will
No thanks, the OpenXml API is awful!
Quoo
+1  A: 

I recommend the FileHelpers Library which is a free and easy to use .NET library to import/export data from EXCEL, fixed length or delimited records in files, strings or streams + More.

The Excel Data Link Documentation Section http://filehelpers.sourceforge.net/example_exceldatalink.html

Jason
I won't down you, but I recently started using FileHelpers and was shocked at how ... crappy it is. For instance, the only way to map columns in a csv to properties... excuse me, FIELDS, of a model is *to create the fields in the order of the columns*. I don't know about you, but I wouldn't rely on a quirk of the compiler for one of the most central design considerations of my f8king framework.
Will
+1  A: 

Lately, partly to get better at LINQ.... I've been using Excel's automation API to save the file as XML Spreadsheet and then get process that file using LINQ to XML.

kenny
XML Spreadsheet is a fairly clean format :)
dbkk
But like excel files.... can we protect xml files with password?
gsvirdi
I would suspect you can protect it from Excel, but not from man with compiler...like anything...it's just bytes.
kenny
+11  A: 

This is what I used for Excel 2003:

Dictionary<string, string> props = new Dictionary<string, string>();
props["Provider"] = "Microsoft.Jet.OLEDB.4.0";
props["Data Source"] = repFile;
props["Extended Properties"] = "Excel 8.0";

StringBuilder sb = new StringBuilder();
foreach (KeyValuePair<string, string> prop in props)
{
    sb.Append(prop.Key);
    sb.Append('=');
    sb.Append(prop.Value);
    sb.Append(';');
}
string properties = sb.ToString();

using (OleDbConnection conn = new OleDbConnection(properties))
{
    conn.Open();
    DataSet ds = new DataSet();
    string columns = String.Join(",", columnNames.ToArray());
    using (OleDbDataAdapter da = new OleDbDataAdapter(
        "SELECT " + columns + " FROM [" + worksheet + "$]", conn))
    {
        DataTable dt = new DataTable(tableName);
        da.Fill(dt);
        ds.Tables.Add(dt);
    }
}
Dmitry Shechtman
very clean code!
lb
+5  A: 

Koogra is an open-source component written in C# that reads and writes Excel files.

Rune Grimstad
I think this link needs updated... http://koogra.sourceforge.net/
Dave
Oops! Very wrong url! Fixed now
Rune Grimstad
+1  A: 

SpreadsheetGear for .NET is an Excel compatible spreadsheet component for .NET. You can see what our customers say about performance on the right hand side of our product page. You can try it yourself with the free, fully-functional evaluation.

Joe Erickson
+2  A: 

ExcelMapper is an open source tool (http://code.google.com/p/excelmapper/) that can be used to read Excel worksheets as Strongly Typed Objects. It supports both xls and xlsx formats.

A: 

I just used ExcelLibrary to load an .xls spreadsheet into a DataSet. Worked great for me.

Hafthor
+2  A: 

SmartXLS is another excel spreadsheet component which support most features of excel Charts,formulas engines, and can read/write the excel2007 openxml format.

liya
+1  A: 

Excel Package is an open-source (GPL) component for reading/writing Excel 2007 files. I used it on a small project, and the API is straightforward. Works with XLSX only (Excel 200&), not with XLS.

The source code also seems well-organized and easy to get around (if you need to expand functionality or fix minor issues as I did).

At first, I tried the ADO.Net (Excel connection string) approach, but it was fraught with nasty hacks -- for instance if second row contains a number, it will return ints for all fields in the column below and quietly drop any data that doesn't fit.

dbkk
A: 

You could try GemBox.Spreadsheet .NET Component for reading and writing Excel files in variety of formats like XLS, XLSX, ODS, CSV, HTML.

It is very fast and easy to use.

Free version is limited to maximum 150 rows per sheet and 5 sheets per workbook, but you can use it in your commercial applications.

+1  A: 

You can try using this open source solution that makes dealing with Excel a lot more cleaner.

http://excelwrapperdotnet.codeplex.com/

A: 

Try to use Spire.XLS (http://www.e-iceblue.com/Introduce/excel-for-net-introduce.html)

to read and write excel file so easy, I use from last year, quit good.

Peter
A: 

If you need fast make something then better to process xls by hand. I made short article on it with example: Excel sheet processing

Universal
A: 

Here is the simple perhaps useful solution http://chetanwarade.wordpress.com/2010/08/18/read-excel-with-ado-net/

cpp
+2  A: 

SpreadsheetGear is awesome. Yes it's an expense, but compared to twiddling with these other solutions, it's worth the cost. It is fast, reliable, very comprehensive, and I have to say after using this product in my fulltime software job for over a year and a half, their customer support is fantastic!

John R
+3  A: 

How about Excel Data Reader?

http://exceldatareader.codeplex.com/

I've used in it anger, in a production environment, to pull large amounts of data from a variety of Excel files into SQL Server Compact. It works very well and it's rather robust.

Michal
I'll second Excel Data Reader; it has also led to the incredibly useful Excel Data Driven Tests library, which uses NUnit 2.5's TestCaseSource attribute to make data-driven tests using Excel spreadsheets ridiculously easy. Just beware that Resharper doesn't yet support TestCaseSource, so you have to use the NUnit runner.
David Keaveny