I need to be able to import the contents of an Excel spreadsheet using the file upload capability. The user must be able to upload the file to the web server which will then open it and import its contents. The steps are:

  1. Upload the file
  2. Open it and read column headers from the top row
  3. Map the columns into the columns I can accept
  4. Execute the import.

I can do this with a CSV, but CSV files are so easy to corrupt that I want to be able to do it with an Excel file. I cannot open the file directly except as a byte stream.

Any help out there?

Thanks...

+1  A: 

One way I have found to read from Excel files is using ODBC. I did it once for a similar project. Basically, you can treat the Excel file as a single-table "database". From there, you can easily query rows/columns as needed.

The following Code Project sums things up nicely:

http://www.codeproject.com/KB/database/excel_odbc.aspx
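
As a rough illustration of the single-table idea, here is a minimal C# sketch using System.Data.Odbc. It assumes the Microsoft Excel ODBC driver is installed on the server, that the uploaded workbook has already been saved to disk, and that the data is on a sheet named Sheet1. The class name, the default path and the sheet name are all made up for the example.

```csharp
using System;
using System.Data.Odbc;

class OdbcExcelSketch
{
    // Builds a connection string for the Excel ODBC driver (assumed to be installed).
    public static string BuildConnectionString(string path)
    {
        return "Driver={Microsoft Excel Driver (*.xls)};DBQ=" + path + ";";
    }

    static void Main(string[] args)
    {
        // Hypothetical path: wherever the upload was saved on the server.
        string path = args.Length > 0 ? args[0] : @"C:\temp\testfile.xls";

        using (OdbcConnection connection = new OdbcConnection(BuildConnectionString(path)))
        {
            connection.Open();

            // A worksheet is addressed as a table named [SheetName$]; "Sheet1" is an assumption.
            using (OdbcCommand command = new OdbcCommand("SELECT * FROM [Sheet1$]", connection))
            using (OdbcDataReader reader = command.ExecuteReader())
            {
                // By default the driver uses the first row as the column names.
                for (int i = 0; i < reader.FieldCount; i++)
                {
                    Console.WriteLine("Column " + i + ": " + reader.GetName(i));
                }

                // Print each data row as pipe-separated text.
                while (reader.Read())
                {
                    object[] values = new object[reader.FieldCount];
                    reader.GetValues(values);
                    Console.WriteLine(string.Join(" | ",
                        Array.ConvertAll(values, v => Convert.ToString(v))));
                }
            }
        }
    }
}
```

The column names printed first are what you would feed into the column-mapping step before running the import.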

Eric Pi
+1  A: 

I've done this before with ASP.NET for probably the same purposes. It was basically to allow bulk import of records into the database. The basic idea is:

  1. Have the user upload the file and the server saves it somewhere on the filesystem

  2. Use ADO.NET to connect to the Excel file like any other database connection

  3. Use standard ADO.NET datareaders and datatables to get at the data and load it into your import process
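
The steps above could be sketched roughly like this. It is only a sketch: it assumes an .xls file, the Jet 4.0 OLE DB provider being available on the server (it is 32-bit only), and a sheet named Sheet1. The class and method names are invented for the example.

```csharp
using System;
using System.Data;
using System.Data.OleDb;

class OleDbExcelSketch
{
    // HDR=YES tells the provider to treat the first row as column headers.
    public static string BuildConnectionString(string path)
    {
        return "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + path +
               ";Extended Properties=\"Excel 8.0;HDR=YES\";";
    }

    // Loads one worksheet into a DataTable; the sheet name is supplied by the caller.
    public static DataTable LoadSheet(string path, string sheetName)
    {
        DataTable table = new DataTable();
        using (OleDbConnection connection = new OleDbConnection(BuildConnectionString(path)))
        using (OleDbDataAdapter adapter =
            new OleDbDataAdapter("SELECT * FROM [" + sheetName + "$]", connection))
        {
            // Fill opens and closes the connection on its own.
            adapter.Fill(table);
        }
        return table;
    }

    static void Main(string[] args)
    {
        // args[0] is the path where the uploaded file was saved; "Sheet1" is an assumption.
        DataTable table = LoadSheet(args[0], "Sheet1");

        foreach (DataColumn column in table.Columns)
        {
            Console.WriteLine("Header: " + column.ColumnName);
        }
        Console.WriteLine(table.Rows.Count + " data rows");
    }
}
```

The DataTable's column names give you the headers to show in a mapping screen, and its rows feed the import itself.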

One neat thing about using ADO.NET is that you can actually modify the Excel file and on my project, I did just this by recording in the file a status or error message for each row. I then had an interface where the user could download the updated Excel file and would know which records had problems importing so they could fix those records and try submitting it again.
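
Writing a status back might look something like the sketch below. The column names RecordId and ImportStatus and the sheet name Sheet1 are hypothetical, not from my project; the sketch assumes both columns already exist as header cells in the first row, that RecordId holds text, and that the connection is not opened read-only.

```csharp
using System;
using System.Data.OleDb;

class RowStatusSketch
{
    // Records an import result for one row of the uploaded workbook.
    // "RecordId" and "ImportStatus" are hypothetical header names.
    public static void WriteStatus(string path, string recordId, string status)
    {
        string connectionString =
            "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + path +
            ";Extended Properties=\"Excel 8.0;HDR=YES\";";

        using (OleDbConnection connection = new OleDbConnection(connectionString))
        using (OleDbCommand command = new OleDbCommand(
            "UPDATE [Sheet1$] SET [ImportStatus] = ? WHERE [RecordId] = ?", connection))
        {
            // OleDb parameters are positional, so add them in the order the ? markers appear.
            command.Parameters.AddWithValue("@status", status);
            command.Parameters.AddWithValue("@recordId", recordId);

            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}
```

If the provider decides RecordId is a numeric column, the comparison against a string parameter can fail with a type mismatch, so the parameter type may need adjusting for real data.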

EDIT: If the requirement is that you cannot write the file to disk and reference it in the ADO.NET connection string, then you're likely looking at a 3rd-party library to be able to work with the Excel file in memory. See this other SO question.

David Archer
A: 

You could use Apache's POI.

It's a Java library for reading and writing MS Office formats. Since it's Java and you're using C#, you'll need IKVM and the Java classes from the POI project.

However, the easiest way is to just download Jon Iles' excellent MPXJ project and you've got it all. Just set a reference to IKVM.OpenJDK.ClassLibrary.dll, IKVM.Runtime.dll and poi-3.2-FINAL-20081019.dll.

I've hacked together a quick console app to show you a simple way to read an Excel sheet. It only reads the first sheet and doesn't use the row or cell iterators, but it does the job well.

With a very small amount of effort I'm sure you could figure out how to use an input stream rather than a file.

//C# code for using the Apache POI libraries
using System;
using System.Collections.Generic;
using System.Text;


// poi for xls
using org.apache.poi;
using org.apache.poi.poifs;
using org.apache.poi.poifs.filesystem;
using org.apache.poi.hssf;
using org.apache.poi.hssf.usermodel;
using org.apache.poi.ss;

namespace ConsoleApplication1
{
    class Test
    {
        static void Main(string[] args)
        {

            if (args.Length != 1)
            {
                Console.Out.WriteLine("Usage: XLSReadTest <xls file>");
            }
            else
            {
                XLSRead x = new XLSRead();
                x.Process(args[0]);
                //x.Process("c:\\temp\\testfile.xls");
            }




        }
    }


    class XLSRead
    {
        public void Process(string inputFile)
        {


            int r = 0;


            Console.Out.WriteLine("Reading input file started.");
            DateTime start = DateTime.Now;

            java.io.InputStream inputStream = new java.io.FileInputStream(inputFile);
            POIFSFileSystem fs = new POIFSFileSystem(inputStream);

            HSSFWorkbook wb = new HSSFWorkbook(fs);
            HSSFSheet sh = wb.getSheetAt(0);


            r = sh.getFirstRowNum();
            while (r <= sh.getLastRowNum())
            {
                HSSFRow row = sh.getRow(r);
                if (row == null)
                {
                    // getRow returns null for rows that were never written to
                    r++;
                    continue;
                }

                int c = row.getFirstCellNum();
                while (c < row.getLastCellNum())
                {
                    HSSFCell cell = row.getCell(c);

                    // Reset for every cell so a blank or unhandled cell type
                    // does not repeat the previous cell's value
                    string val = "";
                    if (cell != null)
                    {
                        switch (cell.getCellType())
                        {
                            case HSSFCell.CELL_TYPE_NUMERIC:
                                val = cell.getNumericCellValue().ToString();
                                break;
                            case HSSFCell.CELL_TYPE_STRING:
                                val = cell.getStringCellValue();
                                break;
                        }
                    }
                    Console.Out.WriteLine("Row: " + r + ", Cell: " + c + " = " + val);
                    c++;
                }
                r++;
            }

            // A tick is 100 ns, so let TimeSpan do the conversion to seconds
            double elapsed = (DateTime.Now - start).TotalSeconds;
            String seconds = String.Format("{0:n}", elapsed);
            Console.Out.WriteLine("\r\n\r\nReading input file completed in " + seconds + "s." + "\r\n");



        }
    }
}
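
For the byte-stream case, a minimal sketch of the input-stream variant could look like this. It assumes the same IKVM and POI references as the code above and that the upload is already available as a byte array; the class and method names are made up for the example.

```csharp
// Sketch only: opens a workbook from bytes held in memory instead of a file path
using System;
using org.apache.poi.poifs.filesystem;
using org.apache.poi.hssf.usermodel;

class XLSReadFromBytes
{
    public static HSSFWorkbook Open(byte[] uploadedBytes)
    {
        // Wrap the uploaded bytes in a Java stream; nothing is written to disk
        java.io.InputStream inputStream = new java.io.ByteArrayInputStream(uploadedBytes);
        try
        {
            POIFSFileSystem fs = new POIFSFileSystem(inputStream);
            return new HSSFWorkbook(fs);
        }
        finally
        {
            inputStream.close();
        }
    }
}
```

In ASP.NET the byte array could come from the FileUpload control's FileBytes property, and the returned workbook can then be walked with the same row and cell loops as above.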
Mark Nold
Can't do Java - must be C#
Bob Jones
Funnily enough, it is C# :)
Mark Nold
No sense using POI with IKVM when there's a solid .NET port of POI: NPOI http://npoi.codeplex.com/
Nate
A: 

SpreadsheetGear for .NET can open Excel workbooks from a file (Factory.GetWorkbook(filename)), from a stream (Factory.GetWorkbookSet().Workbooks.OpenFromStream(stream)) or from a byte array (Factory.GetWorkbookSet().Workbooks.OpenFromMemory(byteArray)). It has Excel-compatible APIs such as workbook.Worksheets[index].Cells[row, col].Value, which returns the raw value of a cell, and workbook.Worksheets[index].Cells[row, col].Text, which returns the formatted value of a cell as a string.
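
As a rough sketch of how those calls fit the header-reading step, the following reads the first row of the first sheet from an uploaded stream. The class and method names are hypothetical, and it assumes zero-based row and column indexes, headers in the top row, and that the first empty cell marks the end of the headers.

```csharp
using System.Collections.Generic;
using System.IO;
using SpreadsheetGear;

class SpreadsheetGearSketch
{
    // Returns the formatted text of each header cell in the top row of the first sheet.
    public static List<string> ReadHeaders(Stream uploadedStream)
    {
        IWorkbook workbook = Factory.GetWorkbookSet().Workbooks.OpenFromStream(uploadedStream);
        IRange cells = workbook.Worksheets[0].Cells;

        List<string> headers = new List<string>();

        // An .xls sheet has at most 256 columns; stop at the first empty header cell.
        for (int col = 0; col < 256; col++)
        {
            string text = cells[0, col].Text;
            if (string.IsNullOrEmpty(text))
            {
                break;
            }
            headers.Add(text);
        }
        return headers;
    }
}
```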

You can see live ASP.NET samples here and download a free trial here.

Disclaimer: I own SpreadsheetGear LLC

Joe Erickson
Looks good, but at $999 per developer license and with the requirement that every developer who touches ANY of the code in the product have a license, they have priced it way beyond my wallet.
Bob Jones
I understand. As anyone in the software biz knows, pricing software is not an easy thing. We have customers who buy a few licenses and have millions of users of their applications - they obviously get a very good deal. We have a few customers who buy one license and they are the only user - they obviously pay quite a bit per user but found enough value (generally because of the performance advantage over Excel) to do it anyway. In the end it comes down to how much time a tool saves you and how much it improves your application(s). In any case, good luck with your application :)
Joe Erickson