
I have a request for some contract work from an organization that uses Excel as a database and wants to do some work on the Excel data via a real database. (Yeah, I know, never mind...)

The client has an Excel sheet that they use internally to keep track of some government programmes. The data from this Excel sheet used to be imported manually into a SQL database, using CSV as an intermediate format, and made available via a tiny web app. Changes in either the spreadsheet or the db were done manually (by different people) and had to be kept in sync manually.

The spec for new functionality includes:

  • upload the Excel file into the web app
  • make minor changes via the web app (this bit is, of course, a no-brainer)
  • occasionally export the data back into Excel

The spreadsheet (actually, it's a couple of them in a workbook) implements some guidelines necessary to interact with other institutions and therefore has to remain the same structurally before and after import. It contains a lot of formatting, hidden columns and sort buttons as well as a lot of data links between the cells in the different sheets.

I don't want to have to reproduce the spreadsheet from scratch to deliver the export, nor do I want to manually extract the proper columns into CSV before making the import. Rather, I'm looking for a way to load the Excel file, "query" certain fields, write them to the DB, and later load the data back from the DB and change only the contents of the relevant cells.
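
A minimal sketch of that round trip, assuming the relevant cell values have already been pulled out of the workbook as a {cell_ref: value} dict by whatever does the reading. The column layout (A = programme, B = budget), the single header row and the table name programme are all hypothetical. The idea is to store the sheet row number alongside each record, so the export can address exactly the cells the data came from and leave the rest of the workbook alone.

```python
import re
import sqlite3

# Hypothetical layout: sheet column letter -> database column name.
COLUMN_MAP = {"A": "programme", "B": "budget"}
HEADER_ROWS = 1  # assumed: row 1 holds the column headings


def split_ref(ref):
    """Split an A1-style reference such as 'B12' into ('B', 12)."""
    m = re.fullmatch(r"([A-Z]+)([0-9]+)", ref)
    if m is None:
        raise ValueError(f"not an A1 reference: {ref!r}")
    return m.group(1), int(m.group(2))


def import_cells(conn, cells):
    """Store the mapped cells in SQLite, keyed by the sheet row they came from."""
    conn.execute("CREATE TABLE IF NOT EXISTS programme "
                 "(sheet_row INTEGER PRIMARY KEY, programme TEXT, budget TEXT)")
    records = {}
    for ref, value in cells.items():
        col, row = split_ref(ref)
        if row > HEADER_ROWS and col in COLUMN_MAP:
            records.setdefault(row, {})[COLUMN_MAP[col]] = value
    conn.executemany(
        "INSERT OR REPLACE INTO programme (sheet_row, programme, budget) VALUES (?, ?, ?)",
        [(row, rec.get("programme"), rec.get("budget")) for row, rec in records.items()])
    conn.commit()


def export_cells(conn):
    """Rebuild {cell_ref: value} for the mapped columns only, ready to write back."""
    out = {}
    for row, programme, budget in conn.execute(
            "SELECT sheet_row, programme, budget FROM programme ORDER BY sheet_row"):
        record = {"programme": programme, "budget": budget}
        for letter, field in COLUMN_MAP.items():
            out[f"{letter}{row}"] = record[field]
    return out


# Usage with made-up sample cells; D3 is not mapped and is ignored.
conn = sqlite3.connect(":memory:")
cells = {"A1": "Programme", "B1": "Budget",
         "A2": "Clean Water", "B2": "1000",
         "A3": "Rural Roads", "B3": "2500", "D3": "not mapped"}
import_cells(conn, cells)
# An edit made through the web app:
conn.execute("UPDATE programme SET budget = ? WHERE programme = ?", ("3000", "Rural Roads"))
changed = export_cells(conn)
print(changed)
```

Because the export only yields values for the mapped columns, anything else in the sheet (formatting, hidden columns, cross-sheet links) is left for the writing side to preserve.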

Is there a way to programmatically interface with an existing spreadsheet and read or change only the bits you need?
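
For .xlsx files (Excel 2007 and later), the workbook is a ZIP archive of XML parts (Office Open XML, ECMA-376), so individual cell values can be read without touching formatting, hidden columns or formulas. Below is a read-only sketch using only the Python standard library. It assumes the target sheet is stored as xl/worksheets/sheet1.xml; in a real workbook the mapping from sheet names to parts lives in xl/workbook.xml and its relationships file. It does not apply to the older binary .xls format.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by worksheet and shared-string parts.
MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN}


def shared_strings(zf):
    """Return the workbook's shared-string table as a list (empty if the part is absent)."""
    if "xl/sharedStrings.xml" not in zf.namelist():
        return []
    root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
    table = []
    for si in root.findall("m:si", NS):
        # Plain entries use <t>; rich-text entries are split into <r><t> runs.
        runs = si.findall("m:t", NS) + si.findall("m:r/m:t", NS)
        table.append("".join(t.text or "" for t in runs))
    return table


def read_cells(xlsx, sheet_part="xl/worksheets/sheet1.xml"):
    """Return {cell_ref: value} for one worksheet part; nothing in the file is modified."""
    with zipfile.ZipFile(xlsx) as zf:
        strings = shared_strings(zf)
        sheet = ET.fromstring(zf.read(sheet_part))
    cells = {}
    for c in sheet.iter(f"{{{MAIN}}}c"):
        # Excel writes the "r" reference on every cell; files that omit it are not handled.
        ref, kind = c.get("r"), c.get("t")
        if kind == "inlineStr":
            cells[ref] = "".join(t.text or "" for t in c.iter(f"{{{MAIN}}}t"))
            continue
        v = c.find("m:v", NS)
        if v is None:
            continue  # formatted but empty cell
        # Shared strings are stored as an index; numbers, booleans and cached
        # formula results are returned as their raw text.
        cells[ref] = strings[int(v.text)] if kind == "s" else v.text
    return cells


# Usage: a tiny in-memory workbook containing only the two parts read above.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xl/sharedStrings.xml",
                f'<sst xmlns="{MAIN}"><si><t>Programme</t></si>'
                '<si><t>Clean Water</t></si></sst>')
    zf.writestr("xl/worksheets/sheet1.xml",
                f'<worksheet xmlns="{MAIN}"><sheetData>'
                '<row r="1"><c r="A1" t="s"><v>0</v></c></row>'
                '<row r="2"><c r="A2" t="s"><v>1</v></c><c r="B2"><v>1000</v></c>'
                '<c r="C2"><f>B2*2</f><v>2000</v></c>'
                '<c r="D2" t="inlineStr"><is><t>note</t></is></c>'
                '<c r="E2" s="1"/></row>'
                '</sheetData></worksheet>')
cells = read_cells(buf)
print(cells)
```

With a real file you would pass its path instead of the in-memory buffer. Writing values back this way is harder than reading: editing the XML does not recalculate the cached results of formulas that depend on the changed cells, which is one reason to leave the writing to a library or to Excel itself.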

+1  A: 

You may be interested in Excel 2007 Collaboration features (like editing an xls from the web).

friol
+4  A: 

Excel is a 'COM Capable Application' and as such you can use COM to access and manipulate the data in an Excel document. You don't say what platform you are using - but if it's .NET then it's really very easy. See http://support.microsoft.com/kb/302084 for how to get started with C#.

If you're not using .NET, any language that can act as a COM client will work (on Windows, with Excel installed on the machine).

Jackson
+2  A: 

We're reading and manipulating Excel data via Apache POI. Its decoding of Excel files is not complete (in particular, formula cells are not fully supported), but our customers are quite happy with it.

POI is a Java library, so if you are a pure Windows shop there may be more natural options, but as I said, our experience with POI is very good and people are happy.

Additionally, I believe I've heard of Excel ODBC drivers; maybe this is what you want/need? (Sorry, I've never worked with them.)

Olaf
Yes, Excel ships with ODBC drivers. There may be file locking issues though (Excel likes to have an exclusive lock on spreadsheets, so the ODBC driver may require it too).
Mr Fooz
+1  A: 

The same API that's used by VBA is available through an external COM interface. There are quite a few books on the subject. I recommend the O'Reilly one by Steven Roman but your tastes may vary.

ConcernedOfTunbridgeWells
+2  A: 

You don't specify a language, so if you are language-agnostic, .NET gives you some very powerful classes for data handling.

To open a CSV file:

    Imports System.Data
    Imports System.Data.OleDb
    Imports Excel = Microsoft.Office.Interop.Excel

    ' DataFolder is the directory containing the CSV file; the Jet text driver treats each file in it as a table.
    Dim ConnectionString As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & DataFolder & "\;Extended Properties='text;HDR=Yes'"

    Dim conn As New OleDbConnection(ConnectionString)
    conn.Open()

    ' Query the file like a table (HDR=Yes: the first line holds the column names).
    Dim CommandText As String = "select * from [" & CSVFileName & "]"
    If Filter.Length > 0 Then
        CommandText &= " WHERE " & Filter
    End If

    ' Fill a DataSet table named "Asset" with the result, then release the connection.
    Dim daAsset As New OleDbDataAdapter(CommandText, conn)
    Dim dsAsset As New DataSet
    daAsset.Fill(dsAsset, "Asset")
    conn.Close()

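Opening a sheet in a workbook is very similar: point Data Source at the workbook file, use Excel extended properties (for example Extended Properties='Excel 8.0;HDR=Yes' for an .xls file) and select from the sheet by name, e.g. "select * from [Sheet1$]". You can then fill a DataSet with the entire sheet and use the DataSet's Tables().Rows() collections to get each row and field, iterate over every row, etc.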
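
Outside .NET, the same pattern (treat the CSV file as a table with a header row, then run a filtered SELECT) can be approximated with Python's csv and sqlite3 modules. This is a rough sketch: the column names and sample data are made up, and the filter value is bound as a query parameter rather than concatenated into the SQL text, which avoids the quoting and injection problems of building the WHERE clause from a Filter string.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_file):
    """Create `table` from a CSV file whose first line is a header (like HDR=Yes)."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Identifiers are quoted; this assumes trusted header names without double quotes.
    columns = ", ".join(f'"{name}"' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    return header


# Usage with made-up data standing in for the CSV file.
conn = sqlite3.connect(":memory:")
data = io.StringIO("Name,Region\nPump,North\nValve,South\nMeter,North\n")
header = load_csv(conn, "Asset", data)
# The filter value is bound as a parameter instead of being pasted into the SQL.
rows = conn.execute('SELECT * FROM "Asset" WHERE "Region" = ? ORDER BY rowid',
                    ("North",)).fetchall()
print(rows)
```

With a real file, open it with open(path, newline="") as the csv module documentation recommends and pass the file object to load_csv.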

GalleySlave
+1  A: 

Another approach would be to write an Excel function that talks to the database directly and returns the result as an array.

If you think this approach would work well, you could try XLLoop, which lets you write Excel functions in Java (as well as in scripting languages supported by Apache BSF).

Peter Smith