views:

1045

answers:

5

problem

how to best parse/access/extract "excel file" data stored as binary data in an SQL 2005 field?

(so all the data can ultimately be stored in other fields of other tables.)

background

basically, our customer is requiring a large volume of verbose data from their users. unfortunately, our customer cannot require any kind of db export from their user. so our customer must supply some sort of UI for their user to enter the data. the UI our customer decided would be acceptable to all of their users was excel as it has a reasonably robust UI. so given all that, and our customer needs this data parsed and stored in their db automatically.

we've tried to convince our customer that the users will do this exactly once and then insist on db export! but the customer can not require db export of their users.

  • our customer is requiring us to parse an excel file
  • the customer's users are using excel as the "best" user interface to enter all the required data
  • the users are given blank excel templates that they must fill out
    • these templates have a fixed number of uniquely named tabs
    • these templates have a number of fixed areas (cells) that must be completed
    • these templates also have areas where the user will insert up to thousands of identically formatted rows
  • when complete, the excel file is submitted from the user by standard html file upload
  • our customer stores this file raw into their SQL database

given

  • a standard excel (".xls") file (native format, not comma or tab separated)
  • file is stored raw in a varbinary(max) SQL 2005 field
  • excel file data may not necessarily be "uniform" between rows -- i.e., we can't just assume one column is all the same data type (e.g., there may be row headers, column headers, empty cells, different "formats", ...)

requirements

  • code completely within SQL 2005 (stored procedures, SSIS?)
  • be able to access values on any worksheet (tab)
  • be able to access values in any cell (no formula data or dereferencing needed)
  • cell values must not be assumed to be "uniform" between rows -- i.e., we can't just assume one column is all the same data type (e.g., there may be row headers, column headers, empty cells, formulas, different "formats", ...)

preferences

  • no filesystem access (no writing temporary .xls files)
  • retrieve values in defined format (e.g., actual date value instead of a raw number like 39876)
+1  A: 

It sounds like you're trying to store an entire database table inside a spreadsheet and then inside a single table's field. Wouldn't it be simpler to store the data in a database table to begin with and then export it as an XLS when required?

Without opening up an instance Excel and having Excel resolve worksheet references I'm not sure it's doable at all.

dreamlax
in this case, excel is our user interface. we have a pre-defined template that they fill in -- to which they may add thousands of "uniform" rows. and the customer (who won't/can't save it as .csv) uploads the .xls using standard html file upload.
just mike
+3  A: 

My thought is that anything can be done, but there is a price to pay. In this particular case, the price seems to bee too high.

I don't have a tested solution for you, but I can share how I would give my first try on a problem like that.

My first approach would be to install excel on the SqlServer machine and code some assemblies to consume the file on your rows using excel API and then load them on Sql server as assembly procedures.

As I said, This is just a idea, I don't have details, but I'm sure others here can complement or criticize my idea.

But my real advice is to rethink the whole project. It makes no sense to read tabular data on binary files stored on a cell of a row of a table on database.

David Lay
+2  A: 

This looks like an "I wouldn't start from here" kind of a question.

The "install Excel on the server and start coding" answer looks like the only route, but it simply has to be worth exploring alternatives first: it's going to be painful, expensive and time-consuming.

I strongly feel that we're looking at a "requirement" that is the answer to the wrong problem.

What business problem is creating this need? What's driving that? Try the Five Whys as a possible way to explore the history.

Mike Woodhouse
+1  A: 

Could you write the varbinary to a Raw File Destination? And then use an Excel Source as your input to whatever step is next in your precedence constraints.

I haven't tried it, but that's what I would try.

Neil D
+1  A: 

Well, the whole setup seems a bit twisted :-) as others have already pointed out.

If you really cannot change the requirements and the whole setup: why don't you explore components such as Aspose.Cells or Syncfusion XlsIO, native .NET components, that allow you to read and interpret native Excel (XLS) files. I'm pretty such with either of the two, you should be able to read your binary Excel into a MemoryStream and then feed that into one of those Excel-reading components, and off you go.

So with a bit of .NET development and SQL CLR, I guess this should be doable - not sure if it's the best way to do it, but it should work.

Marc

marc_s