problem
how best to parse/access/extract data from an excel file that is stored as binary data in a SQL 2005 field?
(so all the data can ultimately be stored in other fields of other tables.)
background
basically, our customer requires a large volume of verbose data from their users. unfortunately, our customer cannot require any kind of db export from their users, so they must supply some sort of UI for the users to enter the data. the UI our customer decided would be acceptable to all of their users was excel, as it has a reasonably robust UI. given all that, our customer needs this data parsed and stored in their db automatically.
we've tried to convince our customer that the users will do this exactly once and then insist on db export! but the customer cannot require db export of their users.
- our customer is requiring us to parse an excel file
- the customer's users are using excel as the "best" user interface to enter all the required data
- the users are given blank excel templates that they must fill out
- these templates have a fixed number of uniquely named tabs
- these templates have a number of fixed areas (cells) that must be completed
- these templates also have areas where the user will insert up to thousands of identically formatted rows
- when complete, the excel file is submitted by the user via standard html file upload
- our customer stores this file raw into their SQL database
given
- a standard excel (".xls") file (native format, not comma or tab separated)
- file is stored raw in a varbinary(max) SQL 2005 field
- excel file data may not necessarily be "uniform" between rows -- i.e., we can't just assume one column is all the same data type (e.g., there may be row headers, column headers, empty cells, different "formats", ...)
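to make "native format" concrete: a native .xls file is an OLE2 compound document, which starts with a fixed 8-byte signature. below is a minimal sketch in python (illustration only, not the SQL 2005 solution we need) of checking that signature on the raw bytes. the function name looks_like_xls is hypothetical, and the check only identifies the container format, which other legacy office files (e.g., .doc) share.

```python
# First 8 bytes of every OLE2 compound document (the container used by .xls).
OLE2_MAGIC = bytes([0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1])


def looks_like_xls(blob):
    """Return True if the raw bytes start with the OLE2 signature.

    Hypothetical helper: it does not prove the file is a workbook,
    only that it is not plain text such as a CSV or tab-separated export.
    """
    return blob[:8] == OLE2_MAGIC


if __name__ == "__main__":
    print(looks_like_xls(OLE2_MAGIC + b"\x00" * 504))  # OLE2-style header
    print(looks_like_xls(b"name,date,amount\n"))        # comma-separated text
```

the same comparison could presumably be made against the first 8 bytes of the varbinary(max) value, but we have not tried that.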
requirements
- code completely within SQL 2005 (stored procedures, SSIS?)
- be able to access values on any worksheet (tab)
- be able to access values in any cell (no formula data or dereferencing needed)
- cell values must not be assumed to be "uniform" between rows -- i.e., we can't just assume one column is all the same data type (e.g., there may be row headers, column headers, empty cells, formulas, different "formats", ...)
preferences
- no filesystem access (no writing temporary .xls files)
- retrieve values in defined format (e.g., actual date value instead of a raw number like 39876)
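to illustrate the last preference: excel stores a date as a serial day count, so a cell that displays as a date comes back as a number like 39876 when read raw. below is a minimal sketch of the conversion in python (only to show the arithmetic; the real solution would have to live in SQL 2005). it assumes the workbook uses excel's default 1900 date system and that the serial is 61 or greater, and the function name excel_serial_to_date is hypothetical.

```python
from datetime import date, timedelta

# In the 1900 date system, Excel counts a nonexistent 1900-02-29,
# so for serials >= 61 the effective day zero is 1899-12-30.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial_to_date(serial):
    """Convert an Excel serial day number to a calendar date.

    Any fractional part (the time of day) is discarded.
    """
    return EXCEL_EPOCH + timedelta(days=int(serial))


if __name__ == "__main__":
    print(excel_serial_to_date(39876))  # 2009-03-04
```

so the raw value 39876 from the example corresponds to 2009-03-04.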