views:

718

answers:

1

I'm trying to read an Excel file (.xls, I think Excel 2003 compatible) via OleDB, but it fails to get the values for referenced fields.

This is my current test code (please note, this is just part of the class):

Private m_conn As OleDbConnection
Public Sub New(ByVal fileName As String)
    Dim connString As String = String.Format("Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=Excel 8.0;", fileName)
    m_conn = New OleDbConnection(connString)
    m_conn.Open()
End Sub
Public Sub GetSheet(ByVal sheet As String)
    Dim query As String = String.Format("SELECT * FROM [{0}]", sheet)
    Using cmd As OleDbCommand = m_conn.CreateCommand()
        cmd.CommandText = query
        Dim ds As New DataSet()
        Dim a As New OleDbDataAdapter(cmd)
        Using rdr As OleDbDataReader = cmd.ExecuteReader()
            While rdr.Read()
                Debug.WriteLine(rdr.Item(0).ToString())
            End While
        End Using
    End Using
End Sub

But if the value is a reference (something like =+'MySheetName'!K37), I just get a DBNull from the call to rdr.Item(0).

I can get around this by automating Excel instead, but would prefer not to have to use Excel automation so wondering if anyone knows how to do it.

+2  A: 

Not the whole class involved but you will get the idea. This isn't excel automation, rather Microsoft Access oledb drivers reading the excel file, so its much faster.

You have to have registry setting correct and determine which version of Excel your reading and you have to install some drivers. Note: Office 2010 drivers can be downloaded from Microsoft but I'm not sure if they also install the drivers required for older editions of MS Office files, aka I have only installed it on my SQL server where I had older oledb drivers for MS Office loaded.

        '''<summary>
    ''' Retrieved the data in spreadsheet to a ADO.net dataset. This should make things mega easy to work with. 
    ''' See Microsoft KB316934 for details on how this was done.
    ''' Optional First row = field titles method variable, which extends the options available to the calling application.
    '''</summary>
    '''<example>
    ''' When passing in the variable "SheetorNamedRangeorSheetNumber" please use the following examples.
    '''•    Use the sheet name followed by a dollar sign (for example, [Sheet1$] or [My Worksheet$]). A workbook table that is referenced in this manner includes the whole used range of the worksheet.
    '''Select * from [Sheet1$]
    '''•    Use a range with a defined name (for example, [MyNamedRange]):
    '''Select * from [MyNamedRange]
    '''•    Use a range with a specific address (for example, [Sheet1$A1:B10]):
    '''Select * from [Sheet1$A1:B10]
    '''</example>
    ''' <param name="filePath">The full path to the excel file. If you pass in a relative path, it will be automatically converted to a full path. The file has to exist for this method to return a dataset.  If the file doesn't exist a FileNotFoundException will be raised.</param>
    ''' <param name="firstRowTitles">Boolean to indicate if the first row of the spreadsheet contains column titles.</param>
    ''' <param name="readAsTextData">Boolean to indicate if the spreadsheet should be read as as text data.  This is particularly important for columns that may contain mixed numberic and alpha numeric data.  </param>
    ''' <remarks>
    ''' Its not possible to open and excel workbook via this method if there is password protection on the workbook.  It could be removed
    ''' by automation and then loaded or manaully removed before running this code.
    '''
    '''   LL: The IMEX=1 setting then references the values set in the Windows registry
    '''
    ''' Check the following registry settings for the *machine*:
    ''' Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/TypeGuessRows
    ''' Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/ImportMixedTypes
    ''' Hkey_Local_Machine/SOFTWARE/Wow6432Node/Microsoft/Jet/4.0/Engines/Excel
    ''' TypeGuessRows: setting the value to 0 (zero) will force ADO to scan
    ''' all column values before choosing the appropriate data type.
    ''' ImportMixedTypes: should be set to value 'Text' i.e. import mixed-type
    ''' columns as text:
    ''' Using IMEX=1 in the connection string (as you have done) ensures the
    ''' registry setting is applied.
    ''' Office 2007 data drivers can be downloaded free at <see>http://www.microsoft.com/downloads/details.aspx?familyid=7554F536-8C28-4598-9B72-EF94E038C891&amp;amp;displaylang=en&lt;/see&gt;
    ''' 
    ''' </remarks>
    Overloads Function GetDataSet(ByVal filePath As String, ByVal sheetOrNamedRangeOrSheetNumber As String, ByVal firstRowTitles As Boolean, ByVal readAsTextData As Boolean) As System.Data.DataSet

        Dim strConnectionString As String = String.Empty
        Dim connection As New System.Data.OleDb.OleDbConnection
        Dim cmd As New System.Data.OleDb.OleDbCommand
        Dim adp As New System.Data.OleDb.OleDbDataAdapter
        Dim dset As New System.Data.DataSet
        Dim strSheetName As String
        Dim ExcelFileInfo As System.IO.FileInfo
        '   Registry manipulation features.
        Dim currentJetRegistryValue As String = "19"
        '   Excel file extension handling
        Dim ExcelFileType As Excel.ExcelVersion = ExcelVersion.Unknown

        ExcelFileInfo = New System.IO.FileInfo(filePath)
        If ExcelFileInfo.Exists Then
            Select Case ExcelFileInfo.Extension.ToLower
                Case ".xls"
                    ExcelFileType = ExcelVersion.Excel97_2003
                Case ".xlsx"
                    ExcelFileType = ExcelVersion.Excel2007
                Case Else
                    ExcelFileType = ExcelVersion.Unknown
                    Throw New ExcelException(-2, String.Format("Excel file extension '{0}' is not valid or code changes are required to handle that file extension.", ExcelFileInfo.Extension), "")
            End Select



            Try
                Select Case ExcelFileType
                    Case ExcelVersion.Excel97_2003

                        '   Need to get the current registry setting, because we can change the value temporarily
                        currentJetRegistryValue = Me.GetJetExcelRegistry
                        If currentJetRegistryValue <> "0" Then
                            Me.UpdateJetExcelRegistry("0")
                        End If

                        If readAsTextData Then
                            If firstRowTitles Then
                                strConnectionString = System.String.Format(My.Resources.ExcelConnectionWithHeaderAllText, ExcelFileInfo.FullName)
                            Else
                                strConnectionString = System.String.Format(My.Resources.ExcelConnectionWithoutHeaderAllText, ExcelFileInfo.FullName)
                            End If
                        Else
                            If firstRowTitles Then
                                strConnectionString = System.String.Format(My.Resources.ExcelConnectionWithHeader, ExcelFileInfo.FullName)
                            Else
                                strConnectionString = System.String.Format(My.Resources.ExcelConnectionWithoutHeader, ExcelFileInfo.FullName)
                            End If
                        End If
                    Case ExcelVersion.Excel2007

                        '   Need to get the current registry setting, because we can change the value temporarily
                        currentJetRegistryValue = Me.GetExcel2007Registry
                        If currentJetRegistryValue <> "0" Then
                            Me.UpdateExcel2007Registry("0")
                        End If

                        If firstRowTitles Then
                            strConnectionString = System.String.Format(My.Resources.Excel2007ConnectionWithHeaderAllText, ExcelFileInfo.FullName)
                        Else
                            strConnectionString = System.String.Format(My.Resources.Excel2007ConnectionWithoutHeaderAllText, ExcelFileInfo.FullName)
                        End If

                End Select

                If strConnectionString = String.Empty Then
                    Throw New ExcelException(-3, "Excel Connection string hasn't been set.", "")
                End If

                connection = New System.Data.OleDb.OleDbConnection
                With connection
                    .ConnectionString = strConnectionString
                    .Open()
                End With

                '   Create a command connection
                cmd = New System.Data.OleDb.OleDbCommand
                With cmd
                    .Connection = connection
                    .CommandType = CommandType.Text
                    If IsNumeric(sheetOrNamedRangeOrSheetNumber) = True Then
                        '   Ok extract the sheet name from the connection first
                        Dim objDT As New System.Data.DataTable
                        objDT = connection.GetOleDbSchemaTable(System.Data.OleDb.OleDbSchemaGuid.Tables, New Object() {Nothing, Nothing, Nothing, "TABLE"})
                        strSheetName = objDT.Rows(CInt(sheetOrNamedRangeOrSheetNumber)).Item("TABLE_NAME").ToString
                        objDT = Nothing
                        .CommandText = System.String.Format(System.Globalization.CultureInfo.InvariantCulture, My.Resources.ExcelSelectStatement, strSheetName)
                    Else
                        '   Must be a name, range or name Cell reference
                        .CommandText = System.String.Format(System.Globalization.CultureInfo.InvariantCulture, My.Resources.ExcelSelectStatement, sheetOrNamedRangeOrSheetNumber)
                    End If

                    '.ExecuteReader()        'Forward only Dataset
                End With

                '   Create a data adapter to store the inforamtion
                adp = New System.Data.OleDb.OleDbDataAdapter
                dset = New DataSet
                With adp
                    .SelectCommand = cmd
                    .Fill(dset, My.Resources.DataSetTableName)
                End With

                Select Case ExcelFileType
                    Case ExcelVersion.Excel97_2003
                        '   Restore the Excel readrows value, as we have finished with our modification.
                        If currentJetRegistryValue <> Me.GetJetExcelRegistry Then
                            '   Restore the registry setting (Securtiy implications)
                            Me.UpdateJetExcelRegistry(currentJetRegistryValue)
                        End If
                    Case ExcelVersion.Excel2007
                        '   Restore the Excel readrows value, as we have finished with our modification.
                        If currentJetRegistryValue <> Me.GetExcel2007Registry Then
                            '   Restore the registry setting (Securtiy implications)
                            Me.UpdateExcel2007Registry(currentJetRegistryValue)
                        End If
                End Select

                '   Return the resulting dataset to the calling application
                GetDataSet = dset
            Catch comex As System.Runtime.InteropServices.COMException
                Throw New ExcelException(comex.ErrorCode, comex.Message, comex.InnerException)
            Catch ex As System.InvalidOperationException
                Throw New ExcelException(-1, ex.Message, ex.InnerException)
            Finally

                If Not cmd Is Nothing Then cmd = Nothing
                If Not connection Is Nothing Then
                    connection.Close()
                    connection.Dispose()
                    connection = Nothing
                End If
                If Not adp Is Nothing Then adp = Nothing
                If Not dset Is Nothing Then dset = Nothing
            End Try
        Else
            Throw New System.IO.FileNotFoundException(System.String.Format(System.Globalization.CultureInfo.InvariantCulture, My.Resources.FileNotFound, filePath))
            Return Nothing
        End If
    End Function

Project Resources:

Excel2007ConnectionWithHeader Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties="Excel 12.0;HDR=Yes"; Excel2007ConnectionWithHeaderAllText Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties="Excel 12.0;HDR=Yes;IMEX=1";
Excel2007ConnectionWithoutHeader Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties="Excel 12.0;HDR=No";
Excel2007ConnectionWithoutHeaderAllText Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties="Excel 12.0;HDR=No;IMEX=1";
ExcelAutomationError Unknow Excel Automation error '{0}': {1}. {2}
ExcelConnectionWithHeader Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties="Excel 8.0;HDR=Yes";
ExcelConnectionWithHeaderAllText Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties="Excel 8.0;HDR=Yes;IMEX=1";
ExcelConnectionWithoutHeader Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties="Excel 8.0;HDR=No";
ExcelConnectionWithoutHeaderAllText Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties="Excel 8.0;HDR=No;IMEX=1"; ExcelException "Error reading excel data file '{0}'. Message: {1}"
ExcelOledbException "Oledb error reading excel data file '{0}'. Message: {1}"
ExcelPasswordException Could not decrypt secure Excel file.
ExcelPasswordRestriction The excel contains a password that restricts the workbook from being read. Please remove the workbook password from '{0}'. And then retry.
ExcelSelectStatement SELECT * FROM [{0}]

Jamie Clayton
I already had code similar to this, but I didn't know about the registry settings so the comments at the top of your sample were very helpful, `TypeGuessRows` seems to have solved my problem, thanks.
ho1
Glad to help. It took me some time to find all the information and test this style of code.
Jamie Clayton
Wow, SO's VB formatting apparently sucks :) However this is pretty detailed answer, +1 for the Wow6432Node which I did not know about.
Piotr Owsiak