views:

141

answers:

4

Hi guys,

First, I want to say that I'm out on deep water here, since I'm just doing some changes to code that is written by someone else in the company, using OleDbDataAdapter to "talk" to Excel and I'm not familiar with that. There is one bug there I just can't follow.

I'm trying to use a OleDbDataAdapter to read in a excel file with around 450 lines.

In the code it's done like this:

connection = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;" + "Data Source='" + path + "';" + "Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=1;\"");
connection.Open();
OleDbDataAdapter objAdapter = new OleDbDataAdapter(objCommand.CommandText, connection);
objAdapter.Fill(objDataSet, "Excel");

foreach (DataColumn dataColumn in objTable.Columns) {
  if (dataColumn.Ordinal > objDataSet.Tables[0].Columns.Count - 1) {
    objDataSet.Tables[0].Columns.Add();
  }
  objDataSet.Tables[0].Columns[dataColumn.Ordinal].ColumnName = dataColumn.ColumnName;
  objImport.Columns.Add(dataColumn.ColumnName);
}

foreach (DataRow dataRow in objDataSet.Tables[0].Rows) {
   ...
}

Everything seems to be working fine except for one thing. The second column is filled with mostly four digit numbers like 6739, 3920 and so one, but fice rows have alphanumeric values like 8201NO and 8205NO. Those five cells are reported as having blank contents instead of their alphanumeric content. I have checked in excel, and all the cells in this columns are marked as Text.

This is an xls file by the way, and not xlsx.

Do anyone have any clue as why these cells are shown as blank in the DataRow, but the numeric ones are shown fine? There are other columns with alphanumeric content that are shown just fine.

+3  A: 

What's happening is that excel is trying to assign a data type to the spreadsheet column based on the first several values in that column. I suspect that if you look at the properties in that column it will say it is a numerical column.

The problem comes when you start trying to query that spreadsheet using jet. When it thinks it's dealing with a numerical column and it finds a varchar value it quietly returns nothing. Not even a cryptic error message to go off of.

As a possible work around can you move one of the alpha numeric values to the first row of data and then try parsing. I suspect you will start getting values for the alpha numeric rows then...

Take a look at this article. It goes into more detail on this issue. it also talks about a possible work around which is:

However, as per JET documentation, we can override the registry setting thru the Connection String, if we set IMEX=1( as part of Extended Properties), the JET will set the all column type as UNICODE VARCHAR or ADVARWCHAR irrespective of ‘ImportMixedTypes’ key value.hey

Abe Miessler
I have tested this now, and indeed if I put the first row alphanumeric, then it works as expected. My problem is that I can't make this a general rule, as the customers will read in their own sheets. However, my solution was to "cheat", so that I change HDR= No in the connection string to make sure that the alphanumeric header is read to make the column alphanumeric, then I cut the first line of the resulting DataTable. It's quite nasty, but I don't see any other options here. Thanks a lot for your help getting me in the right direction.
Øyvind Bråthen
Very sneaky. If it works it works!
Abe Miessler
+1  A: 

IMEX=1 means "Read mixed data as text."

There are some gotchas, however. Jet will only use several rows to determine whether the data is mixed, and if so happens these rows are all numeric, you'll get this behaviour.

See connectionstrings.com for details:

Check out the [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel] located registry REG_DWORD "TypeGuessRows". That's the key to not letting Excel use only the first 8 rows to guess the columns data type. Set this value to 0 to scan all rows. This might hurt performance. Please also note that adding the IMEX=1 option might cause the IMEX feature to set in after just 8 rows. Use IMEX=0 instead to be sure to force the registry TypeGuessRows=0 (scan all rows) to work.

GSerg
I don't know how Jet is exposed in Excel, but in Access, you can change things like that at runtime in your current instance of the Jet db engine without having to alter the registry and restart Access.
David-W-Fenton
Thanks for this information. It worked but was suprisingly slow, so I had to go with the "read and dispose header" trick instead.
Øyvind Bråthen
+1  A: 

I would advise against using the OleDb data provider stuff to access Excel if you can help it. I've had nothing but problems, for exactly the reasons that others have pointed out. The performance tends to be atrocious as well when you are dealing with large spreadsheets.

You might try this open source solution: http://exceldatareader.codeplex.com/

Mark
I totally agree with you Mark. I think it's quite horrible, but in this case I don't have any options as I'm assigned to fix this bug in an existing program, and have not been alloted the time to do any large refactorings. I'll keep your link in mind if I need to do this from scratch later on.
Øyvind Bråthen
+1  A: 

Not it does not, it means enable import mode, which does not work again.

My opinion forget about loading data from excel, save data as tab file than load it. much less hustle.

A lot of people having same problem with loading data from Excel.

Regardless what you use all Ole DB, ODBC or Jet the all work the same ways

For my point of view the best solution is to write macros and save excel file as flat file than it works

There are several Microsoft KB's of how it all is works (or does not work)

Here is condensed information

ODBC/MS Jet scans first TypeGuessRows to determine field type

Here how Excel ODBC/MS Jet works

(TypeGuessRows=8 IMEX=1)

In your eight (8) scanned rows, if the column contains five (5) numeric values and three (3) text values, the provider returns five (5) numbers and three (3) null values.

In your eight (8) scanned rows, if the column contains three (3) numeric values and five (5) text values, the provider returns three (3) null values and five (5) text values.

In your eight (8) scanned rows, if the column contains four (4) numeric values and four (4) text values, the provider returns four (4) numbers and four (4) null values.

In your eight (8) scanned rows all of them less than 255 characters the provider will truncate all data to 255 characters

In your eight (8) scanned rows, if the column contains five (5) values with more length than 255 the provider will return more than 255 characters

NOTE:

Setting IMEX=1 tells the driver to use Import mode. In this state, the registry setting ImportMixedTypes=Text will be noticed. This forces mixed data to be converted to text. For this to work reliably, you may also have to modify the registry setting, TypeGuessRows=8. The ISAM driver by default looks at the first eight rows and from that sampling determines the datatype. If this eight row sampling is all numeric, then setting IMEX=1 will not convert the default data type to Text; it will remain numeric.

More information and how to solve it

http://www.etl-tools.com/imex1.html

ETL Man