ansaurus

Question

Scientific notation when importing from Excel in .Net

Answer 1

A:

I have found that the easiest way is to choose Zip format, rather than text format for columns with large 'numbers'.

Remou 2009-01-09 21:42:15

Like I said above, I have no control over the file. The client creates that.

ChrisDiRulli 2009-01-09 21:43:51

Answer 2

A:

Have you tried casting the value of the field to (int) or perhaps (Int64) as you are reading it?

palehorse 2009-01-09 21:47:40

I don't "read" the file, the OleDb API does that. I simply invoke the "fill" method on the OleDbDataAdapter and pass in a DataSet.The Dataset is then filled up with juicy delicious data.

ChrisDiRulli 2009-01-09 21:54:12

Is the dataset strongly typed so that field expects a number?

palehorse 2009-01-09 21:55:37

No, it is not strongly typed.

ChrisDiRulli 2009-01-09 22:04:26

Try creating a strong typed dataset for it. That should convert it correctly.

palehorse 2009-01-09 22:31:02

Answer 3

+3 A:

The OleDb library will, more often than not, mess up your data in an Excel spreadsheet. This is largely because it forces everything into a fixed-type column layout, guessing at the type of each column from the values in the first 8 cells in each column. If it guesses wrong, you end up with digit strings converted to scientific-notation. Blech!

To avoid this you're better off skipping the OleDb and reading the sheet directly yourself. You can do this using the COM interface of Excel (also blech!), or a third-party .NET Excel-compatible reader. SpreadsheetGear is one such library that works reasonably well, and has an interface that's very similar to Excel's COM interface.

P Daddy 2009-01-09 22:13:22

Answer 4

+1 A:

Look up the IMEX=1 connection string option and TypeGuessRows registry setting on google. In truth, there is no easy way round this because the reader infers column data types by looking at the first few rows (8 by default). If the rows contain all numbers then you're out of luck.

An unfortunate workaround which I've used in the past is to use the HDR=NO connection string option and set the TypeGuessRows registry setting value to 1, which forces it to read the first row as valid data to make its datatype determination, rather than a header. It's a hack, but it works. The code reads the first row (containing the header) as text, and then sets the datatype accordingly.

Changing the registry is a pain (and not always possible) but I'd recommend restoring the original value afterwards.

If your import data doesn't have a header row, then an alternative option is to pre-process the file and insert a ' character before each of the numbers in the offending column. This causes the column data to be treated as text.

So all in all, there are a bunch of hacks to work around this, but nothing really foolproof.

Andrew Rollings 2009-02-18 04:23:22

Answer 5

A:

I had this same problem, but was able to work around it without resorting to the Excel COM interface or 3rd party software. It involves a little processing overhead, but appears to be working for me.

First read in the data to get the column names
Then create a new DataSet with each of these columns, setting each of their DataTypes to string.
Read the data in again into this new dataset. Voila - the scientific notation is now gone and everything is read in as a string.

Here's some code that illustrates this, and as an added bonus, it's even StyleCopped!

public void ImportSpreadsheet(string path)
{
    string extendedProperties = "Excel 12.0;HDR=YES;IMEX=1";
    string connectionString = string.Format(
        CultureInfo.CurrentCulture,
        "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"{1}\"",
        path,
        extendedProperties);

    using (OleDbConnection connection = new OleDbConnection(connectionString))
    {
        using (OleDbCommand command = connection.CreateCommand())
        {
            command.CommandText = "SELECT * FROM [Worksheet1$]";
            connection.Open();

            using (OleDbDataAdapter adapter = new OleDbDataAdapter(command))
            using (DataSet columnDataSet = new DataSet())
            using (DataSet dataSet = new DataSet())
            {
                columnDataSet.Locale = CultureInfo.CurrentCulture;
                adapter.Fill(columnDataSet);

                if (columnDataSet.Tables.Count == 1)
                {
                    var worksheet = columnDataSet.Tables[0];

                    // Now that we have a valid worksheet read in, with column names, we can create a
                    // new DataSet with a table that has preset columns that are all of type string.
                    // This fixes a problem where the OLEDB provider is trying to guess the data types
                    // of the cells and strange data appears, such as scientific notation on some cells.
                    dataSet.Tables.Add("WorksheetData");
                    DataTable tempTable = dataSet.Tables[0];

                    foreach (DataColumn column in worksheet.Columns)
                    {
                        tempTable.Columns.Add(column.ColumnName, typeof(string));
                    }

                    adapter.Fill(dataSet, "WorksheetData");

                    if (dataSet.Tables.Count == 1)
                    {
                        worksheet = dataSet.Tables[0];

                        foreach (var row in worksheet.Rows)
                        {
                            // TODO: Consume some data.
                        }
                    }
                }
            }
        }
    }
}

Andrew Garrison 2010-09-09 16:06:58

ansaurus

tags:

views:

answers:

Scientific notation when importing from Excel in .Net

related questions