views: 1944
answers: 3
Basically I need to insert a bunch of data into an Excel file. Creating an OleDB connection appears to be the fastest way, but I seem to have run into memory issues: the memory used by the process keeps growing as I execute INSERT queries. I've narrowed the growth down to the Excel output (memory holds steady when I skip it). I close and reopen the connection between worksheets, but this has no effect on memory usage, and neither does Dispose(). The data is written successfully, as I can verify with relatively small data sets. If anyone has insight, it would be appreciated.

initializeADOConn() is called in the constructor

initADOConnInsertComm() creates the parameterized insert query

writeRecord() is called whenever a new record is written. New worksheets are created as needed.

public bool initializeADOConn()
{
    /* Set up the connection string and connect. */
    string connectionString = @"Provider=Microsoft.Jet.OLEDB.4.0;" +
        "Data Source=" + this.destination + ";Extended Properties=\"Excel 8.0;HDR=YES;\"";
    conn = new OleDbConnection(connectionString);
    conn.Open();

    /* Initialize the insert command. */
    initADOConnInsertComm();
    return true;
}
public override bool writeRecord(FileListerFileInfo file)
{
    /* If all available sheets are full, make a new one. */
    if (numWritten % EXCEL_MAX_ROWS == 0)
    {
        conn.Close();
        conn.Open();
        createNextSheet();
    }

    /* Count this record as written. */
    numWritten++;

    /* Get all of the properties of the FileListerFileInfo record and add
     * them to the parameters of the insert query. */
    PropertyInfo[] properties = typeof(FileListerFileInfo).GetProperties();
    for (int i = 0; i < insertComm.Parameters.Count; i++)
        insertComm.Parameters[i].Value = properties[i].GetValue(file, null);

    /* Add the record. */
    insertComm.ExecuteNonQuery();

    return true;
}
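One detail worth ruling out (this is a suggestion, not part of the original post): Close() only returns a Jet connection to the OLE DB connection pool, so the underlying Jet session and its memory can survive a Close()/Open() cycle. A hedged sketch of fully recycling the session between worksheets instead; `RecycleConnection` is a hypothetical helper name, and whether this actually curbs the growth is an assumption to verify:

```csharp
using System;
using System.Data.OleDb;

// Sketch: between worksheets, tear down the Jet session instead of a plain
// Close()/Open(). ReleaseObjectPool() asks OLE DB to drop pooled sessions,
// which are only actually freed on finalization.
static OleDbConnection RecycleConnection(OleDbConnection conn, string connectionString)
{
    conn.Dispose();                       // close and release the managed wrapper
    OleDbConnection.ReleaseObjectPool();  // empty the OLE DB session pool
    GC.Collect();                         // pooled sessions free on finalization
    GC.WaitForPendingFinalizers();

    var fresh = new OleDbConnection(connectionString);
    fresh.Open();
    return fresh;
}
```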

EDIT:

No, I do not use Excel at all. I'm intentionally avoiding Interop.Excel due to its poor performance (at least from my dabbles with it).

A: 

Instead of writing one record at a time, can you find a way to insert in bulk? I try not to use crazy DataSet stuff, but isn't there a way to make all your inserts happen locally first and then go up in one fell swoop? Does this process open up Excel in the background? Does it die afterwards?
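To make the "local first, one fell swoop" idea concrete, here is a hedged sketch that buffers rows and flushes them inside a single Jet transaction. The class, buffer size, and flush trigger are all illustrative assumptions layered on the question's `insertComm` command, not the asker's actual design:

```csharp
using System.Collections.Generic;
using System.Data.OleDb;

// Sketch: accumulate parameter rows in memory, then replay them through the
// existing parameterized insert inside one transaction. The threshold is a
// guess to be tuned.
class BufferedExcelWriter
{
    private readonly List<object[]> buffer = new List<object[]>();
    private const int FlushThreshold = 1000;   // assumption: tune as needed

    public void Add(object[] row, OleDbConnection conn, OleDbCommand insertComm)
    {
        buffer.Add(row);
        if (buffer.Count >= FlushThreshold)
            Flush(conn, insertComm);
    }

    public void Flush(OleDbConnection conn, OleDbCommand insertComm)
    {
        using (OleDbTransaction tx = conn.BeginTransaction())
        {
            insertComm.Transaction = tx;
            foreach (object[] row in buffer)
            {
                for (int i = 0; i < insertComm.Parameters.Count; i++)
                    insertComm.Parameters[i].Value = row[i];
                insertComm.ExecuteNonQuery();
            }
            tx.Commit();   // the buffered rows go up in one commit
        }
        buffer.Clear();
    }
}
```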

Charles Graham
EDIT: No, I do not use Excel at all. I'm intentionally avoiding Interop.Excel due to its poor performance (at least from my dabbles with it).
llamaoo7
+2  A: 

The answer is Yes, the formula you describe does equal a bad time.

If you have a database handy (SQL Server or Access are good for this), you can do all of your inserts into a database table, and then export the table all at once into an Excel spreadsheet.

Generally speaking, databases are good at handling lots of inserts, while spreadsheets aren't.
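One hedged way to do the final export (the staging database, table, and file paths below are placeholders, not anything from the original answer) is to let Jet itself copy the table into a new worksheet with a single SELECT ... INTO statement against the .mdb connection:

```csharp
using System.Data.OleDb;

// Sketch: after all inserts land in an Access staging table, export the whole
// table to a new Excel worksheet in one statement. All names are placeholders.
string mdbConn = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\staging.mdb";
using (var conn = new OleDbConnection(mdbConn))
{
    conn.Open();
    // Jet creates the workbook and worksheet if the .xls file does not exist.
    string sql = @"SELECT * INTO
        [Excel 8.0;HDR=YES;DATABASE=C:\report.xls].Results
        FROM StagingTable";
    using (var cmd = new OleDbCommand(sql, conn))
        cmd.ExecuteNonQuery();
}
```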

MusiGenesis
After reading this response (and linking my boss to about a dozen reasons why Excel is bad for this), I've convinced management not to rely on it. I already had an array of output options, so this isn't critical.
llamaoo7
+1  A: 

Here are a couple of ideas:

Is the target workbook open? There is a known bug ("Memory leak occurs when you query an open Excel worksheet by using ActiveX Data Objects") which IIRC is actually in the Jet OLE DB provider (which you are using), although this isn't confirmed in that article.

Regardless, bulk insert would seem to be the way to go.

You could use the same Jet OLE DB provider to do this: all you need is a one-row table, and you can even fabricate one on the fly. To create a new Excel workbook, execute CREATE TABLE DDL with a non-existent xls file in the connection string, and the provider will create the workbook for you, with a worksheet representing the table. Since you already have a connection to your target workbook, you could execute this:

CREATE TABLE [EXCEL 8.0;DATABASE=C:\MyFabricatedWorkbook;HDR=YES].OneRowTable 
(
   x FLOAT
);

(Even better IMO would be to fabricate a Jet database i.e. .mdb file).

Use INSERT to create a dummy row:

INSERT INTO [EXCEL 8.0;DATABASE=C:\MyFabricatedWorkbook;HDR=YES].OneRowTable (x) 
   VALUES (0);

Then, still using your connection to your target workbook, you could use something similar to the following to create a derived table (DT1) of your values to INSERT in one hit:

INSERT INTO MyExcelTable (key_col, data_col)
SELECT DT1.key_col, DT1.data_col
FROM (
   SELECT 22 AS key_col, 'abc' AS data_col
   FROM [EXCEL 8.0;DATABASE=C:\MyFabricatedWorkbook;HDR=YES].OneRowTable
   UNION ALL
   SELECT 55 AS key_col, 'xyz' AS data_col
   FROM [EXCEL 8.0;DATABASE=C:\MyFabricatedWorkbook;HDR=YES].OneRowTable
   UNION ALL
   SELECT 99 AS key_col, 'efg' AS data_col
   FROM [EXCEL 8.0;DATABASE=C:\MyFabricatedWorkbook;HDR=YES].OneRowTable
) AS DT1;
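Since the question's code is C#, here is a hedged sketch of assembling that single-hit INSERT from an in-memory row list, using the answer's one-row table as the FROM source. `BuildBatchInsert` is a hypothetical helper; the escaping is minimal (doubled single quotes), and `MyExcelTable`, the column names, and the workbook path are the answer's own placeholders:

```csharp
using System.Collections.Generic;
using System.Text;

// Sketch: build one INSERT ... SELECT ... UNION ALL statement from a list of
// (key, data) pairs, for execution over the existing workbook connection.
static string BuildBatchInsert(IEnumerable<KeyValuePair<int, string>> rows)
{
    const string oneRow =
        @"[EXCEL 8.0;DATABASE=C:\MyFabricatedWorkbook;HDR=YES].OneRowTable";

    var selects = new List<string>();
    foreach (var row in rows)
    {
        string safe = row.Value.Replace("'", "''");   // minimal escaping
        selects.Add(
            $"SELECT {row.Key} AS key_col, '{safe}' AS data_col FROM {oneRow}");
    }

    var sql = new StringBuilder();
    sql.AppendLine("INSERT INTO MyExcelTable (key_col, data_col)");
    sql.AppendLine("SELECT DT1.key_col, DT1.data_col FROM (");
    sql.AppendLine(string.Join("\n   UNION ALL\n", selects));
    sql.Append(") AS DT1;");
    return sql.ToString();
}
```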
onedaywhen
I understand the method you are suggesting, but why would this way avoid memory usage problems?
llamaoo7
Because you said, "The memory used by the process seems to keep growing as I execute INSERT queries," it would follow that a single INSERT query would, if not actually cure the disease, at least minimize the symptoms.
onedaywhen