Hello,

I use OpenXML SDK 2.0 to generate an Excel file with a large amount of data, approx. 1,000,000 rows, and I need to optimize memory usage because my machine slows down very quickly.

I want to solve this by flushing part of the generated DOM tree to the file at runtime. I do my own buffering of the data. E.g., I have 100,000 records to write and I want to flush the stream to the file each time I add 1,000 rows to the Excel worksheet. I do this by calling worksheetPart.Worksheet.Save(). The documentation says that Save() "saves the data in the DOM tree back to the part. It could be called multiple times as well. Each time it is called, the stream will be flushed."

        foreach (Record m in dataList)
        {
            // My own method to create the row content.
            Row contentRow = CreateContentRow(index, m);

            // Append the new row to the sheet data.
            sheetData.AppendChild(contentRow);

            // Flush to the part every BufferSize rows.
            if (index % BufferSize == 0)
            {
                worksheetPart.Worksheet.Save();
            }

            index++;
        }

This approach works, in that the memory usage chart has a sawtooth shape, but unfortunately memory usage still grows over time.

Does anyone have an idea how to solve this issue?

A: 

SpreadsheetGear for .NET can create an xlsx workbook with 1,000,000 rows by 40 columns of random numbers (that's 40 million cells) in 74 seconds (that includes creating the workbook in memory from random numbers and saving to disk on an overclocked Intel QX 6850 and Windows Vista 32).

What kind of performance are you seeing with the Open XML SDK?

You can download a free trial of SpreadsheetGear here and try it yourself.

I will paste the code to generate the 40 million cell workbook below.

Disclaimer: I own SpreadsheetGear LLC

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using SpreadsheetGear;

namespace ConsoleApplication10
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                // Run once with 100 rows and then run forever with 1,000,000 rows.
                for (int rows = 100; rows <= 1000000; rows = 1000000)
                {
                    Console.Write("rows={0}, ", rows);
                    var startMemory = System.GC.GetTotalMemory(true);
                    var timer = System.Diagnostics.Stopwatch.StartNew();
                    var workbook = BuildWorkbook(rows);
                    var usedMemory = System.GC.GetTotalMemory(true) - startMemory;
                    Console.WriteLine("usedMemory={0}, time={1} seconds, workbook.Name={2}", usedMemory, timer.Elapsed.TotalSeconds, workbook.Name);
                    workbook = null;
                }
            }
            catch (Exception e)
            {
                Console.WriteLine("got exception={0}", e.Message);
            }
        }

        static IWorkbook BuildWorkbook(int rows)
        {
            var workbook = Factory.GetWorkbook();
            var worksheet = workbook.Worksheets[0];
            var values = (SpreadsheetGear.Advanced.Cells.IValues)worksheet;
            Random rand = new Random();
            int cols = 40;
            for (int col = 0; col < cols; col++)
            {
                for (int row = 0; row < rows; row++)
                {
                    values.SetNumber(row, col, rand.NextDouble());
                }
            }
            workbook.SaveAs(string.Format(@"c:\tmp\Rows{0}.xlsx", rows), FileFormat.OpenXMLWorkbook);
            return workbook;
        }
    }
}
Joe Erickson
Thank you for your answer. I will check whether SpreadsheetGear helps me solve my issue. I describe my Open XML SDK performance problem in this post: http://blog.goyello.com/2009/08/25/read-before-using-it-open-xml-sdk-performance-analysis/
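
A note on the Save() loop in the question. A likely explanation for the growth, offered as an assumption and not as a measured result: Save() serializes the DOM tree to the part, but every Row appended to sheetData stays attached to the tree, so the in-memory tree keeps growing no matter how often it is flushed. The Open XML SDK also has a forward-only writer, OpenXmlWriter, that writes elements straight to the part without keeping them. Below is a minimal sketch of that approach. The class name StreamingSheetSketch, the method Write, the sheet name "Data" and the cell values are all hypothetical, and this has not been benchmarked against the numbers in this thread.

```csharp
using System.Globalization;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical helper, not part of the SDK.
static class StreamingSheetSketch
{
    // Writes rowCount x colCount numeric cells without holding the sheet as a DOM tree.
    public static void Write(string path, int rowCount, int colCount)
    {
        using (SpreadsheetDocument doc =
            SpreadsheetDocument.Create(path, SpreadsheetDocumentType.Workbook))
        {
            WorkbookPart workbookPart = doc.AddWorkbookPart();
            WorksheetPart worksheetPart = workbookPart.AddNewPart<WorksheetPart>();

            // Stream the worksheet part element by element.
            using (OpenXmlWriter writer = OpenXmlWriter.Create(worksheetPart))
            {
                writer.WriteStartElement(new Worksheet());
                writer.WriteStartElement(new SheetData());

                for (int r = 1; r <= rowCount; r++)
                {
                    writer.WriteStartElement(new Row());
                    for (int c = 0; c < colCount; c++)
                    {
                        // The Cell is written out and then dropped; it is never
                        // appended to a parent element.
                        writer.WriteElement(new Cell
                        {
                            DataType = CellValues.Number,
                            CellValue = new CellValue(
                                (r * c).ToString(CultureInfo.InvariantCulture))
                        });
                    }
                    writer.WriteEndElement(); // Row
                }

                writer.WriteEndElement(); // SheetData
                writer.WriteEndElement(); // Worksheet
            }

            // The workbook part is tiny, so building it as a DOM is fine.
            workbookPart.Workbook = new Workbook(
                new Sheets(new Sheet
                {
                    Name = "Data",
                    SheetId = 1U,
                    Id = workbookPart.GetIdOfPart(worksheetPart)
                }));
            workbookPart.Workbook.Save();
        }
    }
}
```

A call such as StreamingSheetSketch.Write(@"c:\tmp\Rows.xlsx", 1000000, 40) would replace the foreach loop; in a real program the inner loop would write the values from each Record instead of r * c.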