views:

579

answers:

3

When I write a whole table into an excel worksheet, I know to work with a whole Range at once instead of writing to individual cells. However, is there a way to specify format as I'm populating the array I'm going to export to Excel?

Here's what I do now:

object MissingValue = System.Reflection.Missing.Value;
Excel.Application excel = new Excel.Application();
int rows = 5;
int cols = 5;
int someVal;

Excel.Worksheet sheet = (Excel.Worksheet)excel.Workbooks.Add(MissingValue).Sheets[1];
Excel.Range range = sheet.Range("A1", sheet.Cells(rows,cols));
object[,] rangeData = new object[rows,cols];
for(int r = 0; r < rows; r++)
{
    for(int c = 0; c < cols; c++)
    {
        someVal = r + c;
        rangeData[r,c] = someVal.ToString();
    }
}
range.set_Value(MissingValue, rangeData);

Now suppose that I want some of those numbers to be formatted as percentages. I know I can go back on a cell-by-cell basis and change the formatting, but that seems to defeat the whole purpose of using a single Range.set_Value() call. Can I make my rangeData[,] structure include formatting information, so that when I call set_Value(), the cells are formatted in the way I want them?

To clarify, I know I can set the format for the entire Excel.Range object. What I want is to have a different format specified for each cell, specified in the inner loop.

A: 

You can apply a formatting on the range, and then populate it with values you cannot specify formatting in you object[,] array

Stan R.
A: 

You apply the formatting to each individual cell within the inner loop via for(int r = 0; r < rows; r++) { for(int c = 0; c < cols; c++) { Excel.Range r2 = sheet.Cells( r, c ); r2.xxxx = ""; } }

Once you have r2, you can change the cell format any way you want.

JDMX
I realize that will get the job done, but I'm trying to keep the number of calls into the Excel Interop to an absolute minimum. Applying formatting in the inner loop would work, but would be dog slow.
Drew Shafer
A: 

So here's the best "solution" I've found so far. It isn't the nirvanna I was looking for, but it's much, much faster than setting the format for each cell individually.

// 0-based indexes
static string RcToA1(int row, int col)
{
    string toRet = "";
    int mag = 0;
    while(col >= Math.Pow(26, mag+1)){mag++;}
    while (mag>0)
    {
        toRet += System.Convert.ToChar(64 + (byte)Math.Truncate((double)(col/(Math.Pow(26,mag)))));
        col -= (int)Math.Truncate((double)Math.Pow(26, mag--));
    }
    toRet += System.Convert.ToChar(65 + col);
    return toRet + (row + 1).ToString();
}

static Random rand = new Random(DateTime.Now.Millisecond);
static string RandomExcelFormat()
{
    switch ((int)Math.Round(rand.NextDouble(),0))
    {
        case 0: return "0.00%";
        default: return "0.00";
    }
}


struct ExcelFormatSpecifier
{
    public object NumberFormat;
    public string RangeAddress;
}

static void DoWork()
{
    List<ExcelFormatSpecifier> NumberFormatList = new List<ExcelFormatSpecifier>(0);

    object[,] rangeData = new object[rows,cols];
    for(int r = 0; r < rows; r++)
    {
        for(int c = 0; c < cols; c++)
        {
            someVal = r + c;
            rangeData[r,c] = someVal.ToString();
            NumberFormatList.Add(new ExcelFormatSpecifier
                {
                    NumberFormat = RandomExcelFormat(),
                    RangeAddress = RcToA1(rowIndex, colIndex)
                });
        }
    }
    range.set_Value(MissingValue, rangeData);

    int max_format = 50;
    foreach (string formatSpecifier in NumberFormatList.Select(p => p.NumberFormat).Distinct())
    {
        List<string> addresses = NumberFormatList.Where(p => p.NumberFormat == formatSpecifier).Select(p => p.RangeAddress).ToList();
        while (addresses.Count > 0)
        {
            string addressSpecifier = string.Join(",",     addresses.Take(max_format).ToArray());
            range.get_Range(addressSpecifier, MissingValue).NumberFormat = formatSpecifier;
            addresses = addresses.Skip(max_format).ToList();
        }
    }
}

Basically what is happening is that I keep a list of the format information for each cell in NumberFormatList (each element also holds the A1-style address of the range it applies to). The original idea was that for each distinct format in the worksheet, I should be able to construct an Excel.Range of just those cells and apply the format to that range in a single call. This would reduce the number of accesses to NumberFormat from (potentially) thousands down to just a few (however many different formats you have).

I ran into an issue, however, because you apparently can't construct a range from an arbitrarily long list of cells. After some testing, I found that the limit is somewhere between 50 and 100 cells that can be used to define an arbitrary range (as in range.get_Range("A1,B1,C1,A2,AA5,....."). So once I've gotten the list of all cells to apply a format to, I have one final while() loop that applies the format to 50 of those cells at a time.

This isn't ideal, but it still reduces the number of accesses to NumberFormat by a factor of up to 50, which is significant. Constructing my spreadsheet without any format info (only using range.set_Value()) takes about 3 seconds. When I apply the formats 50 cells at a time, that is lengthened to about 10 seconds. When I apply the format info individually to each cell, the spreadsheet takes over 2 minutes to finish being constructed!

Drew Shafer
Although I accepted my own answer, what I've decided is that using Excel automation to write files is a bad plan. Going forward I'm using the ExcelPackage library to directly write .xlsx files
Drew Shafer