views:

808

answers:

13

I always thought it did, though I don't know where I got the idea... I'd always assumed it was easy to treat a spreadsheet as a 2D array for instance, but some searching on SO suggests everyone is using 3rd-party libraries? Or, are all those people needing a solutions which doesn't require Office to be installed... if I don't have this restriction does it get any easier?

As I say, I looked on SO but didn't see an answer to this particular question. However maybe my searching skills just suck...

EDIT: The idea is I want to open an XLS/CSV doc in a C# app. I don't want any complex data from the cells, simply being able to read the text value from each cell. Ideally a wrapper that treats the spreadsheet like a 2D array of cells, with a Cell.getText() method, is all the complexity I need.

A: 

No, .NET doesn't have built-in Office functionality.

Jon Grant
+2  A: 

Visual Basic is (or at least was) the better of the .NET languages for working with Office applications, though you can use C# as well.

Programming Office Applications.

Josh Smeaton
Just to clarify, VB can be "better" because it supports optional parameters and late binding, whereas C# (pre-4.0) does not support either of these in a seamless way. The upcoming version of C# should narrow--if not eliminate--this gap.
Adam Robinson
Ah thank you, I knew there was a difference I just didn't know why.
Josh Smeaton
+2  A: 

Not sure what you want to do exactly, but there is an entire site for this to work with Office in Visual Studio.

Understanding the Excel Object Model from a Visual Studio 2005 Developer's Perspective

You may need the Visual Studio Tools for Office.

Jeff O
VSTO is for building Office add-ins rather than opening/manipulating office documents in a .net app, IIRC.
Will
@Will - what is the difference between: document-level customizations and application-level add-ins? Both capabilities are available.
Jeff O
The distinction I was making is creating a plugin for Office or opening a .doc in a .net application. If you want to open an xls and read data out of it like a two dimension array (spreadsheet[0][1]), you can't use VSTO to do it.
Will
+7  A: 

Yes, .NET has built-in Office functionality. But you'll be beating yourself up trying to use it. It's also well hidden and only compatible with Office 2007 and later (unless you download the compatibility addin for Office 2003/XP).

Better to use some of the APIs designed to interoperate with Office than trying to go it alone. Link is for the Office Open XML SDK which can be used to create the .*x Office files (.docx, .xslx, etc).


If you're wondering how System.IO.Packaging relates to Office, the document trail starts here:

http://msdn.microsoft.com/en-us/library/dd371623%28VS.85%29.aspx

The shorthand version is that the new office formats are Open XML documents. What are Open XML documents? They are packages of resources (e.g., images) and XML files contained in a ZIP file. You can take any of the new Office files, change the extension to .zip, and open them up for a look-see.

What does this mean? It means that you can unzip these files, load up the parts as XDocuments and go to town. Of course, you have to unzip the files into a temporary location, sort through the multiple XML files to find the ones you want, manage all the connections between files when you alter them, etc etc etc.

Or, you could use the System.IO.Packaging namespace and its types to open these files, access the different components within the packages (even remotely), alter them, and flush your changes back to disk.

Now, while you can easily do this using the namespace, you don't have type safe access to the different packages within Open XML files. You have to use magic strings to get parts out. This also means you pretty much have to know the Open XML schema, which sucks.

That's why MS has provided the Open XML SDK, which you can use in combination with System.IO.Packaging to open, alter, and save Open XML office documents.

Add my first link with my second link and you get an answer to the original question.


To answer the OP's clarification, it's not going to be that easy. xls documents are complex; cells aren't just a 2D array. But there are free API's out there to help you open and access the data in them.

If you're looking to open Office 2007 compatible files, I'd strongly suggest checking out the Office Open XML SDK. If you're looking to open older versions (Office 2003, XP), I'd suggest using one of the many projects for Excel over on codeplex.com (I think I've used the Excel Data Reader). There are quite a few of them, designed to make accessing data in Excel spreadsheets pretty easy. But not quite sheet[x][y] easy.

Will
Are you sure about the first link, "Yes, .NET has built-in Office functionality"?. The title for that page is "System.IO.Packaging Namespace". If it is correct: how is it related to Microsoft Office?
Peter Mortensen
I said it was hidden. The packaging namespace is used to create the new brand of office documents (xlsx etal). Its a bit more than I can explain here, but if you look into it a little you'll see how it all fits together.
Will
This is totally misleading and unhelpful. It's a shame SO seems to work on the basis that the more rep you have the more upvotes you get, blindly, without people even reading the answer. Saying that the Packaging namespace is Office functionality is about as true as saying that PKZIP contains Office functionality.
Jon Grant
Well, since you're clueless as to what's actually in the packaging namespace, I can understand where you're coming from. However, since I also linked to the interop assemblies (which, of course, are based on System.Packaging but offer type-safe access to package parts), you can't possibly claim my answer isn't helpful. In fact, it kind of makes you look like a dick.
Will
A: 

You need to install Visual Studio Tools for Office.

awe
A: 

I find it easier to save the office files as XML documents and then parse and manipulate the XML than to dwell in the depths of Office Interop. I find myself saying that a lot. Your mileage may vary. The XML formats make for larger files, but it could very well be worth it for the ease of use.

quillbreaker
A: 

Have you tried looking at Visual Studio Tools for Office? It is a full managed API for dealing with all the Office apps.

There is a fair bit of dev work going on using VSTO, judging by forums etc. Here is a helper API for dealing with VSTO, using LINQ etc:

VSTO Power Tools

geejay
+2  A: 

I think Excel Data Reader will help. You can read Microsoft Excel files ('97-2007) directly into a Data Set.

Here's some sample code (from the website):

FileStream stream = File.Open(filePath, FileMode.Open, FileAccess.Read);

//1. Reading from a binary Excel file ('97-2003 format; *.xls)
IExcelDataReader excelReader = ExcelReaderFactory.CreateBinaryReader(stream);
//...
//2. Reading from a OpenXml Excel file (2007 format; *.xlsx)
IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
//...
//3. DataSet - The result of each spreadsheet will be created in the result.Tables
DataSet result = excelReader.AsDataSet();
//...
//4. DataSet - Create column names from first row
excelReader.IsFirstRowAsColumnNames = true;
DataSet result = excelReader.AsDataSet();

//5. Data Reader methods
while (excelReader.Read())
{
    //excelReader.GetInt32(0);
}

//6. Free resources (IExcelDataReader is IDisposable)
excelReader.Close();
Matt Warren
A: 

John,

It depends on the environment. We do almost exactly the same thing with our app. We were using the Interop up until recently. It's not too difficult to use as long as you don't leave COM components lying around.

Unfortunately the downside is Office 2007 doesn't support any sort of server automation (without nasty hacks to the OS), so if you're running in a non-UI environment/automated task then you won't be able to use the Interop.

You can use things such as ADO I believe but it doesn't have the same concept of cells so it didn't work for what we needed.

Alternatively there are some good libraries, checkout my review of them. FlexCel was our preffered choice which we have purchased. Cheap, fast and very easy to use.

Ian
What sort of nasty hacks to the OS?
RichardOD
http://stackoverflow.com/questions/863864/excel-2007-automation-on-top-of-a-windows-server-2008-x64
Ian
+4  A: 

If you want to open older format .XLS (Excel 97-2003) files, as opposed to the newer *.XLSX, you could try the JET Provider:

OleDbConnection con = new OleDbConnection(string.Format(
                 "Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=1\"",
                 "filename.xls"
                 ));
con.Open();
OleDbDataAdapter ad = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", con);
DataTable t = new DataTable();
ad.Fill(t);

This puts the data into a DataTable, where it is fairly easy to manipulate.

benrwb
Yes- this technique is sometimes useful.
RichardOD
Didn't know about that trick.
Joe
That's pretty neat, and since most people still seem to use .XLS rather than .XLSX, perhaps preferable. May have a look at this, does anyone have a link explaining why/how it works?
John
A: 

Actually, I would put it precisely the other way round... So Office still doesn't have .Net functionality? It would be nice to have the alternative of building Macros in .Net rather than being stuck to VBA.

I know I can do the equivalent from an Office extensibility project in Visual Studio, but I am talking about something that would be more "native" to Office itself. For some scenarios, doing an extensibility project is such a big overkill.

Rui Craveiro
A: 

Try using http://www.codeplex.com/xlslinq

The following example finds Worksheets by Worksheet's name.

using(XlsWorkbook book = new XlsWorkbook("TestData\\100.xls")) {

    var sheets = from s in book.Worksheets
                 where s.Name == "100"
                 select s;

    foreach(var sheet in sheets) Console.WriteLine(sheet.Name);
}
mcintyre321
A: 

The interface towards Office is not that hard to use. Small Excel example here (below) - opening the worksheet, parsing through the entire thing looking for a specific "hit" with RegExp:

    internal void OpenSearchAndReplace(string path, Logger log)
    {
        object nullobj = System.Reflection.Missing.Value;
        ConfigurationManager conf = new ConfigurationManager();

        try
        {
            if (_excelApp == null)
                _excelApp = new Excel.Application();

            Excel.Workbook book = _excelApp.Workbooks.Open(path, nullobj, nullobj, nullobj, nullobj, nullobj, nullobj,
                                                           nullobj, nullobj, nullobj, nullobj,
                                                           nullobj, nullobj, nullobj, nullobj);
            Excel.Worksheet worksheet;
            if( book.Worksheets.Count > 1 )
                worksheet = (Excel.Worksheet)book.Worksheets.get_Item("Sheet1");
            else
                worksheet = (Excel.Worksheet)book.ActiveSheet;

            Excel.Range range = worksheet.UsedRange;

            object[,] values = (object[,])range.Value2;

            for (int row = 1; row <= values.GetUpperBound(0); ++row)
            {
                for (int col = 1; col <= values.GetUpperBound(1); ++col)
                {
                    string value = Convert.ToString(values[row, col]);
                    if (Regex.IsMatch(value, @conf.GetFullyQualifiedRegExp()))
                    {
                        range.Cells.set_Item(row, col, conf.GetReplacementText());
                    }
                }
            }
            book.Save();
            Marshal.ReleaseComObject(worksheet );

            log.LogExcelFile( "File " + path + " has been processed\n" );
        }
        catch (Exception ex)
        {...}

But please - remember to force the Garbagecollector if you do not want a lot of objects hanging around :):

            // Force the garbagecollector to kill objects. 
            // Waiting for it to finish
            GC.Collect();
            GC.WaitForPendingFinalizers();
sonstabo