Hey all,

I'm building an offline C# application that will import data from spreadsheets and store it in a SQL database that I have created (inside the project). Through some research I have found code that can import a static table into a database table whose layout exactly matches the columns in the worksheet.

What I"m looking to do is have specific columns go to their correct tables based on name. This way I have the database designed correctly and not just have one giant table to store everything.

Below is the code I'm using to import a few static fields into one table; I want to be able to split the imported data across more than one.

What is the best way to do this?

public partial class Form1 : Form
    {
        string strConnection = ConfigurationManager.ConnectionStrings
        ["Test3.Properties.Settings.Test3ConnectionString"].ConnectionString;

        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {


            //Create connection string to Excel work book
            string excelConnectionString =
            @"Provider=Microsoft.Jet.OLEDB.4.0;
            Data Source=C:\Test.xls;
            Extended Properties=""Excel 8.0;HDR=YES;""";

            //Create Connection to Excel work book
            OleDbConnection excelConnection = new OleDbConnection(excelConnectionString);

            //Create OleDbCommand to fetch data from Excel
            OleDbCommand cmd = new OleDbCommand
            ("Select [Failure_ID], [Failure_Name], [Failure_Date], [File_Name], [Report_Name], [Report_Description], [Error] from [Failures$]", excelConnection);

            excelConnection.Open();
            OleDbDataReader dReader;
            dReader = cmd.ExecuteReader();

            SqlBulkCopy sqlBulk = new SqlBulkCopy(strConnection);
            sqlBulk.DestinationTableName = "Failures";
            sqlBulk.WriteToServer(dReader);

            dReader.Close();
            excelConnection.Close();
        }
    }
+1  A: 

You can try an ETL (extract-transform-load) architecture:

Extract: One class will open the file and get all the data in chunks you know how to work with (usually you take a single row from the file and parse its data into a POCO object containing fields that hold pertinent data), and put those into a Queue that other work processes can take from. In this case, maybe the first thing you do is have Excel open the file and re-save it as a CSV, so you can reopen it as basic text in your process and chop it up efficiently. You can also read the column names and build a "mapping dictionary"; this column is named that, so it goes to this property of the data object. This process should happen as fast as possible, and the only reason it should fail is because the format of a row doesn't match what you're looking for given the structure of the file.
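
Roughly, the extract step might look like the sketch below, assuming the workbook has already been re-saved as CSV. The FailureRow type, the column names, and the use of a ConcurrentQueue are illustrative only; a real parser would also need to handle quoted fields.

    // A rough sketch of the extract step: header-based mapping dictionary,
    // one POCO per row, pushed onto a queue for downstream work.
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;

    public class FailureRow
    {
        public string FailureName { get; set; }
        public string ReportName { get; set; }
    }

    public static class Extractor
    {
        public static void Extract(string csvPath, ConcurrentQueue<FailureRow> queue)
        {
            using (var reader = new StreamReader(csvPath))
            {
                // Build the "mapping dictionary": column name -> column index.
                string[] headers = reader.ReadLine().Split(',');
                Dictionary<string, int> map = headers
                    .Select((name, index) => new { name, index })
                    .ToDictionary(h => h.name.Trim(), h => h.index);

                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    string[] fields = line.Split(',');

                    // A malformed row is the only thing that should make this step fail.
                    queue.Enqueue(new FailureRow
                    {
                        FailureName = fields[map["Failure_Name"]],
                        ReportName  = fields[map["Report_Name"]]
                    });
                }
            }
        }
    }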

Transform: Once the file's contents have been extracted into an instance of a basic row, perform any validation, calculations or other business rules necessary to turn a row from the file into a set of domain objects that conform to your domain model. This process can be as complex as you need it to be, but again it should be as straightforward as you can make it while obeying all the business rules given in your requirements.
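
A minimal sketch of what that transform step could look like; the RawRow and Failure types and the single validation rule are hypothetical stand-ins for whatever your domain model actually requires.

    // Transform: raw extracted row in, validated domain object out.
    using System;

    public class RawRow
    {
        public string FailureName { get; set; }
        public string FailureDate { get; set; }
    }

    public class Failure
    {
        public string Name { get; set; }
        public DateTime Date { get; set; }
    }

    public static class Transformer
    {
        public static Failure Transform(RawRow raw)
        {
            // Validation and business rules live here.
            if (string.IsNullOrEmpty(raw.FailureName))
                throw new InvalidOperationException("Failure_Name is required.");

            return new Failure
            {
                Name = raw.FailureName.Trim(),
                Date = DateTime.Parse(raw.FailureDate)
            };
        }
    }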

Load: Now that you've got an object graph of your own domain objects, you can use the same persistence framework you'd call to handle domain objects created any other way. This could be basic ADO, an ORM like NHibernate or MSEF, or an Active Record pattern where objects know how to persist themselves. It's no bulk load, but it saves you having to implement a completely different persistence model just to get file-based data into the DB.
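
For the basic-ADO flavour of the load step, something along these lines would do. The "ImportDb" connection string name is hypothetical; the table and column names simply mirror the question.

    // Load: persist one domain object with a parameterized INSERT.
    using System;
    using System.Configuration;
    using System.Data.SqlClient;

    public static class Loader
    {
        public static void Save(string failureName, DateTime failureDate)
        {
            string connectionString = ConfigurationManager
                .ConnectionStrings["ImportDb"].ConnectionString;

            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand(
                "INSERT INTO Failures (Failure_Name, Failure_Date) VALUES (@name, @date)",
                connection))
            {
                command.Parameters.AddWithValue("@name", failureName);
                command.Parameters.AddWithValue("@date", failureDate);

                connection.Open();
                command.ExecuteNonQuery();   // one row per call; not a bulk load
            }
        }
    }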

An ETL workflow can help you separate the repetitive tasks into simple units of work, and from there you can identify the tasks that take a lot of time and consider parallel processes.

Alternatively, you can take the file and massage its format by detecting the columns you want to work with and arranging them into a format that matches your bulk-input spec, before calling a bulk insert routine to process the data. This file-processor routine can do anything you want it to, including separating data into several files. It is one big process that works on a whole file at a time, though, with limited opportunities for optimization or parallel processing. However, if your loading mechanism is slow, or you've got a LOT of data that is simple to digest, it may end up faster than even a well-designed ETL.
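
As a rough illustration of that pre-processing approach, the sketch below splits one wide CSV into two narrower files, one per destination table, before each is bulk loaded separately. The column positions and output file names are made up.

    // Split one source CSV into two table-shaped CSVs ready for bulk insert.
    using System.IO;
    using System.Linq;

    public static class FileSplitter
    {
        public static void Split(string sourcePath)
        {
            using (var failures = new StreamWriter("failures.csv"))
            using (var reports = new StreamWriter("reports.csv"))
            {
                foreach (string line in File.ReadLines(sourcePath).Skip(1)) // skip header
                {
                    string[] f = line.Split(',');
                    failures.WriteLine(string.Join(",", new[] { f[0], f[1], f[2] }));
                    reports.WriteLine(string.Join(",", new[] { f[4], f[5] }));
                }
            }
        }
    }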

In any case, I would get away from an Office format and into a plain-text (or XML) format as soon as I possibly could, and I would DEFINITELY avoid having to install Office on a server. If there is ANY way you can require the files to be in some easily parseable format like CSV BEFORE they're loaded, so much the better. Having an Office installation on a server is a Really Bad Thing in general, and OLE operations in a server app are not much better. The app will be very brittle, and anything Office wants to tell you will hang the app until you log onto the server and clear the dialog box.

KeithS
Excellent answer.
Jomdom
A: 

If you are only interested in the text (not the formatting, etc.), you can alternatively save the Excel file as a CSV file and parse the CSV instead; it's simple.
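
For example, a small sketch of reading the saved CSV with TextFieldParser (from the Microsoft.VisualBasic assembly), which also copes with quoted fields; the file path is hypothetical.

    // Parse a CSV file field by field.
    using Microsoft.VisualBasic.FileIO;

    public static class CsvExample
    {
        public static void ReadAll(string path)
        {
            using (var parser = new TextFieldParser(path))
            {
                parser.TextFieldType = FieldType.Delimited;
                parser.SetDelimiters(",");
                parser.HasFieldsEnclosedInQuotes = true;

                while (!parser.EndOfData)
                {
                    string[] fields = parser.ReadFields();
                    // Route each field to the table/object it belongs to here.
                }
            }
        }
    }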

Louis Rhys
A: 

Depending on the lifetime of the program, I would recommend one of two options.

  1. If the program is to be short-lived, or is generally a "throw away" project, I would recommend a series of routines that parse the data and insert it into the appropriate set of tables using standard SQL, with some string processing as needed.

  2. If the program will stick around longer and/or find more use on a day-to-day basis, I would recommend implementing a solution similar to the one recommended by @KeithS. With a set of well-defined steps for working with the data, much flexibility is gained. More specifically, the .NET Entity Framework would probably be a great fit (a minimal sketch follows below). As a bonus, if you're not already well versed in this area, you might find you learn a great deal about working with data across boundaries (xls -> sql -> etc.) during your first stint with an ORM such as EF.
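
A very small Entity Framework (Code First / DbContext API) sketch of the kind of persistence step option 2 describes; the Failure entity, the ImportContext, and their names are hypothetical.

    // Define an entity and a context, then save imported objects through EF.
    using System;
    using System.Data.Entity;

    public class Failure
    {
        public int FailureId { get; set; }
        public string FailureName { get; set; }
        public DateTime FailureDate { get; set; }
    }

    public class ImportContext : DbContext
    {
        // Maps to a "Failures" table by convention.
        public DbSet<Failure> Failures { get; set; }
    }

    public static class EfLoader
    {
        public static void Save(Failure failure)
        {
            using (var context = new ImportContext())
            {
                context.Failures.Add(failure);
                context.SaveChanges();   // issues the INSERT
            }
        }
    }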

Jomdom
A: 

If you were looking for a more code-related answer, you could use the following to modify your code to work with different column names / different tables:

    private void button1_Click(object sender, EventArgs e)
    {
        //Create connection string to Excel work book
        string excelConnectionString =
        @"Provider=Microsoft.Jet.OLEDB.4.0;
        Data Source=C:\Test.xls;
        Extended Properties=""Excel 8.0;HDR=YES;""";

        //Create Connection to Excel work book
        OleDbConnection excelConnection = new OleDbConnection(excelConnectionString);

        //Create OleDbCommand to fetch data from Excel
        OleDbCommand cmd = new OleDbCommand
        ("Select [Failure_ID], [Failure_Name], [Failure_Date], [File_Name], [Report_Name], [Report_Description], [Error] from [Failures$]", excelConnection);

        excelConnection.Open();

        DataTable dataTable = new DataTable();
        dataTable.Columns.Add("Id", typeof(System.Int32));
        dataTable.Columns.Add("Name", typeof(System.String));
        // TODO: Complete other table columns
        using (OleDbDataReader dReader = cmd.ExecuteReader())
        {
            while (dReader.Read())
            {
                DataRow dataRow = dataTable.NewRow();
                dataRow["Id"] = dReader.GetInt32(0);
                dataRow["Name"] = dReader.GetString(1);
                // TODO: Complete other table columns
                dataTable.Rows.Add(dataRow);
            }
        }
        excelConnection.Close();

        SqlBulkCopy sqlBulk = new SqlBulkCopy(strConnection);
        sqlBulk.DestinationTableName = "Failures";
        sqlBulk.WriteToServer(dataTable);
    }

Now you can control the names of the columns and which tables the data gets imported into. SqlBulkCopy is good for inserting large amounts of data; if you only have a small number of rows, you might be better off creating a standard data access layer to insert your records.
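
To actually split the data across more than one destination table, one option is to fill one DataTable per target table and then run a separate SqlBulkCopy for each, using ColumnMappings where the source and destination column names differ. A rough sketch follows; the "Reports" table and its columns are hypothetical.

    // Bulk load two DataTables into two different destination tables.
    using System.Data;
    using System.Data.SqlClient;

    public static class SplitBulkLoad
    {
        public static void Load(string connectionString, DataTable failures, DataTable reports)
        {
            using (var bulkFailures = new SqlBulkCopy(connectionString))
            {
                bulkFailures.DestinationTableName = "Failures";
                bulkFailures.ColumnMappings.Add("Id", "Failure_ID");      // source -> destination
                bulkFailures.ColumnMappings.Add("Name", "Failure_Name");
                bulkFailures.WriteToServer(failures);
            }

            using (var bulkReports = new SqlBulkCopy(connectionString))
            {
                bulkReports.DestinationTableName = "Reports";
                bulkReports.ColumnMappings.Add("ReportName", "Report_Name");
                bulkReports.WriteToServer(reports);
            }
        }
    }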

Duke of Muppets