I'm using SqlBulkCopy to copy some DataTables into a database table, but because the files I'm copying sometimes exceed 600 MB, I keep running out of memory.

I'm hoping to get some advice on managing the DataTable's size before I commit it to the database, so I can free up some memory and continue writing.

Here are some examples of my code (some columns and rows eliminated for simplicity):

        SqlBulkCopy sqlbulkCopy = new SqlBulkCopy(ServerConfiguration); // Define the server configuration
        System.IO.StreamReader rdr = new System.IO.StreamReader(fileName);

        Console.WriteLine("Counting number of lines...");
        Console.WriteLine("{0}, Contains: {1} Lines", fileName, countLines(fileName));

        DataTable dt = new DataTable();

        sqlbulkCopy.DestinationTableName = "[dbo].[buy.com]"; // The target table the data will be copied into
        dt.Columns.Add("PROGRAMNAME");
        dt.Columns.Add("PROGRAMURL");
        dt.Columns.Add("CATALOGNAME");
        dt.Columns.Add("LASTUPDATED");
        dt.Columns.Add("NAME");

        string inputLine = "";
        int i;       // index into the split fields of the current line
        int k = 0;   // running count of lines written
        DataRow row; // Declare a row, which will be added to the above data table

        while ((inputLine = rdr.ReadLine()) != null) // Read while the line is not null
        {
            i = 0;
            Console.Write("\rWriting Line: {0}", k);
            string[] arr = inputLine.Split('\t'); // split the line read by the stream reader (tab delimited)
            row = dt.NewRow();
            row["PROGRAMNAME"] = arr[i++];
            row["PROGRAMURL"] = arr[i++];
            row["CATALOGNAME"] = arr[i++];
            row["LASTUPDATED"] = arr[i++];
            row["NAME"] = arr[i++];
            dt.Rows.Add(row);
            k++;
        }
        rdr.Close(); // Done reading the file

        // Set the timeout, 600 seconds (10 minutes) given table size--damn that's a lota hooch
        sqlbulkCopy.BulkCopyTimeout = 600;
        try
        {
            sqlbulkCopy.WriteToServer(dt);
        }
        catch (Exception e)
        {
            Console.WriteLine(e);
        }
        sqlbulkCopy.Close(); // Release the resources
        dt.Dispose();

        Console.WriteLine("\nDB Table Written: \"{0}\" \n\n", sqlbulkCopy.DestinationTableName);

    }
+1  A: 

Try setting the BatchSize property to 1000; this will batch up the insert in 1000-record batches rather than sending the whole lot at once. You can tweak this value to find what is optimal. I have used SqlBulkCopy for similarly sized data and it works well.
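
For reference, a minimal sketch of that setting, reusing the sqlbulkCopy and dt objects from the question, might look like this:

    // Assumed to run where the question's WriteToServer call currently sits.
    sqlbulkCopy.BatchSize = 1000;      // send rows to the server in 1000-row batches
    sqlbulkCopy.BulkCopyTimeout = 600; // keep the generous timeout for large files
    sqlbulkCopy.WriteToServer(dt);     // no longer one giant single batch

Note that BatchSize only controls how rows are sent to the server; the DataTable itself still has to fit in memory, which is why the flushing idea discussed in the comments below also matters.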

Fadrian Sudaman
Thanks, I added this and started writing every 100k records. I think my problem is with the file reader object, because I run out of memory at about the same place each time. I'm trying to figure out how I could keep my spot in the file, since I'm reading straight through right now...
jordan.baucke
I still don't think the file stream is the issue. Did you try a batch size of about 1000? Apparently batching at about 1000-2000 records is most effective. If you are certain it is the file, what you can do is open the file, read a thousand records, store the position, do a bulk insert, then close the file. Then open it again, seek to the last position, and read another batch (a sketch of this idea appears after these comments).
Fadrian Sudaman
You must be right, because I have the actual copy in a try block and I get an out-of-memory exception, so it must be related to something in that block; otherwise I wouldn't be catching it there. I'll try saving my file position and report the results.
jordan.baucke
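
The reopen-and-seek approach described in the comments is awkward with a StreamReader, because the reader buffers ahead and the underlying stream position does not track the lines actually consumed. A minimal sketch of the same idea with a single reader, assuming sqlbulkCopy and dt are configured exactly as in the question, is to flush the accumulated rows to the server every N lines:

    // Sketch only: stream the file once and flush the DataTable every batchRows lines,
    // so memory usage stays roughly constant regardless of file size.
    const int batchRows = 100000; // hypothetical flush threshold; tune to taste
    int lineCount = 0;
    string line;
    using (var reader = new System.IO.StreamReader(fileName))
    {
        while ((line = reader.ReadLine()) != null)
        {
            string[] fields = line.Split('\t');
            DataRow row = dt.NewRow();
            row["PROGRAMNAME"] = fields[0];
            row["PROGRAMURL"] = fields[1];
            row["CATALOGNAME"] = fields[2];
            row["LASTUPDATED"] = fields[3];
            row["NAME"] = fields[4];
            dt.Rows.Add(row);

            if (++lineCount % batchRows == 0)
            {
                sqlbulkCopy.WriteToServer(dt); // push this chunk to SQL Server
                dt.Rows.Clear();               // drop the rows we no longer need
            }
        }
    }
    if (dt.Rows.Count > 0)
        sqlbulkCopy.WriteToServer(dt); // flush whatever is left over
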
A: 

I continued to have problems getting SqlBulkCopy to work, and I realized I needed to do more work on each record before it was entered into the database, so I developed a simple LINQ to SQL method that does record-by-record updates, letting me edit other information and create more record information as it runs.

Problem: this method runs pretty slowly (even on a Core i3 machine). Any ideas on how to speed it up (threading?)? On a single processor core with 1 GB of memory it sometimes crashes, or takes 6-8 hours to write the same amount of data that one SqlBulkCopy call handles in a few moments. It does manage memory better, though.

        while ((inputLine = rdr.ReadLine()) != null) // Read while the line is not null
        {
            Console.Write("\rWriting Line: {0}", k);
            string[] arr = inputLine.Split('\t');

            /* items */
            if (fileName.Contains(",,"))
            {
                Item = Table(arr); // build the entity from the split fields

                /* Check to see if the item is already in the db */
                bool exists = table.tables.Where(u => u.ProductID == Item.ProductID).Any();

                /* Commit only when the item is new */
                if (!exists)
                {
                    table.tables.InsertOnSubmit(Item); // queue the insert only for new items
                    try
                    {
                        table.SubmitChanges();
                    }
                    catch (Exception e)
                    {
                        Console.WriteLine(e);
                        // Make some adjustments.
                        // ...
                        // Try again.
                        table.SubmitChanges();
                    }
                }
            }
            k++;
        }

With this helper method:

    public static class extensionMethods
    {
        /// <summary>
        /// Provides the T-SQL EXISTS call for any IQueryable (thus extending LINQ).
        /// </summary>
        /// <remarks>Returns whether the predicate condition matches at least one time.</remarks>
        public static bool Exists<TSource>(this IQueryable<TSource> source, Expression<Func<TSource, bool>> predicate)
        {
            return source.Where(predicate).Any();
        }
    }
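
Assuming the extension is in scope, the existence check in the loop above could then be written as:

    // Hypothetical usage of the Exists extension in place of Where(...).Any():
    bool exists = table.tables.Exists(u => u.ProductID == Item.ProductID);
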
jordan.baucke