views: 633

answers: 2

I am running an import that will have thousands of records on each run. I'm just looking for some confirmation of my assumptions:

Which of these makes the most sense:

  1. Run SaveChanges() after every AddToClassName() call.
  2. Run SaveChanges() after every n AddToClassName() calls.
  3. Run SaveChanges() after all of the AddToClassName() calls.

The first option is probably slow, right? Since it will need to analyze the EF objects in memory, generate the SQL, etc., for every single record.

I assume that the second option is the best of both worlds, since we can wrap a try/catch around that SaveChanges() call and only lose n records at a time if one of them fails. Maybe store each batch in a List<>: if the SaveChanges() call succeeds, get rid of the list; if it fails, log the items.

The last option would probably end up being very slow as well, since every single EF object would have to stay in memory until SaveChanges() is called. And if the save failed, nothing would be committed, right?
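Roughly what I have in mind for the second option (just a sketch; ImportItem, MapToEntity(), LogFailedBatch() and MyEntities are placeholders, not my real types):

private void RunImport(IEnumerable<ImportItem> sourceRecords)
{
    const int batchSize = 100;
    var batch = new List<ImportItem>();

    foreach (var record in sourceRecords)
    {
        batch.Add(record);
        if (batch.Count == batchSize)
        {
            SaveBatch(batch);
            batch.Clear();
        }
    }

    if (batch.Count > 0)
        SaveBatch(batch);   // final partial batch
}

private void SaveBatch(List<ImportItem> batch)
{
    // A fresh context per batch keeps a failed SaveChanges() from poisoning later batches
    // (the failed entities would otherwise stay pending in the context).
    using (var context = new MyEntities())
    {
        foreach (var record in batch)
            context.AddToClassName(record.MapToEntity());

        try
        {
            context.SaveChanges();
        }
        catch (Exception ex)
        {
            LogFailedBatch(batch, ex);   // only these n records are lost
        }
    }
}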

+4  A: 

If you need to import thousands of records, I'd use something like SqlBulkCopy, and not the Entity Framework for that.
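Something along these lines (a rough sketch; the connection string, destination table name, sourceRecords and the column list are placeholders, so adjust them to your schema):

// Rough sketch: load the records into a DataTable and push it with SqlBulkCopy
// (needs System.Data and System.Data.SqlClient).
var table = new DataTable();
table.Columns.Add("SomeInt", typeof(int));
table.Columns.Add("SomeVarchar", typeof(string));

foreach (var record in sourceRecords)            // sourceRecords is a placeholder
    table.Rows.Add(record.SomeInt, record.SomeVarchar);

using (var bulkCopy = new SqlBulkCopy("Data Source=.;Initial Catalog=MyDb;Integrated Security=True"))
{
    bulkCopy.DestinationTableName = "dbo.YourImportTable";   // placeholder table name
    bulkCopy.BatchSize = 1000;                                // rows sent per round trip
    bulkCopy.ColumnMappings.Add("SomeInt", "SomeInt");
    bulkCopy.ColumnMappings.Add("SomeVarchar", "SomeVarchar");
    bulkCopy.WriteToServer(table);
}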

marc_s
I hate it when people don't answer my question :) Well, let's say I "need" to use EF. What then?
SkippyFire
Well, if you really *MUST* use EF, then I would try to commit after a batch of, say, 500 or 1000 records. Otherwise, you'll end up using too many resources, and a failure would potentially roll back all 99999 rows you've updated when the 100000th one fails.
marc_s
I had the same issue and ended up using SqlBulkCopy, which is way more performant than EF in that case, although I don't like using several different ways to access the database.
Julien N
+3  A: 

I would test it first to be sure. Performance doesn't have to be that bad.

If you need to insert all rows in one transaction, call SaveChanges() once, after all of the AddToClassName() calls. If the rows can be entered independently, save changes after every row. Database consistency is what matters.

I don't like the second option. It would be confusing for me (from the end user's perspective) if the import declined 10 rows out of 1000 just because 1 was bad. You can try to import a batch of 10 and, if it fails, try the rows one by one and then log the faulty one.

Test whether it actually takes a long time. Don't write 'probably': you don't know yet. Only when it is actually a problem should you think about another solution (marc_s's).

EDIT

I've done some tests (times in milliseconds):

10000 rows:

SaveChanges() after 1 row: 18510.534
SaveChanges() after 100 rows: 4350.3075
SaveChanges() after 10000 rows: 5233.0635

50000 rows:

SaveChanges() after 1 row: 78496.929
SaveChanges() after 500 rows: 22302.2835
SaveChanges() after 50000 rows: 24022.8765

So it is actually faster to commit after every n rows than after all of them.

My recommendation is to:

  • Call SaveChanges() after every n rows.
  • If one commit fails, retry the rows one by one to find the faulty row (see the sketch below).
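A rough sketch of that (SourceRecord and LogFaultyRow() are placeholders; it reuses the CamelTrapEntities/TestTable types from the test code below and needs System.Linq for Skip/Take):

private void Import(IList<SourceRecord> records, int batchSize)
{
    for (int start = 0; start < records.Count; start += batchSize)
    {
        var batch = records.Skip(start).Take(batchSize).ToList();

        if (!TrySave(batch))
        {
            // the batch failed: retry its rows one by one to find the faulty record
            foreach (var record in batch)
            {
                if (!TrySave(new List<SourceRecord> { record }))
                    LogFaultyRow(record);   // placeholder logging
            }
        }
    }
}

private bool TrySave(List<SourceRecord> batch)
{
    // a fresh context per attempt, so entities from a failed save are not retried
    // by the next SaveChanges() call
    using (var context = new CamelTrapEntities())
    {
        foreach (var record in batch)
        {
            var entity = new TestTable
            {
                SomeInt = record.SomeInt,
                SomeVarchar = record.SomeVarchar,
                SomeOtherVarchar = record.SomeOtherVarchar,
                SomeOtherInt = record.SomeOtherInt
            };
            context.AddToTestTable(entity);
        }

        try
        {
            context.SaveChanges();
            return true;
        }
        catch
        {
            return false;
        }
    }
}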

Test classes:

TABLE:

CREATE TABLE [dbo].[TestTable](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [SomeInt] [int] NOT NULL,
    [SomeVarchar] [varchar](100) NOT NULL,
    [SomeOtherVarchar] [varchar](50) NOT NULL,
    [SomeOtherInt] [int] NULL,
 CONSTRAINT [PkTestTable] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

Class:

using System;
using System.Data.EntityClient;
using System.Web.Mvc;

public class TestController : Controller
{
    //
    // GET: /Test/
    private readonly Random _rng = new Random();
    private const string _chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    private string RandomString(int size)
    {
        var randomSize = _rng.Next(size);

        char[] buffer = new char[randomSize];

        for (int i = 0; i < randomSize; i++)
        {
            buffer[i] = _chars[_rng.Next(_chars.Length)];
        }
        return new string(buffer);
    }


    public ActionResult EFPerformance()
    {
        string result = "";

        TruncateTable();
        result = result + "SaveChanges() after 1 row:" + EFPerformanceTest(10000, 1).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 100 rows:" + EFPerformanceTest(10000, 100).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 10000 rows:" + EFPerformanceTest(10000, 10000).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 1 row:" + EFPerformanceTest(50000, 1).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 500 rows:" + EFPerformanceTest(50000, 500).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 50000 rows:" + EFPerformanceTest(50000, 50000).TotalMilliseconds + "<br/>";
        TruncateTable();

        return Content(result);
    }

    private void TruncateTable()
    {
        using (var context = new CamelTrapEntities())
        {
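            // EF itself has no truncate support, so run raw SQL on the underlying store connection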
            var connection = ((EntityConnection)context.Connection).StoreConnection;
            connection.Open();
            var command = connection.CreateCommand();
            command.CommandText = @"TRUNCATE TABLE TestTable";
            command.ExecuteNonQuery();
        }
    }

    private TimeSpan EFPerformanceTest(int noOfRows, int commitAfterRows)
    {
        var startDate = DateTime.Now;

        using (var context = new CamelTrapEntities())
        {
            for (int i = 1; i <= noOfRows; ++i)
            {
                var testItem = new TestTable();
                testItem.SomeVarchar = RandomString(100);
                testItem.SomeOtherVarchar = RandomString(50);
                testItem.SomeInt = _rng.Next(10000);
                testItem.SomeOtherInt = _rng.Next(200000);
                context.AddToTestTable(testItem);

                if (i % commitAfterRows == 0) context.SaveChanges();
            }
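            // A final SaveChanges() here would be needed if noOfRows were not a multiple
            // of commitAfterRows; in these tests it always is, so the loop covers everything.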
        }

        var endDate = DateTime.Now;

        return endDate.Subtract(startDate);
    }
}
LukLed
The reason I wrote "probably" is that I was making an educated guess. To make it clearer that I'm not sure, I phrased it as a question. Also, I think it makes complete sense to think about potential problems BEFORE I run into them; that is the reason I asked this question. I was hoping someone would know which method would be the most efficient, so I could go with it right off the bat.
SkippyFire
Awesome, dude. Exactly what I was looking for. Thank you for taking the time to test this! I'm guessing that I can store each batch in memory, try the commit, and then, if it fails, go through each item individually as you said. Then once that batch is done, release the references to those 100 items so they can be cleared out of memory. Thanks again!
SkippyFire
Memory will not be freed, because all of the objects will be held by the ObjectContext, but having 50,000 or 100,000 objects in the context doesn't take up much space these days.
LukLed