Every time I write a program of the form below using LINQ to SQL, I end up with a program that just grabs more and more memory as it runs and falls over in a heap consuming 2GB after perhaps as little as 25,000 records. I always end up rewriting it using ADO.NET. What am I doing wrong?

Clarification: This question is not about speed of processing; answers about making it go faster are of no relevance.

for (int i = 0; i < some_big_number; i++)
{
    using (myDC dc = new myDC())  // my DataContext
    {
        myRecord record = (from r in dc.myTable where r.Code == i select r).Single();

        // do some LINQ queries using various tables from the data context
        // and the fields from this 'record'.  i carefully avoid referencing
        // any other data context than 'dc' in here because I want any cached
        // records to get disposed of when 'dc' gets disposed at the end of 
        // each iteration.

        record.someField = newValueJustCalculatedAbove;
        dc.SubmitChanges();
    }
}
A: 

You could try putting the creation of the DataContext outside of the for loop; I'm not sure how much memory that would save you, though.

Maybe you can call GC.Collect() after each iteration to see if you can trigger garbage collection manually.

John Boker
I think if anything that makes it worse, as the data context then lives long enough to build up thousands of cached records. But I'm clearly misunderstanding something, because memory is building up nonetheless.
Nestor
@maxc - Have you at least attempted to move the datacontext call outside the loop just to see if it makes any difference?
Breadtruck
I have, yes. I've been tearing my hair out periodically on this one for a year now.
Nestor
+2  A: 

You are going to the database twice for every loop iteration - once to retrieve the row, then again to update the row. This isn't very efficient.

You should operate in batches:

  • Get a set of rows up front by selecting on a range rather than a single value, i.e. 0-100 for the first batch, 101-200 for the next batch, and so on. This will be fastest if you have a clustered index defined on the Code column.

  • Create the data context prior to entering the loop

  • Inside the loop, just update the objects

  • Call SubmitChanges() AFTER the loop has finished; this will send all updates to the database in a single connection / transaction

  • Repeat for the next batch

You should make the batch size configurable, as you can't be sure what batch size will yield the best performance - don't hardcode it into the application.
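For example, here is a rough sketch of that batched approach, reusing the myDC / myTable / Code / someField names from the question (batchSize is a hypothetical, configurable value, not something from the original code):

int batchSize = 100;  // hypothetical - read this from configuration rather than hardcoding it

for (int start = 0; start < some_big_number; start += batchSize)
{
    using (myDC dc = new myDC())  // one short-lived context per batch keeps the tracked set small
    {
        // one round trip fetches the whole batch instead of one row per query
        var batch = (from r in dc.myTable
                     where r.Code >= start && r.Code < start + batchSize
                     select r).ToList();

        foreach (myRecord record in batch)
        {
            // ... the per-record calculation from the original loop goes here ...
            record.someField = newValueJustCalculatedAbove;
        }

        // one round trip sends all of this batch's updates together
        dc.SubmitChanges();
    }
}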

Also, I would use SingleOrDefault() with null-checking instead of Single(), unless you can guarantee that there will always be a row for any value of i.

EDIT:

In terms of memory usage, that's much harder to control, but it's not peculiar to LINQ to SQL; any batch processing algorithm has to deal with this. While I don't recommend using GC.Collect() in practice, calling it after processing a large batch is usually a sufficient workaround.

You could also look at reducing the amount of data that you retrieve per row (depending on how much there is to begin with). You can create a new entity that maps to a much smaller set of columns from the same table, potentially just one or two, so that when you select that entity you only retrieve the columns you actually intend to work with. This would improve both speed and memory footprint, as less data travels over the wire and the objects are much smaller.
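As a rough illustration of that idea (the myRecordSlim class and its mapping are hypothetical, based on the names used in the question, and the column type is assumed):

using System.Data.Linq.Mapping;

// A slim entity mapped to the same table, exposing only the key and the one column being updated.
[Table(Name = "myTable")]
public class myRecordSlim
{
    [Column(IsPrimaryKey = true)]
    public int Code { get; set; }

    [Column]
    public string someField { get; set; }  // assumed type; use whatever the real column type is
}

You would then query it via dc.GetTable<myRecordSlim>() on a plain DataContext, so only those two columns travel over the wire and each materialised object stays small.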

Sam
Sure, they're all good speed optimisations but optimisation ought to come after the program has been made to work. At the moment all available memory quickly gets eaten up so that it can't continue at all.
Nestor
+1 for "Call SubmitChanges() AFTER the loop has finished"
Breadtruck
And calling SubmitChanges() after the loop has finished will do what exactly for memory utilisation? We aren't talking about speed here.
Nestor
@Sam, I think you're still missing the point somewhat here. The memory usage ideas you mention could reduce the memory used per iteration, but what's happening is a massive and rapid buildup of memory use over successive iterations. Reducing the amount of data per row isn't going to fix that. The problem I'm seeing certainly is peculiar to Linq to SQL, in that entities that are no longer needed are not being freed up between iterations.
Nestor
@maxc - all I can suggest now is that you fire up a profiler and see where the memory is being consumed / held on to.
Sam
+3  A: 

You are putting pressure on the data context to generate the query from scratch every time.

Try using a compiled query instead.
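A minimal sketch of what that could look like here, reusing the myDC / myTable / Code names from the question:

using System;
using System.Data.Linq;
using System.Linq;

static class Queries
{
    // Compiled once into a reusable delegate; the expression tree is translated to SQL
    // a single time instead of on every loop iteration.
    public static readonly Func<myDC, int, myRecord> RecordByCode =
        CompiledQuery.Compile((myDC dc, int code) =>
            (from r in dc.myTable where r.Code == code select r).Single());
}

// inside the loop, instead of the inline query:
// myRecord record = Queries.RecordByCode(dc, i);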

leppie
Switching over to compiled queries has made all the difference, reducing memory consumption by at least 95%. I guess continually recompiling queries is a bad idea anyway, but I'm still left curious about why it used up so much memory. For the benefit of others, here's a guide to getting started with compiled queries: http://linqinaction.net/blogs/jwooley/archive/2007/09/04/linq-to-sql-compiled-queries.aspx
Nestor
Compiled queries are not drop-in replacements for non-compiled ones; you have to change the code around a bit. This article helped me understand why that is: http://linqinaction.net/blogs/jwooley/archive/2007/09/04/linq-to-sql-compiled-queries.aspx
Nestor
A: 

I wasn't able to replicate the problem. Memory usage was flat. Slow performance, but constant memory.

Are you sure you're not leaking elsewhere? Can you produce a minimal code sample that reproduces the problem?

Edit:

I used virtually the same sample code:

for (int ii = 1; ii < 200000; ii++)
{
    using (var dc = new PlayDataContext())
    {
        var record = 
            (from r in dc.T1s where r.Id == ii select r).SingleOrDefault();
        if (record != null)
        {
            record.Name = "S";
            dc.SubmitChanges();
        }
    }
}

without issue.

So things to rule out:

  • Framework version. I'm on the latest.
  • DataContext/entity complexity. My test table has just two fields, an Id (int) and a Name (nvarchar(max)).

Can you reproduce the problem with the latest FW, with a small sample DataContext?

Scott Weinstein
I agree; I think we would need to see more of your code to diagnose the problem. I've used LTS in big giant loops like this too and haven't yet run into this "memory leak" problem you mention...
Funka
Here's a complete sample for you, though of course I haven't been able to include the data context itself. On my machine, it consumes more than 1MB per second and is using 720MB as I write this.

for (int run = 0; run < 1000; run++)
    for (int i = 0; i < 50000; i++)
    {
        using (MyDC myDC = new MyDC())
        {
            tblRecords record = (from r in myDC.tblRecords where r.Code == i select r).SingleOrDefault();
        }
    }
Nestor