Every time I write a program of the form below using LINQ to SQL, I end up with a program that just grabs more and more memory as it runs and falls over in a heap consuming 2GB after perhaps as little as 25,000 records. I always end up rewriting it using ADO.NET. What am I doing wrong?

Clarification: This question is not about speed of processing; answers about making it go faster are of no relevance.

for (int i = 0; i < some_big_number; i++)
{
    using (myDC dc = new myDC())  // my DataContext
    {
        myRecord record = (from r in dc.myTable where r.Code == i select r).Single();

        // do some LINQ queries using various tables from the data context
        // and the fields from this 'record'.  i carefully avoid referencing
        // any other data context than 'dc' in here because I want any cached
        // records to get disposed of when 'dc' gets disposed at the end of 
        // each iteration.

        record.someField = newValueJustCalculatedAbove;
        dc.SubmitChanges();
    }
}
A: 

You could try putting the creation of the DataContext outside of the for loop; I'm not sure how much memory that would save you, though.

Maybe you can call GC.Collect() after each iteration to see if you can trigger garbage collection manually.

John Boker
I think if anything that makes it worse, as the data context then lives long enough to build up thousands of cached records. But I'm clearly misunderstanding something, because memory is building up nonetheless.
Nestor
@maxc - Have you at least attempted to move the datacontext call outside the loop just to see if it makes any difference?
Breadtruck
I have, yes. I've been tearing my hair out periodically on this one for a year now.
Nestor
+2  A: 

You are going to the database twice for every loop iteration - once to retrieve the row, then again to update the row. This isn't very efficient.

You should operate in batches:

  • Get a set of rows up front by selecting on a range rather than a single value, i.e. 0-100 for the first batch, 101-200 for the next batch, and so on. This will be fastest if you have a clustered index defined on the Code column.

  • Create the data context prior to entering the loop

  • Inside the loop, just update the objects

  • Call SubmitChanges() AFTER the loop has finished; this will send all updates to the database in a single connection / transaction

  • Repeat for the next batch

You should make the batch size configurable, as you can't be sure what batch size will yield the best performance - don't hardcode it into the application.
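For example, here is a rough sketch of that batched approach, reusing the myDC / myTable / Code / someField names from the question (batchSize is a hypothetical, configurable value, not something from the original code):

int batchSize = 100;  // hypothetical - read this from configuration rather than hardcoding it

for (int start = 0; start < some_big_number; start += batchSize)
{
    using (myDC dc = new myDC())  // one short-lived context per batch keeps the tracked set small
    {
        // one round trip fetches the whole batch instead of one row per query
        var batch = (from r in dc.myTable
                     where r.Code >= start && r.Code < start + batchSize
                     select r).ToList();

        foreach (myRecord record in batch)
        {
            // ... the per-record calculation from the original loop goes here ...
            record.someField = newValueJustCalculatedAbove;
        }

        // one round trip sends all of this batch's updates together
        dc.SubmitChanges();
    }
}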

Also, I would use SingleOrDefault() with null-checking instead of Single(), unless you can guarantee that there will always be a row for any value of i.

EDIT:

In terms of memory usage, that's much harder to control, but it's not peculiar to LINQ to SQL; any batch processing algorithm has to deal with this. While I don't recommend using GC.Collect() in practice, calling it after processing a large batch is usually a sufficient workaround.

You could also look at reducing the amount of data that you retrieve per row (depending on how much there is to begin with). You can create a new entity that maps to a much smaller set of columns from the same table, potentially just one or two, so that when you select that entity you only retrieve the columns you actually intend to work with. This would improve both speed and memory footprint, as less data travels over the wire and the objects are much smaller.
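As a rough illustration of that idea (the myRecordSlim class and its mapping are hypothetical, based on the names used in the question, and the column type is assumed):

using System.Data.Linq.Mapping;

// A slim entity mapped to the same table, exposing only the key and the one column being updated.
[Table(Name = "myTable")]
public class myRecordSlim
{
    [Column(IsPrimaryKey = true)]
    public int Code { get; set; }

    [Column]
    public string someField { get; set; }  // assumed type; use whatever the real column type is
}

You would then query it via dc.GetTable<myRecordSlim>() on a plain DataContext, so only those two columns travel over the wire and each materialised object stays small.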

Sam
Sure, they're all good speed optimisations but optimisation ought to come after the program has been made to work. At the moment all available memory quickly gets eaten up so that it can't continue at all.
Nestor
+1 for "Call SubmitChanges() AFTER the loop has finished"
Breadtruck
And calling SubmitChanges() after the loop has finished will do what exactly for memory utilisation? We aren't talking about speed here.
Nestor
@Sam, I think you're still missing the point somewhat here. The memory usage ideas you mention could reduce the memory used per iteration, but what's happening is a massive and rapid buildup of memory use over successive iterations. Reducing the amount of data per row isn't going to fix that. The problem I'm seeing certainly is peculiar to Linq to SQL, in that entities that are no longer needed are not being freed up between iterations.
Nestor
@maxc - all I can suggest now is that you fire up a profiler and see where the memory is being consumed / held on to.
Sam
+3  A: 

You are putting pressure on the data context to generate the query from scratch every time.

Try using a compiled query instead.
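A minimal sketch of what that could look like here, reusing the myDC / myTable / Code names from the question:

using System;
using System.Data.Linq;
using System.Linq;

static class Queries
{
    // Compiled once into a reusable delegate; the expression tree is translated to SQL
    // a single time instead of on every loop iteration.
    public static readonly Func<myDC, int, myRecord> RecordByCode =
        CompiledQuery.Compile((myDC dc, int code) =>
            (from r in dc.myTable where r.Code == code select r).Single());
}

// inside the loop, instead of the inline query:
// myRecord record = Queries.RecordByCode(dc, i);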

leppie
Switching over to compiled queries has made all the difference, reducing memory consumption by at least 95%. I guess continually recompiling queries is a bad idea anyway, but I'm still left curious about why it used up so much memory. For the benefit of others, here's a guide to getting started with compiled queries: http://linqinaction.net/blogs/jwooley/archive/2007/09/04/linq-to-sql-compiled-queries.aspx
Nestor
Compiled queries are not drop-in replacements for non-compiled ones; you have to change the code around a bit. This article helped me understand why that is: http://linqinaction.net/blogs/jwooley/archive/2007/09/04/linq-to-sql-compiled-queries.aspx
Nestor
A: 

I wasn't able to replicate the problem. Memory usage was flat. Slow performance, but constant memory.

Are you sure you're not leaking elsewhere? Can you produce a minimal code sample that reproduces the problem?

Edit:

I used virtually the same sample code:

for (int ii = 1; ii < 200000; ii++)
{
    using (var dc = new PlayDataContext())
    {
        var record = 
            (from r in dc.T1s where r.Id == ii select r).SingleOrDefault();
        if (record != null)
        {
            record.Name = "S";
            dc.SubmitChanges();
        }
    }
}

without issue.

So things to rule out:

  • Framework version. I'm on the latest.
  • DataContext/entity complexity. My test table has just two fields, an Id (int) and a Name (nvarchar(max)).

Can you reproduce the problem with the latest FW, with a small sample DataContext?

Scott Weinstein
I agree; I think we would need to see more of your code to diagnose the problem. I've used LTS in big giant loops like this too and haven't yet run into this "memory leak" problem you mention...
Funka
Here's a complete sample for you, though of course I haven't been able to include the data context itself. On my machine, it consumes more than 1MB per second and is using 720MB as I write this.

for (int run = 0; run < 1000; run++)
    for (int i = 0; i < 50000; i++)
    {
        using (MyDC myDC = new MyDC())
        {
            tblRecords record = (from r in myDC.tblRecords where r.Code == i select r).SingleOrDefault();
        }
    }
Nestor