I've been using Linq-to-SQL for quite a while and it works great. Lately, however, I've been experimenting with using it to pull really large amounts of data and am running into some issues. (Of course, I understand that L2S may not be the right tool for this particular kind of processing, but that's why I'm experimenting: to find its limits.)

Here's a code sample:

var buf = new StringBuilder();
var dc = new DataContext(AppSettings.ConnectionString);
var records = from a in dc.GetTable<MyReallyBigTable>() where a.State == "OH" select a;
var i = 0;
foreach (var record in records) {
   buf.AppendLine(record.ID.ToString());
   i += 1;
   if (i > 3) {
      break; // Takes forever...
   }
}

Once I start iterating over the data, the query executes as expected. When stepping through the code, I enter the loop right away, which is exactly what I hoped for: it suggests that L2S is using a DataReader behind the scenes instead of pulling all the data first. However, once I get to the break, the query continues to run and pulls all the rest of the records. Here are my questions for the SO community:

1.) Is there a way to stop Linq-to-SQL from finishing execution of a really big query in the middle the way you can with a DataReader?

2.) If you execute a large Linq-to-SQL query, is there a way to prevent the DataContext from filling up with change tracking information for every object returned? Basically, instead of filling up memory, can I do a large query with short object lifecycles the way you can with DataReader techniques?

I'm okay if this isn't functionality built into the DataContext itself and requires some customization. I'm just looking to leverage the simplicity and power of Linq for large queries in nightly processing tasks instead of relying on T-SQL for everything.

+1  A: 

1.) Is there a way to stop Linq-to-SQL from finishing execution of a really big query in the middle the way you can with a DataReader?

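Not quite. Execution is deferred only until you start enumerating. Once the query runs, the whole SQL statement is sent to the server, which returns a result set of all matching records; nothing in the traversal tells the server to stop. The delay at the break most likely comes from the underlying reader being closed: closing a SqlDataReader reads through the remaining rows unless the command is cancelled first.

For your example you could simply use records.Take(3), but I understand that your actual logic for halting the process might live outside SQL or not be easily translatable.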
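
For the simple case, here is a rough sketch of what the Take approach could look like. TakeSketch and FirstIds are hypothetical names, and MyReallyBigTable is a plain stand-in for the mapped class in the question (mapping attributes left out so the snippet compiles on its own). The OrderBy is my own addition so that "first N" is well defined.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the entity in the question; in a real project this
// would be the LINQ to SQL mapped class.
public class MyReallyBigTable
{
    public int ID { get; set; }
    public string State { get; set; }
}

public static class TakeSketch
{
    // Composes the filter, ordering and Take onto the query before it executes.
    // When the source is a LINQ to SQL table, the limit becomes part of the
    // generated SQL, so the server only returns `count` rows.
    public static List<int> FirstIds(IQueryable<MyReallyBigTable> source, string state, int count)
    {
        return source.Where(a => a.State == state)
                     .OrderBy(a => a.ID)
                     .Select(a => a.ID)
                     .Take(count)
                     .ToList();
    }
}
```

With the DataContext from the question, the call would be something like TakeSketch.FirstIds(dc.GetTable<MyReallyBigTable>(), "OH", 3). This only helps when the stopping condition is a row count known up front.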

You could use a combined approach: build a strongly typed LINQ query, then execute it with old-fashioned ADO.NET. The downside is that you lose the mapping to the class and have to deal with the SqlDataReader results manually. An example of this is shown below:

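// Assumes dc is a typed DataContext that exposes a Customers table.
var query = from c in dc.Customers
            where c.ID < 15
            select c;

// GetCommand returns the DbCommand that LINQ to SQL would have run for this query.
using (var command = dc.GetCommand(query))
{
    command.Connection.Open();
    using (var reader = command.ExecuteReader())
    {
        int i = 0;
        while (reader.Read())
        {
            // Map the columns by hand, since the automatic class mapping is bypassed.
            Customer c = new Customer();
            c.ID = reader.GetInt32(reader.GetOrdinal("ID"));
            c.Name = reader.GetString(reader.GetOrdinal("Name"));
            Console.WriteLine("{0}: {1}", c.ID, c.Name);
            i++;
            if (i > 3)
            {
                // Cancel first; otherwise closing the reader reads through the remaining rows.
                command.Cancel();
                break;
            }
        }
    }
}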
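
If losing the class mapping is the main drawback, one possible variation is to hand the open reader back to the DataContext through Translate<T>, which materializes mapped objects from a DbDataReader. Treat the following as a sketch only: TranslateSketch and ReadFirst are hypothetical names, it assumes the query returns a mapped entity type rather than an anonymous projection, and I have not measured how it behaves against a very large table.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.Common;
using System.Data.Linq;
using System.Linq;

public static class TranslateSketch
{
    // Hypothetical helper: streams the results of `query` through a reader
    // and stops after at most `max` rows.
    public static List<T> ReadFirst<T>(DataContext dc, IQueryable<T> query, int max)
    {
        var results = new List<T>();
        if (max <= 0) return results;

        using (DbCommand command = dc.GetCommand(query))
        {
            // Only open (and later close) the connection if it was not open already.
            bool openedHere = command.Connection.State == ConnectionState.Closed;
            if (openedHere) command.Connection.Open();
            try
            {
                using (DbDataReader reader = command.ExecuteReader())
                {
                    // Translate maps each row to T lazily as the loop advances.
                    foreach (T item in dc.Translate<T>(reader))
                    {
                        results.Add(item);
                        if (results.Count >= max)
                        {
                            // Cancel before the reader is closed so the
                            // remaining rows are not read through.
                            command.Cancel();
                            break;
                        }
                    }
                }
            }
            finally
            {
                if (openedHere) command.Connection.Close();
            }
        }
        return results;
    }
}
```

The stopping condition here is still a simple count, but because the loop body is ordinary C#, it could be replaced by any external check. Note that objects produced this way are still tracked unless object tracking is disabled on the DataContext.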

2.) If you execute a large Linq-to-SQL query, is there a way to prevent the DataContext from filling up with change tracking information for every object returned?

If a particular query is only used for read-only purposes, you can disable object tracking, which also improves performance, by setting the DataContext.ObjectTrackingEnabled property to false before the query runs:

using (var dc = new MyDataContext())
{
    dc.ObjectTrackingEnabled = false;
    // do stuff
}

You can also read this MSDN topic: How to: Retrieve Information As Read-Only (LINQ to SQL).
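
To make it harder to forget the setting in a nightly job, the read-only configuration could be wrapped in a small factory. This is just a sketch: ReadOnlySketch and CreateReadOnlyContext are hypothetical names, and it uses the untyped DataContext from the question rather than a generated one.

```csharp
using System.Data.Linq;

public static class ReadOnlySketch
{
    // Hypothetical factory: returns a DataContext configured for read-only use.
    public static DataContext CreateReadOnlyContext(string connectionString)
    {
        var dc = new DataContext(connectionString);
        // Set this before any query runs on the context. With tracking off,
        // the context is meant for reading only, not for SubmitChanges.
        dc.ObjectTrackingEnabled = false;
        return dc;
    }
}
```

Usage would be along the lines of using (var dc = ReadOnlySketch.CreateReadOnlyContext(AppSettings.ConnectionString)) { ... }, with the foreach over the query inside the using block.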

Ahmad Mageed