views:

3815

answers:

4

I'm trying to create a generic function to help me select thousands of records using LINQ to SQL from a local list. SQL Server (2005 at least) limits queries to 2100 parameters and I'd like to select more records than that.

Here would be a good example usage:

var some_product_numbers = new int[] { 1,2,3 ... 9999 };

Products.SelectByParameterList(some_product_numbers, p => p.ProductNumber);

Here is my (non-working) implementation:

public static IEnumerable<T> SelectByParameterList<T, PropertyType>(Table<T> items, 

IEnumerable<PropertyType> parameterList, Expression<Func<T, PropertyType>> property) where T : class
{
    var groups = parameterList
     .Select((Parameter, index) =>
      new
      {
       GroupID = index / 2000, //2000 parameters per request
       Parameter
      }
     )
     .GroupBy(x => x.GroupID)
     .AsEnumerable();

    var results = groups
    .Select(g => new { Group = g, Parameters = g.Select(x => x.Parameter) } )
    .SelectMany(g => 
     /* THIS PART FAILS MISERABLY */
     items.Where(item => g.Parameters.Contains(property.Compile()(item)))
    );

    return results;
}

I have seen plenty of examples of building predicates using expressions. In this case I only want to execute the delegate to return the value of the current ProductNumber. Or rather, I want to translate this into the SQL query (it works fine in non-generic form).

I know that compiling the Expression just takes me back to square one (passing in the delegate as Func) but I'm unsure of how to pass a parameter to an "uncompiled" expression.

Thanks for your help!

** EDIT: Let me clarify further:

Here is a working example of what I want to generalize:

var local_refill_ids = Refills.Select(r => r.Id).Take(20).ToArray();

var groups = local_refill_ids
 .Select((Parameter, index) =>
  new
  {
   GroupID = index / 5, //5 parameters per request
   Parameter
  }
 )
 .GroupBy(x => x.GroupID)
 .AsEnumerable();

var results = groups
.Select(g => new { Group = g, Parameters = g.Select(x => x.Parameter) } )
.SelectMany(g => 
 Refills.Where(r => g.Parameters.Contains(r.Id))
)
.ToArray()
;

Results in this SQL code:

SELECT [t0].[Id], ... [t0].[Version]
FROM [Refill] AS [t0]
WHERE [t0].[Id] IN (@p0, @p1, @p2, @p3, @p4)

... That query 4 more times (20 / 5 = 4)
+1  A: 

LINQ-to-SQL still works via standard SQL parameters, so writing a fancy expression isn't going to help. There are 3 common options here:

  • pack the ids into (for example) csv/tsv; pass down as a varchar(max) and use a udf to split it (at the server) into a table variable; join to the table variable
  • use a table-valued-parameter in SQL Server 2008
  • have a table on the server that you could push the ids into (perhaps via SqlBulkCopy) (perhaps with a "session guid" or similar); join to this table

The first is the simplest; getting a "split csv udf" is trivial (just search for it). Drag the udf onto the data-context and consume from there.

Marc Gravell
This isn't necessary. See my answer below.
ifwdev
+3  A: 

I've come up with a way to chunk the query into pieces - i.e. you give it 4000 values, so it might do 4 requests of 1000 each; with full Northwind example. Note that this might not work on Entity Framework, due to Expression.Invoke - but is fine on LINQ to SQL:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

namespace ConsoleApplication5 {
    /// SAMPLE USAGE
    class Program {
        static void Main(string[] args) {
            // get some ids to play with...
            string[] ids;
            using(var ctx = new DataClasses1DataContext()) {
                ids = ctx.Customers.Select(x => x.CustomerID)
                    .Take(100).ToArray();
            }

            // now do our fun select - using a deliberately small
            // batch size to prove it...
            using (var ctx = new DataClasses1DataContext()) {
                ctx.Log = Console.Out;
                foreach(var cust in ctx.Customers
                        .InRange(x => x.CustomerID, 5, ids)) {
                    Console.WriteLine(cust.CompanyName);
                }
            }
        }
    }

    /// THIS IS THE INTERESTING BIT
    public static class QueryableChunked {
        public static IEnumerable<T> InRange<T, TValue>(
                this IQueryable<T> source,
                Expression<Func<T, TValue>> selector,
                int blockSize,
                IEnumerable<TValue> values) {
            MethodInfo method = null;
            foreach(MethodInfo tmp in typeof(Enumerable).GetMethods(
                    BindingFlags.Public | BindingFlags.Static)) {
                if(tmp.Name == "Contains" && tmp.IsGenericMethodDefinition
                        && tmp.GetParameters().Length == 2) {
                    method = tmp.MakeGenericMethod(typeof (TValue));
                    break;
                }
            }
            if(method==null) throw new InvalidOperationException(
                "Unable to locate Contains");
            foreach(TValue[] block in values.GetBlocks(blockSize)) {
                var row = Expression.Parameter(typeof (T), "row");
                var member = Expression.Invoke(selector, row);
                var keys = Expression.Constant(block, typeof (TValue[]));
                var predicate = Expression.Call(method, keys, member);
                var lambda = Expression.Lambda<Func<T,bool>>(
                      predicate, row);
                foreach(T record in source.Where(lambda)) {
                    yield return record;
                }
            }
        }
        public static IEnumerable<T[]> GetBlocks<T>(
                this IEnumerable<T> source, int blockSize) {
            List<T> list = new List<T>(blockSize);
            foreach(T item in source) {
                list.Add(item);
                if(list.Count == blockSize) {
                    yield return list.ToArray();
                    list.Clear();
                }
            }
            if(list.Count > 0) {
                yield return list.ToArray();
            }
        }
    }
}
Marc Gravell
+2  A: 

Easiest way to do this: Use LINQKit (Free, non-restrictive license)

Working version of code:

public static IEnumerable<T> SelectByParameterList<T, PropertyType>(this Table<T> items, IEnumerable<PropertyType> parameterList, Expression<Func<T, PropertyType>> propertySelector, int blockSize) where T : class
{
    var groups = parameterList
     .Select((Parameter, index) =>
      new
      {
       GroupID = index / blockSize, //# of parameters per request
       Parameter
      }
     )
     .GroupBy(x => x.GroupID)
     .AsEnumerable();

    var selector = LinqKit.Linq.Expr(propertySelector);

    var results = groups
    .Select(g => new { Group = g, Parameters = g.Select(x => x.Parameter) } )
    .SelectMany(g => 
     /* AsExpandable() extension method requires LinqKit DLL */
     items.AsExpandable().Where(item => g.Parameters.Contains(selector.Invoke(item)))
    );

    return results;
}

Example usage:

    Guid[] local_refill_ids = Refills.Select(r => r.Id).Take(20).ToArray();

    IEnumerable<Refill> results = Refills.SelectByParameterList(local_refill_ids, r => r.Id, 10); //runs 2 SQL queries with 10 parameters each

Thanks again for all your help!

ifwdev
I'd be interested in what the TSQL for that does, compared to my InRange reply...
Marc Gravell
SELECT [t0].[Id], ... [t0].[Version]FROM [Refill] AS [t0]WHERE [t0].[Id] IN (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8, @p9)... That query 2times (20 / 10 = 2)
ifwdev
A: 

ifwdev,

How would you use the linqkit solution in vb.net?

Chris A
http://www.carlosag.net/Tools/CodeTranslator/
ifwdev