views:

422

answers:

1

I'm in VS2008 with Entity Framework. I'm accessing objects from the database using esql for WHERE IN functionality. I'm passing a ton of IDs to the select statement so I chunk it up into sets of 800. Then I merge the results together from each chunk. My goal is to obtain results for each chunk in parallel, rather than waiting synchronously. I installed Reactive Framework and am pretty sure I need to make use of ForkJoin. However, I can't figure out how to convert this function to use it. Here's my existing code:

    public static IList<TElement> SelectWhereIn<TElement, TValue>(this ObjectContext context, string fieldName, IList<TValue> idList)
    {
        var chunkedIds = idList.Split(CHUNK_SIZE);
        string entitySetName = typeof(TElement).Name + "Set";
        var retList = new List<TElement>();
        foreach (var idChunk in chunkedIds)
        {
            string delimChunk = string.Join(",", idChunk.Select(x => x.ToString()).ToArray());
            ObjectQuery<TElement> query = context.CreateQuery<TElement>("SELECT VALUE x FROM " + entitySetName + " AS x");
            query = query.Where("it." + fieldName + " IN {" + delimChunk + "}");
            retList.AddRange(query);
        }
        return retList;
    }

Thanks!

EDIT >>> I modified the code to use Poor Man's as below:

    public static IList<TElement> SelectWhereIn<TElement, TValue>(this ObjectContext context, string fieldName, IList<TValue> idList)
    {
        var chunkedIds = idList.Split(CHUNK_SIZE);
        string entitySetName = typeof(TElement).Name + "Set";
        var chunkLists = new List<IEnumerable<TElement>>();
        Parallel.ForEach(chunkedIds, idChunk =>
        {
            string delimChunk = string.Join(",", idChunk.Select(x => x.ToString()).ToArray());
            ObjectQuery<TElement> query = context.CreateQuery<TElement>("SELECT VALUE x FROM " + entitySetName + " AS x");
            query = query.Where("it." + fieldName + " IN {" + delimChunk + "}");
            chunkLists.Add(query.ToList());
        });
        var retList = new List<TElement>();
        foreach (var chunkList in chunkLists)
        {
            retList.AddRange(chunkList);
        }
        return retList;
    }

It worked great the first time. But the second time I ran it, I got this error:

The connection was not closed. The connection's current state is connecting. Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

Exception Details: System.InvalidOperationException: The connection was not closed. The connection's current state is connecting.

Source Error:

Line 49: foreach (var iAsyncResult in resultList) Line 50: { Line 51: del.EndInvoke(iAsyncResult); Line 52: iAsyncResult.AsyncWaitHandle.Close(); Line 53: }

It's interesting, b/c Emre (the author of the library) has an edit to his original post talking about how he added those lines of code for added safety. Am i using it right? Or was his v1 safer after all?

+1  A: 

VS2010 does have that with PLINQ. Using the extensions AsParallel().WithDegreeOfParallelism(nbProcessors) would do what you need.

With VS2008, I've used Poor Man's Parallel.ForEach Iterator by Emre Aydinceren in the past when I was trying to work around a performance bottleneck, try to give it a shot.

EDIT: In reaction to the error you added, it might be a random shot in the dark, but seperate contexts for each thread ? Like so:

Parallel.ForEach(chunkedIds, idChunk =>
    {
        ObjectContext context = new MyContext(connStr);//depending what's your config
                                                       // like, with or w/o conn string

        string delimChunk = string.Join(",", idChunk.Select(x => x.ToString()).ToArray());
        ObjectQuery<TElement> query = context.CreateQuery<TElement>("SELECT VALUE x FROM " + entitySetName + " AS x");
        query = query.Where("it." + fieldName + " IN {" + delimChunk + "}");
        chunkLists.Add(query.ToList());
    });

You might have to tweak around some things (like take the connextion string from the Context extended to instantiate new Contexts).

Dynami Le Savard
Definitely getting there. It's promising. It worked the first time, but then when I ran it the second time, it crashed. Since comments can't hold code, I'll post an "answer" with the details.
Vince
Sorry, I meant an edit to my original Q...
Vince
My context is simply the Entity Framework context that I use for all my repository functions.
Vince
Oh, I get you. I'll try...
Vince
Works like a charm!!!
Vince
Uh oh... It worked at first, but now I appear to have a new problem. "The relationship between the two objects cannot be defined because they are attached to different ObjectContext objects."
Vince
I forgot to update this a few weeks ago, but suffice to say... I came to the realization that I was running down a rabbit hole by trying to do batch import/update operations via Linq to Entities instead of directly in SQL Server.
Vince