Hi all,

Hope some of you can give some pointers on this one.

I generate code in which I have to make calls to remote resources like web services or databases.

Consider this piece of code:

using System.Collections.Generic;
using System.Linq;

class Parent {
    public IEnumerable<Child> Children;

    public int SumChildren() {
        // note the AsParallel
        return Children.AsParallel().Sum(c => c.RemoteCall());
    }
}

class Child {
    public int RemoteCall() {
        // call some web service. I'd like to pool these calls
        // without having to rewrite the rest of this code
    }
}

For 50 children, it's going to make 50 calls to the service, paying the overhead 50 times. In my real-life examples this could easily be a million calls, bringing the whole thing to a crawl.

What I would like to do is batch these calls in some way that is transparent to the calling thread/task. So instead of calling the service directly, it calls some central queue (the 'train station') that batches these calls.

When it does that, the calling task blocks. The queue waits for X calls to accumulate and then makes one call to the remote service with a list of requests.

When the result comes back, the queue returns the return values to the right tasks and unblocks them. For the calling thread, all this remains hidden and it looks like just another function call.
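
Very roughly, the shape I'm after is something like this. It's just a sketch of the idea, not code I have: BatchingDispatcher, IBatchService and the fixed batch size are made-up names, and I've left out the timeout flush a real version would need.

using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical batch-capable service: one round trip carries many requests.
interface IBatchService {
    int[] CallMany(IList<int> requests);
}

class PendingCall {
    public int Request;
    public TaskCompletionSource<int> Result = new TaskCompletionSource<int>();
}

class BatchingDispatcher {
    private readonly object _gate = new object();
    private readonly List<PendingCall> _pending = new List<PendingCall>();
    private readonly IBatchService _service;
    private readonly int _batchSize;

    public BatchingDispatcher(IBatchService service, int batchSize) {
        _service = service;
        _batchSize = batchSize;
    }

    // Looks like an ordinary synchronous call to the caller.
    public int Call(int request) {
        var call = new PendingCall { Request = request };
        List<PendingCall> batch = null;

        lock (_gate) {
            _pending.Add(call);
            if (_pending.Count >= _batchSize) {
                batch = new List<PendingCall>(_pending);
                _pending.Clear();
            }
        }

        // The caller that tips the count over the batch size makes the one remote
        // call and hands everyone their result.
        if (batch != null) {
            int[] replies = _service.CallMany(batch.ConvertAll(p => p.Request));
            for (int i = 0; i < batch.Count; i++)
                batch[i].Result.SetResult(replies[i]);
        }

        // Everyone else blocks here until their result has been set.
        // Caveat: with AsParallel the number of concurrently blocked callers is
        // capped by the degree of parallelism, so the batch size must not exceed
        // it (or a timeout flush has to kick in).
        return call.Result.Task.Result;
    }
}

Child.RemoteCall would then just forward to something like dispatcher.Call(...), so the AsParallel code above wouldn't have to change.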

Can this be done? Are there primitives in the TPL that will let me do this?

It kind of smells like the CCR, with lots of things going on at the same time waiting for other stuff to complete.

I could of course rewrite this code to make the list of requests on the Parent class and then call the service. The thing is that with my real problem all this code is generated, so I would have to 'look inside' the implementation of Child.RemoteCall, making this all a whole lot more complicated than it already is. Also, the Child could be a proxy to a remote object, etc. It would be very hard, if doable at all; I'd rather isolate this complexity.

Hope this makes sense to someone; if not, let me know and I'll elaborate.

Thanks in advance,

John

+1  A: 

When it does that, the calling task blocks. The queue waits for X calls to accumulate

If the queue receives x calls (x < X) then the calling task will block until another task pushes the total >= X. If you only have one task that wants to make N * x calls, it will get stuck.

If your application usually has a lot of tasks running, then you might only see this problem intermittently - where you have unusually low load, or a clean shutdown.

You could solve this by adding a timeout, so that the queue sends the batched requests anyway if no requests have been added within a time limit, and/or if the first request has been waiting longer than a time limit.
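
Sketched against the kind of dispatcher outlined in the question, that could look something like the fragment below (the Flush helper, the field names and the interval are made up; it only shows the idea of flushing a partial batch on a timer).

// Added inside the dispatcher class; uses System.Threading.Timer.
// Periodically flush whatever is pending, so a partial batch cannot
// block its callers forever.
private Timer _flushTimer;

public void StartTimedFlush(int intervalMs) {
    _flushTimer = new Timer(_ => Flush(), null, intervalMs, intervalMs);
}

private void Flush() {
    List<PendingCall> batch;
    lock (_gate) {
        if (_pending.Count == 0) return;
        batch = new List<PendingCall>(_pending);
        _pending.Clear();
    }

    int[] replies = _service.CallMany(batch.ConvertAll(p => p.Request));
    for (int i = 0; i < batch.Count; i++)
        batch[i].Result.SetResult(replies[i]);
}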

I could of course rewrite this code to make the list of requests on the Parent class and then call the service.

Perhaps you are on the right track with this approach. Could you find a way of replacing the generated method implementation with a hand-coded implementation, by delegation, inheritance, lambda method, or enhancing your generator?


... with my real problem all this code is generated.

One point that I'm not quite clear on: which parts of the code are generated (hard to modify), and which parts can be modified to solve this problem?

  1. Child.RemoteCall()
  2. Parent.SumChildren()
  3. Neither of the above.

If it's neither of the above, then you have to be able to modify something in order to solve the problem. Are the Parent and Child instances built by an AbstractFactory? If so, it might be possible to insert a proxy in front of the Child instances that can be used to modify the non-functional aspects of their behavior, roughly as sketched below.
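
A rough sketch of that idea (IChild, BatchedChild and ChildFactory are invented names, and the dispatcher is the kind of batching queue described in the question, not anything that exists yet):

// The generated code works against an interface...
interface IChild {
    int RemoteCall();
}

// ...so a factory can hand out a wrapper that reroutes the call through the
// batching queue without touching the generated Child implementation.
class BatchedChild : IChild {
    private readonly IChild _inner;
    private readonly BatchingDispatcher _dispatcher;

    public BatchedChild(IChild inner, BatchingDispatcher dispatcher) {
        _inner = inner;
        _dispatcher = dispatcher;
    }

    public int RemoteCall() {
        // The non-functional concern (batching) lives here; callers see no difference.
        return _dispatcher.Call(0 /* request built from _inner, however the generated code does it */);
    }
}

class ChildFactory {
    private readonly BatchingDispatcher _dispatcher;

    public ChildFactory(BatchingDispatcher dispatcher) {
        _dispatcher = dispatcher;
    }

    public IChild Create(IChild generatedChild) {
        return new BatchedChild(generatedChild, _dispatcher);
    }
}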

richj
Hi Rich, yeah, obviously the queue should fire anyway after 100 ms or so, so it won't block. I left that out for brevity because it didn't really touch the essence of the problem. Rewriting would be very difficult. The internal implementation of Child.RemoteCall may have memoization, so it wouldn't actually have to make the call. Or it could be an IChild, a proxy to a remote object, which is not known at compile time. Those are just two reasons why trying to rewrite at compile time is going to be harder than what I was thinking of. So I still want to see if this can be done. Any ideas? GJ
gjvdkamp
This would make an interesting question on its own because it's an example of an abstraction that has been too successful in hiding the non-functional aspects of the implementation (remoting and performance). If the child instance has a pre-computed answer available locally it doesn't make sense to process the request remotely. Similarly there is no advantage to batching these requests. If the queue could detect these and evaluate them immediately (without batching), then that would only leave the remote requests to be batched.
richj
I found this link: http://msdn.microsoft.com/en-us/magazine/cc163340.aspx interesting. I'm not sure if Tasks and Futures are quite right for this problem. They might help but they're not a perfect fit. The task manager defaults to one thread per core - but with remote requests, some of the threads will be blocked, so multiple threads per core can give a better throughput. The number of cores on the server is possibly more relevant than the number on the client.
richj
One more thought - is it possible to short-cut the calculation for the parent at the database level, instead of dividing the work between the children?
richj
Added extra paragraph in main answer.
richj
Hi, sorry for my late reply; email notifications are not working properly. In my code there are two trees intersecting: one is the tree of objects, the other is the expression trees of functions that call each other. Saying this, I'm beginning to realize I might be barking up the wrong tree here. When I do CallServiceX() + CallServiceY(), with the blocking approach I suggested here the call to ServiceY() would only be made AFTER ServiceX() returns. So this hardcoded approach will not parallelize that well; it would run serially. I would need to rework the code to do out-of-process data calls.
gjvdkamp
Damn, these comment fields are pretty lousy for a discussion... Thanks, GJ
gjvdkamp
Hi Rich, thanks for helping me. I have a rant about my current idea under Hassan's answer, but you helped me get at least that far. Thanks, GJ.
gjvdkamp
A: 

Hi Rich,

(using the answer box for space)

Thanks for the thoughts.

"This would make...": I deal with code that is generated from a repository. When dealing with handcoded example of this problem a developer could spot this and improve it. With generating code it's pretty hard to distill the general case from a set of examples of the problem. This can get pretty complicated so my strategy is devide and conquer. If i have to look inside the function of child that goes out the door.

"I found this link...": I've looked at Futures, but that's more of a forking mechanism that can get parallelized when there's an idle thread.

The TPL seems to be about splitting work up into small bits. What i want to do is take some of these bits and put them together for a while in a different composition and then split them back up again for parallel execution. (I think, still chewing on this one mentally...)

"One more thought": Again, the devide and conquer strategy is what let me get this far. So i devide the larger problem up into little bits, solve them and then put the bits back together. I like to think of an ant colony where each ant follows simple rules (or kanban, that's a similar principle), as opposed to some central management thing (query optimizer) that gets bogged down quickly because it gets very complex very fast.

If the parent could just call the 50 children in a parallelizable fashion, and these separate tasks could get batched together just because they're pointing to the same remote resource, that would be great.

The major hurdle here is: how do I let the calling task (or thread, or whatever unit of execution) block, have another one pick up the work for a batch of them, do it, put the answers into the collection where all the tasks put their work, and then wake them up again (and in an efficient manner...).

I think I remember George Chrysanthakopoulos (the guy who created the CCR) saying that the yield return statement is what he used to do such a thing. I'll try to find that interview again on Channel9.

Regards, GJ

gjvdkamp
The major hurdle: if the children are remote objects then sending the requests in batches to a remote server (ideally running in the same server process that the children came from) means that the results will come back in a batch when they have been calculated. The calling thread will block until the result comes back, at which point it will continue processing. This is not the only solution but it maps reasonably well onto the description in the "major hurdle" paragraph.
richj
Exactly, this is the problem I have: how do I efficiently let a lot of unfinished 'stack frames' lie around waiting for the batch to come back, and how do I put the answers back on their stacks and let them run again? I'm busy with other stuff now so I can't prototype this yet... Thanks, GJ
gjvdkamp
+3  A: 

You are scratching at the surface of massively parallel programming. You need to think in a concurrency-oriented way. You are starting 51 jobs, not the 50 jobs that you need to batch; the extra job is the one that manages the 50 jobs. In terms of the primitives required, you need:

JOBHANDLE x = GetJobId();
// single job
AddJob(JOBHANDLE x, ChildJob y);
// container of jobs
AddJobs(JOBHANDLE x, ChildJobs Y);

BeginAsyncExecute(JOBHANDLE x);
WaitTillResult(JOBHANDLE x);

You need an engine in the background that defines blocking primitives (beyond those provided by the OS kernel) and that manages worker threads and the jobs to execute, which from the looks of it is handled by PLINQ. PLINQ also uses green threads, which is good.

You have mentioned you will have a mix of databases and web servers, so your Job process/function will have to map the children to the correct resources before the batch is executed. 50 children might therefore be reduced to far fewer batchable RPC calls.

So you build up your Job batch and then you block on it, roughly as in the sketch below.
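
In TPL terms that could look roughly like this. Endpoint and CallBatch are assumed members (your mapping from a Child to its database or web service, and a batched RPC on that resource); only the build-the-batch-then-block shape is the point.

using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class JobBatch {
    public static IDictionary<Child, int> Execute(IEnumerable<Child> children) {
        var results = new Dictionary<Child, int>();

        // Map the children to the resource they target, so 50 children collapse
        // into one batched RPC per database / web service.
        var perEndpoint = children.GroupBy(c => c.Endpoint);          // assumed property

        // One task per endpoint: build its batch, fire it, collect the replies.
        var tasks = perEndpoint.Select(group => Task.Factory.StartNew(() => {
            var batch = group.ToList();
            int[] replies = group.Key.CallBatch(batch);               // assumed batched call
            lock (results) {
                for (int i = 0; i < batch.Count; i++)
                    results[batch[i]] = replies[i];
            }
        })).ToArray();

        Task.WaitAll(tasks);   // block until the whole job batch is done
        return results;
    }
}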

Getting more specific will be hard, but in light of the discussion so far, please tell me what you are having trouble with.

Hassan Syed
I like this method, having seen a similar solution in a project I have worked on. I think this will also give you flexibility in the future if you have to change how/what you want to do before/during the remote call.
Amit Wadhwa
Hi Hassan, thanks for your insights! I have to apologize in advance: I'm very busy with other stuff, so I can't give this discussion my fullest attention. The blocking primitives are what I'm looking for, I think, although there is another issue that Richj helped me see. I have a tree of objects and then I have expression trees on those. My idea for fast runtime code was to have the expression trees compiled to native code. This however leads to a problem, because then execution of the expression trees will always be depth-first unless I rewrite the code.
gjvdkamp
I think Futures might actually be a good place to start my search, because they allow for parallelization before the actual return value is used; the work needs to be exposed to the framework before the result is required. Also, they smell like the 'blocking primitive beyond the OS' you mention. So ServiceCallX() + ServiceCallY() would become something like: Future<double> X {ServiceCallX}; Future<double> Y {ServiceCallY}; X + Y; This nicely breaks up all the calls to the remote services into separate bits, which is nice. But now how do I cluster them into batches to share the overhead of the remote calls?
gjvdkamp
Maybe I'll just go with an in-process data structure for now, where I suck up all the actuals beforehand so I don't have to make these high-latency calls...
gjvdkamp
Come to think of it, that is the answer: I need to expose all the calls to a framework BEFORE I need the return value. This framework then returns a handle that I can call when I need the return value. The framework accepts the calls, puts them into a collection and returns a reference to the request. Now the calling thread can carry on with what it was doing. The framework worker thread loops over the items, batches them, makes the calls and puts the replies into the collection.
gjvdkamp
When the calling thread calls ReturnValue on the collection item, it returns the return value, or blocks until the worker thread has put the return value there. I'll see how the TPL (Px) handles a lot of blocked threads to see how 'green' they really are. RichJ and Hassan, MAJOR thanks here for helping me get the problem straight! Thanks, GJ
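
Caller-side, I picture it something like the sketch below. RequestHandle and RequestBatcher are invented names for the framework pieces (the batcher with its worker thread isn't shown); the point is only that every call is registered before any return value is needed.

using System.Collections.Generic;
using System.Threading.Tasks;

// The handle the framework hands back when a call is registered.
class RequestHandle {
    internal TaskCompletionSource<int> Completion = new TaskCompletionSource<int>();

    // Blocks until the worker thread has made the batched call and stored the reply.
    public int ReturnValue { get { return Completion.Task.Result; } }
}

class CallerSide {
    static int SumChildren(IEnumerable<Child> children, RequestBatcher batcher) {
        // 1. Expose every call to the framework BEFORE any return value is needed.
        var handles = new List<RequestHandle>();
        foreach (var child in children)
            handles.Add(batcher.Enqueue(child));

        // 2. Only now touch the return values; each ReturnValue blocks until the
        //    batch containing that request has come back.
        int sum = 0;
        foreach (var handle in handles)
            sum += handle.ReturnValue;
        return sum;
    }
}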
gjvdkamp