I have written a C# library with a method that counts words from multiple passages of text in parallel. The passages are supplied as character streams, and there is a random delay each time getnextchar() is called. My library method has to take an array of these character streams and return a combined word-frequency count. To do this, I have a thread-safe shared word-frequency data structure and one thread per character stream; each thread reads its stream and updates the shared collection. When all the threads have completed, I return the data structure to the client application.
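For illustration, the setup described above (one thread per stream, one shared table) might look roughly like the sketch below. ICharStream, GetNextChar(), StringCharStream and ParallelWordCounter are hypothetical names, not the library's actual API. The sketch assumes that end of stream is signalled by a null character, that a word is a run of letters, and that the table is "safely shared" by guarding a plain Dictionary with lock, which is only one possible choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;

// Hypothetical stream abstraction; assumes GetNextChar() returns null at end of passage.
public interface ICharStream { char? GetNextChar(); }

// Minimal in-memory stream for illustration only (no artificial delay).
public sealed class StringCharStream : ICharStream
{
    private readonly string _text;
    private int _pos;
    public StringCharStream(string text) { _text = text; }
    public char? GetNextChar() => _pos < _text.Length ? _text[_pos++] : (char?)null;
}

public static class ParallelWordCounter
{
    // Starts one dedicated thread per stream; every thread records words into a
    // single Dictionary guarded by a lock. Returns after all threads have finished.
    public static Dictionary<string, int> CountWords(ICharStream[] streams)
    {
        var counts = new Dictionary<string, int>();
        var sync = new object();
        var threads = new Thread[streams.Length];

        for (int i = 0; i < streams.Length; i++)
        {
            ICharStream stream = streams[i]; // fresh variable per iteration for the closure
            threads[i] = new Thread(() =>
            {
                var word = new StringBuilder();
                char? c;
                // End of stream (null) also acts as a separator, flushing the last word.
                do
                {
                    c = stream.GetNextChar();
                    if (c.HasValue && char.IsLetter(c.Value))
                    {
                        word.Append(char.ToLowerInvariant(c.Value));
                    }
                    else if (word.Length > 0)
                    {
                        string w = word.ToString();
                        word.Clear();
                        lock (sync) // held only for a single increment
                        {
                            counts.TryGetValue(w, out int n); // n is 0 if the word is new
                            counts[w] = n + 1;
                        }
                    }
                } while (c.HasValue);
            });
            threads[i].Start();
        }

        foreach (var t in threads) t.Join(); // wait for every worker to complete
        return counts;
    }
}
```

Because each worker takes the lock once per word rather than once per character, the slow getnextchar() calls happen outside the lock.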
The client application needs interim results of the combined word count every 10 seconds. To do this, I use a delegate to call back the client with the current results every 10 seconds until all of the worker threads have completed, after which I return the final results to the client.
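A minimal sketch of that reporting loop, assuming a CountdownEvent to track worker completion: the calling thread waits on it with a timeout equal to the reporting interval, and every timeout triggers a callback. InterimResultsCallback and PeriodicReporter are hypothetical names, and to keep the focus on timing each source is assumed to yield already-split words instead of reading a character stream. As in the design described here, the callback is invoked while the lock is held.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical delegate type for interim results; the real signature may differ.
public delegate void InterimResultsCallback(IReadOnlyDictionary<string, int> counts);

public static class PeriodicReporter
{
    // Each element of wordSources stands in for one stream read by a worker thread.
    public static Dictionary<string, int> CountWithProgress(
        IEnumerable<string>[] wordSources,
        InterimResultsCallback onInterim,
        TimeSpan interval)
    {
        var counts = new Dictionary<string, int>();
        var sync = new object();
        var remaining = new CountdownEvent(wordSources.Length);

        foreach (var source in wordSources)
        {
            var src = source;
            new Thread(() =>
            {
                try
                {
                    foreach (var w in src)
                    {
                        lock (sync)
                        {
                            counts.TryGetValue(w, out int n);
                            counts[w] = n + 1;
                        }
                    }
                }
                finally { remaining.Signal(); } // always record that this worker is done
            }).Start();
        }

        // Wait returns false on timeout and true once every worker has signalled.
        while (!remaining.Wait(interval))
        {
            lock (sync)
            {
                // The lock stays held until the client returns, so workers block meanwhile.
                onInterim(counts);
            }
        }
        return counts; // all workers finished: final results
    }
}
```

In the scenario described, interval would be TimeSpan.FromSeconds(10); the workers can make no progress for as long as onInterim runs.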
My problem is that when I call back the client with the interim results, I have to lock the shared data structure and wait for the client application to return from the callback before I can unlock it. Whilst the callback is executing, all of the worker threads are blocked waiting for the lock on the data structure. This doesn't seem sensible, because I don't think I should trust the client code to return promptly, or even at all. However, the only other approach I can think of that does not rely on the client code is to make a copy (snapshot) of the data structure and pass that to the client through the callback. This costs memory and computation, but once the copy is made, the workers can continue updating the shared collection and the callback can take as long as it likes.
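The snapshot alternative could be sketched as follows, assuming the shared table is a Dictionary guarded by lock. SharedWordCounts and InterimReporting are hypothetical names. The lock is held only while the copy is built; the client callback then runs on the copy with no lock held.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper around the shared word-frequency table.
public sealed class SharedWordCounts
{
    private readonly Dictionary<string, int> _counts = new Dictionary<string, int>();
    private readonly object _sync = new object();

    // Called by worker threads; the lock is held only for one update.
    public void Increment(string word)
    {
        lock (_sync)
        {
            _counts.TryGetValue(word, out int n);
            _counts[word] = n + 1;
        }
    }

    // Copies the table under the lock: O(distinct words) time and memory,
    // but the lock is released as soon as the copy exists.
    public Dictionary<string, int> Snapshot()
    {
        lock (_sync)
        {
            return new Dictionary<string, int>(_counts);
        }
    }
}

public static class InterimReporting
{
    // The callback receives a private copy and runs with no lock held,
    // so a slow or hung callback cannot stall the workers.
    public static void Report(SharedWordCounts shared, Action<IReadOnlyDictionary<string, int>> callback)
    {
        Dictionary<string, int> copy = shared.Snapshot();
        callback(copy);
    }
}
```

Only the workers are protected this way: the thread that invokes the callback still waits for the client to return, and each snapshot costs time and memory proportional to the number of distinct words.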
My question is twofold:
1) Which is the lesser of two evils: allowing a badly implemented client callback to block the workers, or periodically performing an expensive copy operation?
2) Is there a way to solve this problem that does neither of the above?