I've got some code that screen-scrapes a website (for illustrative purposes only!):

 public System.Drawing.Image GetDilbert()
 {
  var dilbertUrl = new Uri(@"http://dilbert.com");
  var request = WebRequest.CreateDefault(dilbertUrl);
  string html;
  using (var webResponse = request.GetResponse())
  using (var receiveStream = webResponse.GetResponseStream())
  using (var readStream = new StreamReader(receiveStream, Encoding.UTF8))
   html = readStream.ReadToEnd();

  var regex = new Regex(@"dyn/str_strip/[0-9/]+/[0-9]*\.strip\.gif");
  var match = regex.Match(html);
  if (!match.Success) return null;
  string s = match.Value;
  var groups = match.Groups;
  if (groups.Count > 0)
   s = groups[groups.Count - 1].ToString(); // the last group is the one we care about

  var imageUrl = new Uri(dilbertUrl, s);
  var imageRequest = WebRequest.CreateDefault(imageUrl);
  using (var imageResponse = imageRequest.GetResponse())
  using (var imageStream = imageResponse.GetResponseStream())
  {
   System.Drawing.Image image_ = System.Drawing.Image.FromStream(imageStream, true /*useEmbeddedColorManagement*/, true /*validateImageData*/);
   return (System.Drawing.Image)image_.Clone(); // "You must keep the stream open for the lifetime of the Image."
  }
 }

Now, I would like to call GetDilbert() asynchronously. The easy way is to use a delegate:

 Func<System.Drawing.Image> getDilbert;
 IAsyncResult BeginGetDilbert(AsyncCallback callback, object state)
 {
  getDilbert = GetDilbert;
  return getDilbert.BeginInvoke(callback, state);
 }
 System.Drawing.Image EndGetDilbert(IAsyncResult result)
 {
  return getDilbert.EndInvoke(result);
 }

While that certainly works, it isn't very efficient as the delegate thread will spend most of its time waiting for the two I/O operations.

What I would like to do is to call request.BeginGetResponse(), do the regex match, and then call imageRequest.BeginGetResponse(). All while using the standard async call pattern and preserving the signatures of BeginGetDilbert() and EndGetDilbert().

I've tried several approaches and haven't been completely satisfied with any of them; this seems to be a royal pain. Hence, the question. :-)


EDIT: It seems that the approaches using iterators are frowned on by the C# compiler team.

A plea from the compiler team:

Though it is assuredly the case that you CAN use iterators to implement state machines, poor-mans coroutines, and so on, I wish people would not do so.

Please use tools for the purposes for which they were intended. If you want to write state machines, write yourself a library that is designed specifically to solve that general problem and then use it.

Using tools for purposes other than what they were intended for is "clever", and clever is bad; clever is hard for maintenance programmers to understand, clever is hard to extend, clever is hard to reason about, clever makes people think "out of the box"; there's good stuff in that box.


Going with the Future<> answer because it stays in C#, the same language as my sample code. Unfortunately, neither the TPL nor F# is officially supported by Microsoft... yet.

+3  A: 

It is kind of a nightmare to get this right. You need to create callbacks to pass into each 'Begin' method that then run the 'continuation' of the method. (And don't forget to ensure all the exception-handling and CompletedSynchronously logic is correct!) When you author this in C# today, your code turns into a hopeless mess of spaghetti, but that's about the only way you can achieve your goal (not have threads blocking on I/O waits).

On the other hand, if it's within reason for your situation, F# makes this very simple and straightforward to author correctly. See this video (namely, 8 minutes starting at 52:20) for a synopsis.

EDIT

To answer Dan's comment, here is a very rough sketch. I pulled it from an email I wrote in Outlook, and I doubt it compiles. The exception paths are always gnarly, so be careful (what if 'cb' throws?); you may want to find a rock-solid AR/Begin/End implementation in C# somewhere (I dunno where, I'm sure there must be many) and use it as a model, but this shows the gist. The thing is, once you author this once, you have it for all time; BeginRun and EndRun work as the 'begin/end' on any F# async object. We have a suggestion in the F# bug database to expose the Begin/End APM on top of async in a future release of the F# library, so as to make it easier to consume F# async computations from traditional C# code. (And of course we're striving to work better with 'Task's from the parallel task library in .NET 4.0 as well.)

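open System
open System.Threading

// IAsyncResult implementation that carries the result (or exception) of an async computation.
type AR<'a>(o : obj, mre : ManualResetEvent, result : Choice<'a, exn> option ref) =
    member x.Data = result
    interface IAsyncResult with
        member x.AsyncState = o
        member x.AsyncWaitHandle = (mre :> WaitHandle)
        member x.CompletedSynchronously = false
        member x.IsCompleted = mre.WaitOne(0)

let BeginRun(a : Async<'a>, cb : AsyncCallback, o : obj) =
    let mre = new ManualResetEvent(false)
    let result = ref None
    let iar = new AR<'a>(o, mre, result) :> IAsyncResult
    let a2 = async {
        try
            let! r = a
            result := Some(Choice1Of2 r)
        with e ->
            result := Some(Choice2Of2 e)
        // Signal completion and invoke the callback on both the success and the failure path.
        mre.Set() |> ignore
        if cb <> null then
            cb.Invoke(iar)
    }
    Async.Start(a2)
    iar

let EndRun<'a>(iar : IAsyncResult) : 'a =
    match iar with
    | :? AR<'a> as ar ->
        iar.AsyncWaitHandle.WaitOne() |> ignore
        match !(ar.Data) with
        | Some(Choice1Of2 r) -> r
        | Some(Choice2Of2 e) -> raise e
        | None -> failwith "The operation has not completed."
    | _ -> invalidArg "iar" "Not an IAsyncResult created by BeginRun."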
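
For readers staying in C#, a rough counterpart of the AR type above might look like the following. This is only a minimal sketch, not a rock-solid implementation: the class name AsyncResult<T> and its Complete/End members are invented for illustration, it always reports CompletedSynchronously as false, and it does nothing special if the callback itself throws.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not from the thread): a minimal IAsyncResult
// that the last callback in a chain of Begin/End calls can complete.
public sealed class AsyncResult<T> : IAsyncResult
{
    private readonly AsyncCallback _callback;
    private readonly ManualResetEvent _done = new ManualResetEvent(false);
    private T _value;
    private Exception _error;

    public AsyncResult(AsyncCallback callback, object state)
    {
        _callback = callback;
        AsyncState = state;
    }

    public object AsyncState { get; private set; }
    public WaitHandle AsyncWaitHandle { get { return _done; } }
    public bool CompletedSynchronously { get { return false; } }
    public bool IsCompleted { get { return _done.WaitOne(0); } }

    // Call exactly once, with either a value or an exception.
    public void Complete(T value, Exception error)
    {
        _value = value;
        _error = error;
        _done.Set();                              // wake anyone blocked in End()
        if (_callback != null) _callback(this);   // then notify the caller
    }

    // Call from EndXxx: waits if necessary, then returns the value or rethrows.
    public T End()
    {
        _done.WaitOne();
        if (_error != null) throw _error;
        return _value;
    }
}
```

With something like this, BeginGetDilbert() would create an AsyncResult<Image>, start request.BeginGetResponse(), and return the object immediately; the callback of the second (image) request would call Complete(), and EndGetDilbert() would cast its argument back to AsyncResult<Image> and return End().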
Brian
I've seen some of the F# stuff (as well as attempts to mimic it in C#). While it looks very nice, it doesn't preserve the standard async call pattern which is required for things like async web services.
Dan
@Brian: do you have an F# sample (perhaps with C#) which shows how to implement the standard BeginXXX()/EndXXX() pair using F#'s async features?
Dan
Edited the original answer with commentary here.
Brian
+3  A: 
 public Image GetDilbert()
 {
     var   dilbertUrl  = new Uri(@"http://dilbert.com");
     var   request     = WebRequest.CreateDefault(dilbertUrl);
     var   webHandle   = new ManualResetEvent(false /* nonsignaled */);
     Image returnValue = null;

     request.BeginGetResponse(ar => 
     {  
          // inside the AsyncCallback for request.BeginGetResponse()
          var response = (HttpWebResponse) request.EndGetResponse(ar); 

          string html;  
          using (var receiveStream = response.GetResponseStream())
          using (var readStream    = new StreamReader(  receiveStream
                                                      , Encoding.UTF8))
          {
             html = readStream.ReadToEnd();
          }

          var re=new Regex(@"dyn/str_strip/[0-9/]+/[0-9]*\.strip\.gif");
          var match=re.Match(html);

          var imgHandle = new ManualResetEvent(true /* signaled  */);

          if (match.Success) 
          {   
              imgHandle.Reset();              

              var groups = match.Groups;
              var s = (groups.Count>0) ?groups[groups.Count-1].ToString()
                                       :match.Value;
              var _uri   = new Uri(dilbertUrl, s);
              var imgReq = WebRequest.CreateDefault(_uri);

              imgReq.BeginGetResponse(ar2 => 
              {  var imageRsp= (HttpWebResponse)imgReq.EndGetResponse(ar2);

                 using (var imgStream=imageRsp.GetResponseStream())
                 { 
                    var im=(Image)Image.FromStream(imgStream,true,true);
                    returnValue = (Image) im.Clone();
                 }    

                 imgHandle.Set();           
              }, new object() /*state*/);
          }      

          imgHandle.WaitOne();
          webHandle.Set();  
     }, new object() /* state */);

     webHandle.WaitOne();  
     return returnValue;      
 }

For the Begin/EndGetDilbert() methods, you can use a technique with Future<T> as described at http://blogs.msdn.com/pfxteam/archive/2008/02/29/7960146.aspx
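
The Future<T> type from the TPL preview shipped in .NET 4 as Task<T>, which itself implements IAsyncResult. Assuming .NET 4.5 or later, a Begin/End pair can be layered over any Task<T> roughly as follows. This is a sketch of the general wrapping technique, not the code from the linked post, and the ApmOverTask class name is made up for illustration.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: exposes a Task<T> through the Begin/End pattern.
public static class ApmOverTask
{
    public static IAsyncResult Begin<T>(Task<T> task, AsyncCallback callback, object state)
    {
        // The task owned by the TaskCompletionSource reports 'state'
        // through IAsyncResult.AsyncState.
        var tcs = new TaskCompletionSource<T>(state);
        task.ContinueWith(t =>
        {
            if (t.IsFaulted) tcs.TrySetException(t.Exception.InnerExceptions);
            else if (t.IsCanceled) tcs.TrySetCanceled();
            else tcs.TrySetResult(t.Result);

            if (callback != null) callback(tcs.Task);
        }, TaskScheduler.Default);
        return tcs.Task;
    }

    public static T End<T>(IAsyncResult asyncResult)
    {
        // Blocks until completion and rethrows the original exception
        // instead of wrapping it in an AggregateException.
        return ((Task<T>)asyncResult).GetAwaiter().GetResult();
    }
}
```

BeginGetDilbert() could then return ApmOverTask.Begin(task, callback, state), where task is a Task<Image> built by chaining Task<WebResponse>.Factory.FromAsync(request.BeginGetResponse, request.EndGetResponse, null) for each request, and EndGetDilbert() would return ApmOverTask.End<Image>(result).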

See also http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.begingetresponse.aspx

Mark Cidade
Yes! Exactly. Chain asynchronous callbacks. I was coding up something very similar when you posted.
Jim Mischel
The problem isn't just about making GetDilbert() use asynchronous calls, but rather making BeginGetDilbert() and EndGetDilbert() still work.
Dan
Fixed the code and tested it this time.
Mark Cidade
Unfortunately, it appears that the TPL won't ship until Visual Studio 2010; the most recent CTP is from June 2008.
Dan
+2  A: 

You might find that Jeff Richter's AsyncEnumerator simplifies things quite a bit. You can get it in the Wintellect PowerThreading library.

Joel Mueller
http://blogs.msdn.com/oldnewthing/archive/2008/08/15/8868267.aspx#8870195 "A plea from the [C#] compiler team: Though ... you CAN use iterators to implement ... poor-mans coroutines, ... I wish people would not ... Using tools for [other] purposes ... is "clever", and clever is bad; ..."
Dan
A plea to the C# compiler team: Make the Async BeginXXX/EndXXX pattern suck less, then! Personally, I'm with Brian - F# is the way to go for this sort of thing.
Joel Mueller
I'm going to have to give the F# stuff a try...even though it won't be officially supported until Visual Studio 2010 ships.
Dan
+1  A: 

No question about it: use the Concurrency and Coordination Runtime. It uses many of the techniques noted above and will make your code far more concise than rolling your own.

Matt Davison