I'm trying to work out in my head the best way to structure a Cocoa app that's essentially a concurrent download manager. There's a server the app talks to, the user makes a big list of things to pull down, and the app processes that list. (It's not using HTTP or FTP, so I can't use the URL-loading system; I'll be talking across socket connections.)

This is basically the classic producer-consumer pattern. The trick is that the number of consumers is fixed, and they're persistent. The server sets a strict limit on the number of simultaneous connections that can be open (though usually at least two), and opening new connections is expensive, so in an ideal world, the same N connections are open for the lifetime of the app.

One way to approach this might be to create N threads, each of which would "own" a connection, and wait on the request queue, blocking if it's empty. Since the number of connections will never be huge, this is not unreasonable in terms of actual system overhead. But conceptually, it seems like Cocoa must offer a more elegant solution.

It seems like I could use an NSOperationQueue, and call setMaxConcurrentOperationCount: with the number of connections. Then I just toss the download requests into that queue. But I'm not sure, in that case, how to manage the connections themselves. (Just put them on a stack, and rely on the queue to ensure I don't over/under-run? Throw in a dispatch semaphore along with the stack?)
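For illustration, here is a rough sketch of that idea: an operation queue capped at N concurrent operations, plus a stack of connections guarded by a lock and a dispatch semaphore. `ConnectionPool`, `runDownloads`, and the integer "connection handles" are made-up stand-ins rather than Cocoa API, and the operations here block while they hold a connection, which would not suit asynchronous sockets.

```swift
import Foundation

// Hypothetical pool: a stack of connection handles (plain Ints here)
// guarded by a lock, with a semaphore counting how many are free.
final class ConnectionPool {
    private var free: [Int]
    private let lock = NSLock()
    private let available: DispatchSemaphore

    init(count: Int) {
        free = Array(0..<count)
        available = DispatchSemaphore(value: count)
    }

    // Blocks the calling thread until a connection is free.
    func checkOut() -> Int {
        available.wait()
        lock.lock()
        defer { lock.unlock() }
        return free.removeLast()
    }

    // Returns a connection to the stack and wakes one waiter, if any.
    func checkIn(_ connection: Int) {
        lock.lock()
        free.append(connection)
        lock.unlock()
        available.signal()
    }

    var freeCount: Int {
        lock.lock()
        defer { lock.unlock() }
        return free.count
    }
}

// Runs one operation per item, never more than `connections` at a time.
// Returns the number of free connections once everything has finished.
func runDownloads(_ items: [String], connections: Int) -> Int {
    let pool = ConnectionPool(count: connections)
    let queue = OperationQueue()
    queue.maxConcurrentOperationCount = connections

    for item in items {
        queue.addOperation {
            let connection = pool.checkOut()
            defer { pool.checkIn(connection) }
            // A real operation would transfer `item` over `connection` here.
            _ = (item, connection)
        }
    }
    queue.waitUntilAllOperationsAreFinished()
    return pool.freeCount
}
```

Since the queue's concurrency limit matches the pool size, the semaphore should never actually have to wait; it is only a safety net in case the two numbers drift apart.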

Now that we're in the brave new world of Grand Central Dispatch, does that open up any other ways of tackling this? At first blush, it doesn't seem like it, since GCD's flagship ability to dynamically scale concurrency (which is also what Apple's recommendations on Changing Producer-Consumer Implementations rely on) doesn't actually help me. But I've only scratched the surface of reading about it.

EDIT:

In case it matters: yes, I am planning on using the asynchronous/non-blocking socket APIs to do the actual communication with the server. So the I/O itself does not have to be on its own thread(s). I'm just concerned with the mechanics of queuing up the work, and (safely) doling it out to the connections, as they become available.

+1  A: 

If you're using CFSocket's non-blocking calls for I/O, I agree, that should all happen on the main thread, letting the OS handle the concurrency issues, since you're just copying data and not really doing any computation.

Beyond that, it sounds like the only other work your app needs to do is maintain a queue of items to be downloaded. When any one of the transfers is complete, the CFSocket callback can initiate the transfer of the next item on the queue. (If the queue is empty, decrement your connection count, and if something is added to an empty queue, start a new transfer.) I don't see why you need multiple threads for that.

Maybe you've left out something important, but based on your description the app is I/O bound, not CPU bound, so all of the concurrency stuff is just going to make more complicated code with minimal impact on performance.

Do it all on the main thread.

benzado
The app is almost certainly not CPU-bound. There will be some decoding tasks when the downloads finish up, but I can send those off to GCD to process. My main concerns were 1) keeping the UI responsive, and 2) avoiding any race conditions in dequeuing work. If the I/O callbacks all run on the main thread, that generally solves (2), but I wonder if it does so at the cost of (1). Presumably, most of that work is just going to be appending stuff to a buffer, though, and should be pretty quick.
Sixten Otto
A: 

For posterity's sake, after some discussion elsewhere, the solution I think I'd adopt for this is basically:

  • Have a queue of pending download operations, initially empty.
  • Have a set containing all open connections, initially empty.
  • Have a mutable array (queue, really) of idle open connections, initially empty.
  • When the user adds a download request:
    • If the array of idle connections is not empty, remove one and assign the download to it.
    • If there are no idle connections, but the number of total connections has not reached its limit, open a new connection, add it to the set, and assign the download to it.
    • Otherwise, enqueue the download for later.
  • When a download completes: if there are queued requests, dequeue one and give it to the connection; otherwise, add the connection to the idle list.
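The bookkeeping above could be sketched roughly as follows. This is only an illustration under some assumptions: `Connection` and `DownloadDispatcher` are hypothetical names, a download is represented by a plain string, the "set" of open connections is kept as an array for brevity, and opening a connection is treated as instantaneous. Everything is assumed to be called from the main thread, so there is no locking.

```swift
// Hypothetical stand-in; a real Connection would wrap a socket.
final class Connection {
    let id: Int
    private(set) var current: String?   // the download in progress, if any
    init(id: Int) { self.id = id }
    func start(_ download: String) { current = download }
    func finish() { current = nil }
}

final class DownloadDispatcher {
    let maxConnections: Int
    private(set) var pending: [String] = []    // FIFO queue of waiting downloads
    private(set) var open: [Connection] = []   // every open connection
    private(set) var idle: [Connection] = []   // open connections with no work

    init(maxConnections: Int) { self.maxConnections = maxConnections }

    // Called when the user adds a download request.
    func add(_ download: String) {
        if !idle.isEmpty {
            // Reuse an idle connection.
            idle.removeFirst().start(download)
        } else if open.count < maxConnections {
            // Under the limit: open a new connection for this download.
            let connection = Connection(id: open.count)
            open.append(connection)
            connection.start(download)
        } else {
            // At the limit and everyone is busy: wait in line.
            pending.append(download)
        }
    }

    // Called when a connection reports that its download completed.
    func downloadDidFinish(on connection: Connection) {
        connection.finish()
        if !pending.isEmpty {
            connection.start(pending.removeFirst())
        } else {
            idle.append(connection)
        }
    }
}
```

With a limit of two, adding three downloads opens two connections and leaves the third queued until one of them finishes.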

All of that work would take place on the main thread. The work of decoding the results of each download would be offloaded to GCD, so it can handle throttling the concurrency, and it doesn't clog the main thread.
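A minimal sketch of that hand-off, assuming a made-up `decode` function (the real decoding depends on the server's payload format) and a hypothetical `decodeInBackground` helper. The callback queue is a parameter that defaults to the main queue, so results come back to the same thread that does the connection bookkeeping.

```swift
import Dispatch

// Hypothetical decoder; here it just interprets the bytes as UTF-8 text.
func decode(_ bytes: [UInt8]) -> String {
    return String(decoding: bytes, as: UTF8.self)
}

// Decodes on a global concurrent queue (GCD decides how many of these run
// at once), then delivers the result on `callbackQueue`.
func decodeInBackground(_ bytes: [UInt8],
                        callbackQueue: DispatchQueue = .main,
                        completion: @escaping (String) -> Void) {
    DispatchQueue.global(qos: .utility).async {
        let result = decode(bytes)
        callbackQueue.async {
            completion(result)
        }
    }
}
```

The connection itself can be handed its next download as soon as the bytes are in; it does not need to wait for the decode to finish.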

Opening a new connection might take a while, so the process of creating a new one might be a tad more complicated in actual practice (say, enqueue the download, initiate the connection process, and then dequeue it when the connection is fully established). But I still think my perception of the possibility of race conditions was overstated.
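Roughly, that variant might look like the sketch below: a new download is always enqueued first, a connection attempt is started if the limit allows, and the queue is drained when the connection reports that it is established. `ConnectingDispatcher` and its method names are hypothetical, connections are represented by integer ids, failed connection attempts are not handled, and all calls are assumed to happen on the main thread.

```swift
final class ConnectingDispatcher {
    let maxConnections: Int
    private(set) var pending: [String] = []        // FIFO queue of waiting downloads
    private(set) var idle: [Int] = []              // ids of open, idle connections
    private(set) var active: [Int: String] = [:]   // connection id -> current download
    private(set) var connecting = 0                // connection attempts in flight
    private var nextID = 0

    init(maxConnections: Int) { self.maxConnections = maxConnections }

    func add(_ download: String) {
        if let id = idle.popLast() {
            active[id] = download
            return
        }
        // No idle connection: queue the download, and start another
        // connection if open + in-flight connections are under the limit.
        pending.append(download)
        if active.count + idle.count + connecting < maxConnections {
            connecting += 1   // a real app would begin the async connect here
        }
    }

    // Called when an in-flight connection attempt succeeds.
    func connectionDidOpen() -> Int {
        connecting -= 1
        let id = nextID
        nextID += 1
        feed(id)
        return id
    }

    // Called when a connection finishes its current download.
    func downloadDidFinish(on id: Int) {
        active[id] = nil
        feed(id)
    }

    // Gives the connection the next queued download, or parks it as idle.
    private func feed(_ id: Int) {
        if pending.isEmpty {
            idle.append(id)
        } else {
            active[id] = pending.removeFirst()
        }
    }
}
```

Counting in-flight attempts toward the limit keeps a burst of requests from starting more connections than the server allows.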

Sixten Otto