I have at my disposal a REST service that accepts a JSON array of image URLs and returns scaled thumbnails.

Problem

I want to batch image URLs sent by concurrent clients before calling the REST service. If I receive 1 image, I should wait a moment in case other images trickle in. I've settled on a batch size of 5 images. The question is how to design this to handle the following scenarios:

  1. If I receive x images, where x < 5, how do I time out of waiting if no new images arrive within the next few minutes?
  2. If I use a queue to buffer incoming image URLs, I will probably need to lock it to prevent clients from writing concurrently while I'm busy reading my batches of 5. What data structure is good for this? A BlockingQueue?
A: 

You could have each client send a special string to the queue to indicate that it is done sending image URLs. If the last element in the queue is that string, you know that there are no URLs left.

If you have multiple clients and you know how many there are, you can count the indicators in the queue to check whether all of the clients have finished.
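A minimal sketch of that idea, assuming a `BlockingQueue<String>` shared by all clients and a known client count. The marker value `__DONE__` and the names `SentinelDrain` and `drain` are hypothetical, chosen only for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SentinelDrain {
    // Hypothetical marker; any string that can never be a real URL would do.
    public static final String DONE = "__DONE__";

    /** Takes URLs from the queue until every client has sent its DONE marker. */
    public static List<String> drain(BlockingQueue<String> queue, int clientCount)
            throws InterruptedException {
        List<String> urls = new ArrayList<>();
        int finishedClients = 0;
        while (finishedClients < clientCount) {
            String item = queue.take(); // blocks until an element is available
            if (DONE.equals(item)) {
                finishedClients++;      // one more client has finished
            } else {
                urls.add(item);
            }
        }
        return urls;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        // Two simulated clients, each ending its URLs with the marker.
        queue.put("http://example.com/a.png");
        queue.put(DONE);
        queue.put("http://example.com/b.png");
        queue.put("http://example.com/c.png");
        queue.put(DONE);
        System.out.println(drain(queue, 2));
    }
}
```

Note that this only works if every client reliably sends its marker; a client that disconnects without doing so would leave the consumer blocked in `take()`.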

Marc Müller
A: 

1- As an example, if your Java web app is running on Google App Engine, you could write each client request to the datastore and have a cron job (i.e. a scheduled task, in GAE speak) read the datastore, build a batch and send it.

2- For the concurrency/locking aspect, you could again rely on the GAE datastore to provide atomicity.

Of course feel free to disregard my proposal if GAE isn't an option.

jldupont
A: 

The data structure is not what's missing. What's missing is an entity, a Timer task, I'd say, which you stop and restart every time you send a batch of images to your service. You do this whether you send them because you had 5 (incidentally, I assume that 5 is just your starting number and that it'll be configurable, along with your timeout) or because the timeout task fired.

So there are two entities running: a main thread, which receives requests, queues them, checks the queue depth and, if it's 5 or more, sends the oldest 5 to the service (and restarts the timer task); and the timer task, which picks up incomplete batches and sends them on.

Side note: that main thread seems to have several responsibilities, so some decomposition might be in order.
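One way this could look, as a rough sketch rather than a definitive implementation: it uses a `ScheduledExecutorService` in place of a `java.util.Timer`, and the names `UrlBatcher`, `submit`, `flush` and `sender` are hypothetical, with `sender` standing in for the actual REST call. As a small variation, the timer is armed when the first URL of a batch arrives and cancelled whenever a batch goes out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class UrlBatcher {
    private final int batchSize;
    private final long timeoutMillis;
    private final Consumer<List<String>> sender; // assumption: wraps the REST call
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private final List<String> pending = new ArrayList<>();
    private ScheduledFuture<?> timeoutTask;
    private long generation = 0; // lets a stale timer callback detect it is outdated

    public UrlBatcher(int batchSize, long timeoutMillis, Consumer<List<String>> sender) {
        this.batchSize = batchSize;
        this.timeoutMillis = timeoutMillis;
        this.sender = sender;
    }

    /** Called by client threads; synchronized so queueing and batching are atomic. */
    public synchronized void submit(String url) {
        pending.add(url);
        if (pending.size() >= batchSize) {
            flush(); // full batch: send now and stop the timer
        } else if (timeoutTask == null) {
            final long gen = generation;
            timeoutTask = timer.schedule(() -> onTimeout(gen),
                    timeoutMillis, TimeUnit.MILLISECONDS);
        }
    }

    /** Timer callback: sends the incomplete batch unless one was sent meanwhile. */
    private synchronized void onTimeout(long gen) {
        if (gen == generation) {
            flush();
        }
    }

    /** Caller must hold the lock. Stops the timer and sends whatever is pending. */
    private void flush() {
        generation++;
        if (timeoutTask != null) {
            timeoutTask.cancel(false);
            timeoutTask = null;
        }
        if (pending.isEmpty()) {
            return;
        }
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        sender.accept(batch);
    }

    public void shutdown() {
        timer.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        List<List<String>> sent = new CopyOnWriteArrayList<>();
        UrlBatcher batcher = new UrlBatcher(5, 200, sent::add);
        for (int i = 1; i <= 7; i++) {
            batcher.submit("http://example.com/img" + i + ".png");
        }
        Thread.sleep(500); // give the timeout a chance to send the last 2 URLs
        batcher.shutdown();
        System.out.println(sent.size() + " batches: " + sent);
    }
}
```

For simplicity the sketch calls `sender` while holding the lock, so a slow REST call would block submitting clients; handing the batch off to a separate executor would be one way to split up those responsibilities.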

CPerkins