What proven design patterns exist for batch operations on resources within a REST style web service?

I'm trying to strike a balance between ideals and reality in terms of performance and stability. We've got an API right now where all operations either retrieve from a list resource (i.e., GET /user) or operate on a single instance (PUT /user/1, DELETE /user/22, etc.).

There are some cases where you want to update a single field of a whole set of objects. It seems very wasteful to send the entire representation for each object back and forth to update the one field.

In an RPC style API, you could have a method:

/mail.do?method=markAsRead&messageIds=1,2,3,4... etc.

What's the REST equivalent here? Or is it OK to compromise now and then? Does it ruin the design to add a few specific operations where they really improve performance? The client in all cases right now is a web browser (a JavaScript application on the client side).

A:

Not at all. I think the REST equivalent is (or at least one solution is) almost exactly that: a specialized interface designed to accommodate an operation required by the client.

I'm reminded of a pattern mentioned in Crane and Pascarello's book Ajax in Action (an excellent book, by the way -- highly recommended) in which they illustrate implementing a CommandQueue sort of object whose job it is to queue up requests into batches and then post them to the server periodically.

The object, if I remember correctly, essentially just held an array of "commands" (to extend your example, each one a record containing a "markAsRead" command, a "messageId" and maybe a reference to a callback/handler function). Then, according to some schedule or on some user action, the command object would be serialized and posted to the server, and the client would handle the consequent post-processing.

I don't happen to have the details handy, but it sounds like a command queue of this sort would be one way to handle your problem; it'd reduce the overall chattiness substantially, and it'd abstract the server-side interface in a way you might find more flexible down the road.
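As a rough illustration of the idea (this is not the book's listing; the class shape, the JSON payload format and the injected `send` function are all assumptions for the sketch), a command queue that flushes every buffered command in one request might look like this:

```javascript
// Hypothetical command queue: commands are buffered locally and flushed
// together as ONE request body instead of one request per command.
class CommandQueue {
  // `send` is an assumed transport function (e.g. a wrapper around fetch()
  // that POSTs the payload); injecting it keeps the queue testable.
  constructor(send) {
    this.send = send;
    this.queued = []; // commands waiting to be sent
    this.sent = {};   // commands sent but not yet acknowledged, keyed by id
    this.nextId = 1;
  }

  addCommand(name, args, callback) {
    const cmd = { id: this.nextId++, name, args, callback };
    this.queued.push(cmd);
    return cmd.id;
  }

  // Serialize every queued command into a single payload and send it once.
  flush() {
    if (this.queued.length === 0) return;
    const batch = this.queued;
    this.queued = [];
    for (const cmd of batch) this.sent[cmd.id] = cmd;
    const payload = JSON.stringify(
      batch.map(({ id, name, args }) => ({ id, name, args }))
    );
    this.send(payload);
  }

  // Match the server's per-command results (e.g. [{ id: 1, ok: true }])
  // back to the pending commands and run their callbacks.
  handleResponse(results) {
    for (const result of results) {
      const cmd = this.sent[result.id];
      if (!cmd) continue;
      delete this.sent[result.id];
      if (cmd.callback) cmd.callback(result);
    }
  }
}

// Usage: two markAsRead commands travel in a single request.
const requests = [];
const queue = new CommandQueue((payload) => requests.push(payload));
queue.addCommand("markAsRead", { messageId: 1 });
queue.addCommand("markAsRead", { messageId: 2 });
queue.flush();
console.log(requests.length); // 1
```

The server would then have to accept this batch format at some endpoint and reply with one result per command id; that endpoint and its response shape are left open here.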


Update: Aha! I've found a snip from that very book online, complete with code samples (although I still suggest picking up the actual book!). Have a look here, beginning with section 5.5.3:

This is easy to code but can result in a lot of very small bits of traffic to the server, which is inefficient and potentially confusing. If we want to control our traffic, we can capture these updates and queue them locally and then send them to the server in batches at our leisure. A simple update queue implemented in JavaScript is shown in listing 5.13. [...]

The queue maintains two arrays. queued is a numerically indexed array, to which new updates are appended. sent is an associative array, containing those updates that have been sent to the server but that are awaiting a reply.

Here are two pertinent functions -- one responsible for adding commands to the queue (addCommand), and one responsible for serializing and then sending them to the server (fireRequest):

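CommandQueue.prototype.addCommand = function(command)
{
    if (this.isCommand(command))
    {
        // Append the command to the "queued" array described above
        this.queued.push(command);
    }
}

CommandQueue.prototype.fireRequest = function()
{
    if (this.queued.length === 0)
    {
        return;
    }

    // Serialize every queued command into a single request body
    var data = "data=";

    for (var i = 0; i < this.queued.length; i++)
    {
        var cmd = this.queued[i];
        if (this.isCommand(cmd))
        {
            data += cmd.toRequestString();
            // Keep the command until the server acknowledges it
            this.sent[cmd.id] = cmd;
        }
    }

    // Clear the queue; pending commands now live in this.sent
    this.queued = [];

    // ... and then send the contents of data in a POST request
    // (once, after the loop, not once per command)
}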

That ought to get you going. Good luck!

Christian Nunciato
Thanks. That's very similar to my ideas on how I would go forward if we kept the batch operations on the client. The issue is the round-trip time for performing an operation on a large number of objects.
Mark Renouf
Hm, ok -- I thought you wanted to perform the operation on a large number of objects (on the server) by way of a lightweight request. Did I misunderstand?
Christian Nunciato
Yes, but I don't see how that code sample would perform the operation any more efficiently. It batches up requests but still sends them to the server one at a time. Am I misinterpreting?
Mark Renouf
A: 

For an operation like the one in your example, I would be tempted to write a range parser.

It's not much work to write a parser that can read "messageIds=1-3,7-9,11,12-15". It would certainly make blanket operations covering all messages more efficient, and it scales better.
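A minimal sketch of such a parser, assuming ids are non-negative integers and that descending ranges like "9-7" should be rejected (the function name `parseIdRanges` is hypothetical):

```javascript
// Hypothetical range parser for values like "1-3,7-9,11,12-15".
// Returns the distinct ids in ascending order.
function parseIdRanges(spec) {
  const ids = new Set();
  for (const part of spec.split(",")) {
    const token = part.trim();
    if (token === "") continue; // tolerate stray commas such as "1,,2"
    const match = /^(\d+)(?:-(\d+))?$/.exec(token);
    if (!match) throw new Error(`Invalid id or range: "${token}"`);
    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    if (end < start) throw new Error(`Descending range: "${token}"`);
    for (let id = start; id <= end; id++) ids.add(id);
  }
  return [...ids].sort((a, b) => a - b);
}

console.log(parseIdRanges("1-3,7-9,11,12-15").join(", "));
// 1, 2, 3, 7, 8, 9, 11, 12, 13, 14, 15
```

Because a single token such as "1-1000000000" expands to a huge list, a real server would probably want to cap the total number of ids it accepts per request.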

Good observation and a good optimization, but the question was whether this style of request could ever be "compatible" with the REST concept.
Mark Renouf
Hi, yes, I understand. The optimisation does make the concept more RESTful, and I didn't want to leave out my advice just because it wandered a little off topic.
A:

You seem to be asking two distinct questions: one about batch operations and the other about partial updates.

The most "standards compliant" way to implement batch operations is by using HTTP pipelining.

With regard to your particular example of partial updates, I would suggest thinking of it as adding the message to a collection of "read messages". Something like:

POST /Mailbox/ReadMessages?message=23

If you really want a way of marking multiple messages at once without using pipelining, then you need to choose a content type for the POST body that can represent a list of resources.

One possible option would be to use an Atom feed document. The client could build an Atom feed document that points to a collection of messages and then POST it to /Mailbox/ReadMessages. The server could parse the feed and determine which messages need to be marked as read.
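A rough sketch of the client side, assuming absolute message URIs of the hypothetical form `https://mail.example.com/Mailbox/Messages/{id}` and placeholder feed id, title and author values. It emits the elements RFC 4287 requires (id, title, updated, author on the feed; id, title, updated on each entry) and links each entry to its message:

```javascript
// Hypothetical builder for an Atom feed listing the messages to mark as read.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function buildReadMessagesFeed(messageUris, now = new Date()) {
  const updated = now.toISOString();
  // One entry per message; the link (default rel="alternate") names the resource.
  const entries = messageUris.map((uri) => [
    "  <entry>",
    `    <id>${escapeXml(uri)}</id>`,
    `    <title>${escapeXml(uri)}</title>`,
    `    <updated>${updated}</updated>`,
    `    <link href="${escapeXml(uri)}"/>`,
    "  </entry>",
  ].join("\n"));
  return [
    '<?xml version="1.0" encoding="utf-8"?>',
    '<feed xmlns="http://www.w3.org/2005/Atom">',
    "  <id>urn:example:read-messages-batch</id>", // placeholder feed id
    "  <title>Messages to mark as read</title>",
    `  <updated>${updated}</updated>`,
    "  <author><name>mail client</name></author>", // placeholder author
    ...entries,
    "</feed>",
  ].join("\n");
}

const feed = buildReadMessagesFeed([
  "https://mail.example.com/Mailbox/Messages/1",
  "https://mail.example.com/Mailbox/Messages/2",
]);
console.log(feed);
```

The client would then POST `feed` to /Mailbox/ReadMessages with `Content-Type: application/atom+xml`; how the server maps each link back to a message is up to its own URI scheme.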

Darrel Miller
A:

A simple RESTful pattern for batches is to make use of a collection resource. For example, to delete several messages at once:

DELETE /mail?id=0&id=1&id=2
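On the server, the repeated `id` parameters can be collected with the standard WHATWG `URL`/`URLSearchParams` API (available in browsers and Node.js). A minimal sketch; the function name and the "non-negative integer ids" rule are assumptions, and the dummy base URL is only there because `new URL()` needs an absolute URL:

```javascript
// Hypothetical helper: extract the ids from DELETE /mail?id=0&id=1&id=2
function idsFromQuery(pathAndQuery) {
  const url = new URL(pathAndQuery, "http://localhost");
  // getAll() returns every value of a repeated parameter, in order.
  return url.searchParams.getAll("id").map((value) => {
    if (!/^\d+$/.test(value)) throw new Error(`Bad id: "${value}"`);
    return Number(value);
  });
}

console.log(idsFromQuery("/mail?id=0&id=1&id=2").join(",")); // 0,1,2
```

`URLSearchParams` skips empty pairs, so a stray `&` as in `/mail?&id=0` does no harm.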

It's a little more complicated to batch-update partial resources, or resource attributes; that is, to update the markAsRead attribute of each message. Basically, instead of treating the attribute as part of each resource, you treat it as a bucket into which to put resources. One example was already posted; I've adjusted it a little:

POST /mail?markAsRead=true
POSTDATA: ids=[0,1,2]

Basically, you are updating the list of mail marked as read.

You can also use this to assign several items to the same category:

POST /mail?category=junk
POSTDATA: ids=[0,1,2]

It's obviously much more complicated to do iTunes-style batch partial updates (e.g., artist+albumTitle but not trackTitle), and the bucket analogy starts to break down:

POST /mail?markAsRead=true&category=junk
POSTDATA: ids=[0,1,2]
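A sketch of how a server might handle these bucket-style requests, assuming an in-memory `Map` of messages, a whitelist of updatable attributes, and the `ids=[0,1,2]` body format shown above (all hypothetical). The attribute values come from the query string and the target ids from the body:

```javascript
// Hypothetical whitelist: attribute name -> converter for the raw query value.
const UPDATABLE = new Map([
  ["markAsRead", (raw) => raw === "true"],
  ["category", (raw) => raw],
]);

function applyBatchUpdate(mailbox, pathAndQuery, body) {
  // e.g. /mail?markAsRead=true&category=junk
  const params = new URL(pathAndQuery, "http://localhost").searchParams;
  const changes = {};
  for (const [name, raw] of params) {
    const convert = UPDATABLE.get(name);
    if (!convert) throw new Error(`Unknown attribute: ${name}`);
    changes[name] = convert(raw);
  }
  if (Object.keys(changes).length === 0) throw new Error("Nothing to update");

  // e.g. ids=[0,1,2]
  const ids = JSON.parse(new URLSearchParams(body).get("ids") ?? "null");
  if (!Array.isArray(ids) || !ids.every(Number.isInteger)) {
    throw new Error("Body must look like ids=[0,1,2]");
  }

  // Check every id first so the batch is applied all-or-nothing.
  for (const id of ids) {
    if (!mailbox.has(id)) throw new Error(`No such message: ${id}`);
  }
  for (const id of ids) Object.assign(mailbox.get(id), changes);
  return ids.length;
}

const mailbox = new Map([
  [0, { markAsRead: false, category: "inbox" }],
  [1, { markAsRead: false, category: "inbox" }],
  [2, { markAsRead: false, category: "inbox" }],
]);
applyBatchUpdate(mailbox, "/mail?markAsRead=true&category=junk", "ids=[0,1,2]");
console.log(mailbox.get(2)); // { markAsRead: true, category: 'junk' }
```

Validating all ids before touching any message is one way to avoid half-applied batches; a real store would more likely rely on a transaction.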

In the long run, it's much easier to update a single partial resource, or a single resource attribute: just make use of a subresource.

POST /mail/0/markAsRead
POSTDATA: true

Alternatively, you could use parameterized resources. This is less common in REST patterns, but is allowed in the URI and HTTP specs. A semicolon divides horizontally related parameters within a resource.

Update several attributes, several resources:

POST /mail/0;1;2/markAsRead;category
POSTDATA: markAsRead=true,category=junk

Update several resources, just one attribute:

POST /mail/0;1;2/markAsRead
POSTDATA: true

Update several attributes, just one resource:

POST /mail/0/markAsRead;category
POSTDATA: markAsRead=true,category=junk
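A sketch of parsing these parameterized paths, assuming the `/mail/{ids}/{attributes}` layout and the `name=value,name=value` body from the examples above; the function names and validation rules are illustrative only:

```javascript
// Hypothetical parser for /mail/0;1;2/markAsRead;category
function parseParameterizedPath(path) {
  const match = /^\/mail\/([^/]+)\/([^/]+)$/.exec(path);
  if (!match) throw new Error(`Unrecognized path: ${path}`);
  // Semicolons separate several ids, or several attribute names.
  const ids = match[1].split(";").map((s) => {
    if (!/^\d+$/.test(s)) throw new Error(`Bad id: ${s}`);
    return Number(s);
  });
  const attributes = match[2].split(";");
  return { ids, attributes };
}

// A single attribute takes a bare value ("true");
// several take "name=value" pairs separated by commas.
function parseBody(attributes, body) {
  if (attributes.length === 1 && !body.includes("=")) {
    return { [attributes[0]]: body };
  }
  const values = {};
  for (const pair of body.split(",")) {
    const eq = pair.indexOf("=");
    if (eq < 0) throw new Error(`Expected name=value, got: ${pair}`);
    const name = pair.slice(0, eq);
    if (!attributes.includes(name)) throw new Error(`Unexpected attribute: ${name}`);
    values[name] = pair.slice(eq + 1);
  }
  return values;
}

const { ids, attributes } = parseParameterizedPath("/mail/0;1;2/markAsRead;category");
console.log(ids, parseBody(attributes, "markAsRead=true,category=junk"));
// [ 0, 1, 2 ] { markAsRead: 'true', category: 'junk' }
```

Values are returned as strings here; converting them (e.g. "true" to a boolean) would follow the same kind of whitelist as in the bucket example.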

The RESTful creativity abounds.

Alex