tags:

views:

630

answers:

6

I'd like some advice on designing a REST API which will allow clients to add/remove large numbers of objects to a collection efficiently.

Via the API, clients need to be able to add items to the collection and remove items from it, as well as manipulating existing items. In many cases the client will want to make bulk updates to the collection, e.g. adding 1000 items and deleting 500 different items. It feels like the client should be able to do this in a single transaction with the server, rather than requiring 1000 separate POST requests and 500 DELETEs.

Does anyone have any info on the best practices or conventions for achieving this?

My current thinking is that one should be able to PUT an object representing the change to the collection URI, but this seems at odds with the HTTP 1.1 RFC, which seems to suggest that the data sent in a PUT request should be interpreted independently from the data already present at the URI. This implies that the client would have to send a complete description of the new state of the collection in one go, which may well be very much larger than the change, or even be more than the client would know when they make the request.

Obviously, I'd be happy to deviate from the RFC if necessary but would prefer to do this in a conventional way if such a convention exists.

+1  A: 

You should use AtomPub. It is specifically designed for managing collections via HTTP. There might even be an implementation for your language of choice.

dowski
Actually, AtomPub only defines adding single items to a collection, and removing them. So I wouldn't call that "specifically designed for managing collections".
Julian Reschke
A: 

For the POSTs, at least, it seems like you should be able to POST to a list URL and have the body of the request contain a list of new resources instead of a single new resource.

Hank Gay
+1  A: 

As far as I understand it, REST means REpresentational State Transfer, so you should transfer the state from client to server.

If that means too much data going back and forth, perhaps you need to change your representation. A collectionChange structure would work, with a series of deletions (by id) and additions (with embedded full xml Representations), POSTed to a handling interface URL. The interface implementation can choose its own method for deletions and additions server-side.

The purest version would probably be to define the items by URL, and the collection contain a series of URLs. The new collection can be PUT after changes by the client, followed by a series of PUTs of the items being added, and perhaps a series of deletions if you want to actually remove the items from the server rather than just remove them from that list.

Phil H
A: 

You could introduce meta-representation of existing collection elements that don't need their entire state transfered, so in some abstract code your update could look like this:

{existing elements 1-100}
{new element foo with values "bar", "baz"}
{existing element 105}
{new element foobar with values "bar", "foo"}
{existing elements 110-200}

Adding (and modifying) elements is done by defining their values, deleting elements is done by not mentioning it the new collection and reordering elements is done by specifying the new order (if order is stored at all).

This way you can easily represent the entire new collection without having to re-transmit the entire content. Using a If-Unmodified-Since header makes sure that your idea of the content indeed matches the servers idea (so that you don't accidentally remove elements that you simply didn't know about when the request was submitted).

Joachim Sauer
A: 

You might want to think of the change task as a resource in itself. So you're really PUT-ing a single object, which is a Bulk Data Update object. Maybe it's got a name, owner, and big blob of CSV, XML, etc. that needs to be parsed and executed. In the case of CSV you might want to also identify what type of objects are represented in the CSV data.

List jobs, add a job, view the status of a job, update a job (probably in order to start/stop it), delete a job (stopping it if it's running) etc. Those operations map easily onto a REST API design.

Once you have this in place, you can easily add different data types that your bulk data updater can handle, maybe even mixed together in the same task. There's no need to have this same API duplicated all over your app for each type of thing you want to import, in other words.

This also lends itself very easily to a background-task implementation. In that case you probably want to add fields to the individual task objects that allow the API client to specify how they want to be notified (a URL they want you to GET when it's done, or send them an e-mail, etc.).

Jamie Flournoy
+2  A: 

Yes, PUT creates/overwrites, but does not partially update.

If you need partial update semantics, use PATCH. See http://greenbytes.de/tech/webdav/draft-dusseault-http-patch-14.html.

Julian Reschke