We have a discussion going on in my team at the moment, and I'd be interested in other views. Suppose we have a RESTful web service whose role is to annotate documents by applying a variety of analysis algorithms and services. The basic interaction is clear: we have a resource which is the document collection; the client POSTs a new document to the collection and gets back the URI of the new document. It can then GET that {docURI} to get the document back, GET {docURI}/metadata to see the general metadata, {docURI}/ne for named entities, and so on.
The problem is that some of the analyses may take a long time to complete. Suppose the client GETs the metadata URI before the analysis is complete, because it wants to show partial or incremental results in the UI. Repeating the GET later may yield more results.
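To make the resource layout concrete, here is a minimal in-memory sketch (no HTTP involved) of the interaction described above: POST to the collection returns a new {docURI}, and a GET on a sub-resource such as {docURI}/ne returns whatever that analysis has produced so far. The names DocumentCollection, record and the base URI /documents are hypothetical, and storing each analysis's results as a list is an assumption for illustration only.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Document:
    text: str
    # Results keyed by analysis name (e.g. "metadata", "ne"); each list
    # grows as the (hypothetical) background analysis produces items.
    results: dict = field(default_factory=dict)


class DocumentCollection:
    """Toy in-memory stand-in for the document collection resource."""

    def __init__(self, base_uri="/documents"):
        self.base_uri = base_uri
        self._docs = {}
        self._ids = count(1)

    def post(self, text):
        """POST a new document; returns the URI of the created resource."""
        doc_uri = f"{self.base_uri}/{next(self._ids)}"
        self._docs[doc_uri] = Document(text)
        return doc_uri

    def get(self, uri):
        """GET {docURI} itself, or a sub-resource such as {docURI}/ne."""
        if uri in self._docs:
            return self._docs[uri].text
        doc_uri, _, analysis = uri.rpartition("/")
        doc = self._docs[doc_uri]  # a KeyError here stands in for 404
        # Partial representation: a copy of whatever exists right now,
        # possibly empty if the analysis has not produced anything yet.
        return list(doc.results.get(analysis, []))

    def record(self, doc_uri, analysis, item):
        """Hypothetical hook called by an analyser as each result arrives."""
        self._docs[doc_uri].results.setdefault(analysis, []).append(item)


coll = DocumentCollection()
uri = coll.post("Some document text.")
print(coll.get(uri + "/ne"))  # empty: analysis not finished yet
coll.record(uri, "ne", "Ian")
print(coll.get(uri + "/ne"))  # now contains the first entity
```

Note that nothing in the GET response distinguishes "no entities yet" from "no entities at all", which is exactly the ambiguity the question is about.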
Solutions we've discussed include:
- keeping the HTTP connection open until all analyses are done (which doesn't seem scalable)
- using Content-Length and Accept-Ranges headers to get incremental content (but we don't know in advance how long the final content will be)
- providing an Atom feed for each resource so the client subscribes to update events rather than simply GETting the resource (seems overly complicated and possibly resource-hungry if there are many active documents)
- just having GET return whatever is available at the time (but it still leaves the problem of the client knowing when we're finally done) [edited to remove reference to idempotency following comments].
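The last option, from the client's side, might look like the following sketch: poll the sub-resource, render each new partial result, and guess when to stop. The function poll_partial_results and its parameters are hypothetical; the stopping rule (give up after the result has been unchanged for stable_polls consecutive polls, or after max_polls) is an assumed heuristic, not a real completion signal, and it will stop too early if an analysis stalls for longer than that window.

```python
import time
from typing import Callable, List, Optional


def poll_partial_results(fetch: Callable[[], List[str]],
                         render: Callable[[List[str]], None],
                         interval: float = 1.0,
                         stable_polls: int = 3,
                         max_polls: int = 60) -> Optional[List[str]]:
    """Repeatedly GET a partial representation and render each change.

    A plain GET carries no 'finished' flag here, so the client can only
    guess: it stops once the result has been unchanged for `stable_polls`
    consecutive polls, or after `max_polls` polls in total.
    """
    last = None
    unchanged = 0
    for _ in range(max_polls):
        current = fetch()  # e.g. GET {docURI}/ne
        if current == last:
            unchanged += 1
            if unchanged >= stable_polls:
                break  # heuristic guess that the analysis is done
        else:
            unchanged = 0
            last = current
            render(current)  # show the incremental result in the UI
        time.sleep(interval)
    return last
```

In a real client, fetch would wrap the HTTP GET; passing it in as a callable keeps the sketch testable without a server.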
Any opinions, or suggestions for alternative ways to handle long-lived or asynchronous interactions in a RESTful architecture?
Ian