RESTful design seems to advocate flat or shallow representations (at least when resources are represented as XML). A resource representation should contain just the resource that the URI identifies. I'm wondering: when is it sensible to present a resource's sub-resources within the parent resource?

To elaborate, consider this: a company may have multiple employees. This situation would usually be modeled as two separate resources, company and employee, where employee is a sub-resource of company.

/company/acme/
/company/acme/employees/
/company/acme/employees/john

With this URI design, the company representation should include links to its employees, but the XML representation probably would not include the employees themselves.
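For instance, the company representation might carry links to the employees instead of inlining them (a sketch only; the link element and its attribute names are illustrative, not prescribed by any standard mentioned here):

```xml
<company>
 <name>Acme</name>
 <link rel="employees" href="/company/acme/employees/"/>
</company>
```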

Therefore, when does it make sense to present sub-items through the parent? And is there a situation where it would be sensible to present sub-items only through their parent? I mean that there would be no URI for the sub-items at all; they could be reached only through the parent resource.

<company>
 <name>Acme</name>
 <employees>
  <employee>John</employee>
  <employee>Jack</employee>
 </employees>
</company>

Is it sensible to offer only one way to access a resource? If a parent exposes its sub-items, can there be an explicit URI for the sub-items too? So, if the company's XML contains the company's employees, would it make sense to offer a /company/acme/employees URI despite the fact that you can get the same information through the company resource?

+5  A: 

If a sub-resource only makes sense in the context of its parent, then yes, it should be returned nested within its parent. For instance, in HTML, an <li> element doesn't make sense as a sub-resource on its own.

However, if a resource can stand alone, and you are going to want to manipulate it independently of any other resources, then it should have its own URI. That way, you can POST or PUT to that resource without affecting other related resources, and without having to parrot them back to the server. If you had to manipulate everything from the parent, think about what happens if one person does a GET, modifies one sub-item, and then does a PUT of the whole thing with that sub-item changed; what if someone else changed one of the others in the meantime? Then you need to add locks and transactional semantics, which defeats the whole statelessness of REST.
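That lost-update scenario can be sketched in a few lines of Python (a hypothetical in-memory "server"; all names here are illustrative):

```python
# Hypothetical in-memory server state: one company document holding
# all of its employees, manipulated only as a whole.
company = {"name": "Acme", "employees": {"john": "Developer", "jack": "Tester"}}

def get_company():
    # Each client receives its own copy of the full representation.
    return {"name": company["name"], "employees": dict(company["employees"])}

def put_company(representation):
    # A naive PUT replaces the whole resource with the client's copy.
    company.clear()
    company.update(representation)

# Clients A and B both GET the company.
copy_a = get_company()
copy_b = get_company()

# Client A promotes John; client B removes Jack.
copy_a["employees"]["john"] = "Manager"
del copy_b["employees"]["jack"]

# Both PUT their copies back; B's write silently undoes A's change,
# because B's copy still has John as "Developer".
put_company(copy_a)
put_company(copy_b)
```

After both PUTs, John's promotion has vanished even though neither client did anything wrong: B simply wrote back stale data for a sub-item it never touched.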

For GET requests at least, it is likely to be a good idea to have some form of bulk query interface, by which a client can get a large number of resources at once; having to do a new HTTP request for each resource can take a long time, since it means a new round trip over the network for each GET. It may make sense to have bulk update functionality as well. But if you are going to want to be able to manipulate one resource at a time, you need to provide a URI for that one resource.

And yes, it's perfectly fine to have more than one way to access a resource. You can think of it like a blog: you can get stories on the main page, or on archive pages, or by going to their permalinks.

edit: If you want to do a bulk update without running into the problem of having one client give stale data to the server, you basically have two options:

  1. Locking. One client tells the server "I want a lock on this whole set of data", fetches the data it wants to modify, modifies the data, sends it back to the server, and unlocks it.
  2. Optimistic concurrency: The client downloads the set of data, which is marked with some sort of revision tag that the server changes every time it receives new data. The client modifies the data and sends it back to the server. If any of the other data in the set has been modified in the meantime, the revision tag will be out of date, and the server will respond with a "sorry, your data is out of date, try again."
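The optimistic-concurrency option can be sketched with a revision-tag check, much like HTTP's ETag and If-Match headers with a 412 Precondition Failed response (a minimal in-memory sketch; the function names and status-code return values are illustrative):

```python
# Minimal sketch of optimistic concurrency: the server keeps a revision
# tag alongside the data and rejects writes based on a stale revision.
store = {"revision": 1, "data": {"employees": ["John", "Jack"]}}

def get_resource():
    # The data is returned together with its current revision tag
    # (analogous to an ETag response header).
    return store["revision"], dict(store["data"])

def put_resource(revision, data):
    # Analogous to If-Match: refuse the update if the client's
    # revision is stale, returning a 412-style status.
    if revision != store["revision"]:
        return 412  # "sorry, your data is out of date, try again"
    store["data"] = data
    store["revision"] += 1
    return 200

rev_a, data_a = get_resource()
rev_b, data_b = get_resource()

data_a["employees"] = ["John", "Jack", "Jill"]
status_a = put_resource(rev_a, data_a)  # first writer succeeds

data_b["employees"] = ["John"]
status_b = put_resource(rev_b, data_b)  # stale revision: rejected
```

Client B then re-fetches the data (getting the new revision tag), reapplies its change, and tries again.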

These each have advantages and pitfalls. The problem with locking is that it is stateful, and so doesn't fit into a REST architecture very well. If the client program crashes or otherwise dies while it has the lock, then that data will be permanently locked unless you have some kind of lock timeout, which can get tricky. Locking can also lead to deadlock if the clients are doing some kind of fancy transactions that involve multiple locks.
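A lock timeout of the kind mentioned above might be sketched like this (illustrative only, not production locking; the timeout value and client IDs are made up):

```python
import time

# Sketch of a server-side lock with a timeout, so that a crashed
# client cannot hold the resource forever.
LOCK_TIMEOUT = 30.0  # seconds; illustrative value
lock = {"owner": None, "acquired_at": 0.0}

def try_lock(client_id, now=None):
    # 'now' can be injected for testing; real callers use the clock.
    now = time.monotonic() if now is None else now
    expired = now - lock["acquired_at"] > LOCK_TIMEOUT
    if lock["owner"] is None or expired:
        lock["owner"] = client_id
        lock["acquired_at"] = now
        return True
    return False

def unlock(client_id):
    if lock["owner"] == client_id:
        lock["owner"] = None

# Client A takes the lock, then crashes without unlocking.
got_a = try_lock("A", now=0.0)
# Client B is refused while the lock is still fresh...
got_b_early = try_lock("B", now=10.0)
# ...but succeeds once the lock has timed out.
got_b_late = try_lock("B", now=40.0)
```

The tricky parts start here: what if client A was merely slow rather than dead, and comes back after the timeout to write with a lock it no longer holds? Guarding against that pushes you toward revision checks anyway.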

The problem with optimistic concurrency is that if there is a high load on a data set, with a lot of clients changing it at once, it can take many, many tries before a given client can post its data; in fact, a slow client may end up being completely cut off from posting updates, because other clients continually change the data in ways that mean the slow client's changes always fail.

You'll need to decide for yourself which of these options suits you. These issues also come up when changing a single resource (one update may clobber another), but when you aggregate resources into a bulk interface, they come up much more often. That's why I would recommend having two interfaces if you're going to aggregate resources: one in which the resources can be accessed individually, and an optional bulk interface through which many resources can be read and written at once.

Brian Campbell
Thank you for the answer. Regarding bulk updates, how can you implement such a feature without compromising the integrity of the data, as you describe in the second paragraph of your answer?
massive