In my attempt to redesign an existing application using the REST architectural style, I came across a problem which I would like to term "Mediatype Explosion". However, I am not sure if this is really a problem or an inherent benefit of REST. To explain what I mean, take the following example.

One tiny part of our application looks like:

collection-of-collections->collections-of-items->items

i.e. the top level is a collection of collections, and each of these collections is in turn a collection of items.

Also, each item has 8 attributes which can be read and written individually. Trying to expose the above hierarchy as RESTful resources leaves me with the following media types:

application/vnd.mycompany.collection-of-collections+xml
application/vnd.mycompany.collection-of-items+xml
application/vnd.mycompany.item+xml

Furthermore, since each item has 8 attributes which can be read and written individually, this results in another 8 media types. For example, one such media type for the "value" attribute of an item would be:

application/vnd.mycompany.item_value+xml

As I mentioned earlier, this is just a tiny part of our application, and I expect several different collections and items that need to be exposed in this way.

My questions are:

  1. Am I doing something wrong by having this huge number of media types?
  2. What is the alternative design method to avoid this explosion of media types?

I am also aware that the design above is highly granular, especially in exposing individual attributes of the item and having a separate media type for each of them. However, making it coarser means I will end up transferring unnecessary data over the wire when in reality the client only needs to read or write a single attribute of an item. How would you approach such a design issue?

A: 

You're using the media type to convey details of your data that should be stored in the representation itself. So you could have just one media type, say "application/xml", and then your XML representations would look like:

<collection-of-collections>
    <collection-of-items>
        <item>
        </item>
        <item>
        </item>
    </collection-of-items>
    <collection-of-items>
        <item>
        </item>
        <item>
        </item>
    </collection-of-items>
</collection-of-collections>

If you're concerned about sending too much data, substitute JSON for XML. Another way to save on bytes written and read is to use gzip encoding, which cuts things down about 60-70%. Unless you have ultra-high performance needs, one of these approaches ought to work well for you. (For better performance, you could use very terse hand-crafted strings, or even drop down to a custom binary TCP/IP protocol.)
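If you want to see what gzip saves on your own representations, a quick measurement along these lines can help. This is only a sketch: the payload below is made up (200 items with invented text content), and the ratio you get depends heavily on how repetitive your real data is.

```python
import gzip

# Hypothetical payload: element names mirror the XML above; the item
# count and item contents are invented for the measurement.
items = "".join("        <item>value %d</item>\n" % i for i in range(200))
xml = (
    "<collection-of-collections>\n"
    "    <collection-of-items>\n" + items +
    "    </collection-of-items>\n"
    "</collection-of-collections>\n"
)

raw = xml.encode("utf-8")
packed = gzip.compress(raw)  # what a server sends with Content-Encoding: gzip

print(len(raw), "bytes raw,", len(packed), "bytes gzipped")
```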

Edit: One of your concerns is that:

making [the representation] coarse means I will end up transferring unnecessary data over the wire when in reality the client only needs to read or write a single attribute of an item

In any web service there is quite a lot of overhead in sending messages (each HTTP request might cost several hundred bytes for the start line and request headers and ditto for each HTTP response as in this example). So in general you want to have less granular representations. So you would write your client to ask for these bigger representations and then cache them in some convenient in-memory data structure where your program could read data from them many times (but be sure to honor the HTTP expiration date your server sets). When writing data to the server, you would normally combine a set of changes to your in-memory data structure, and then send the updates as a single HTTP PUT request to the server.
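The read-many, write-once pattern described above could be sketched roughly as follows. Everything here is hypothetical: the CachedResource class, the attribute names other than "value", and the fake_fetch/fake_put functions, which stand in for real HTTP GET and PUT calls so that the sketch runs without a server.

```python
import time


class CachedResource:
    """Client-side copy of one coarse representation (hypothetical helper)."""

    def __init__(self, fetch, put):
        self._fetch = fetch  # callable returning (attributes_dict, max_age_seconds)
        self._put = put      # callable taking the full dict, i.e. one HTTP PUT
        self._data = None
        self._expires_at = 0.0
        self._dirty = False

    def _load(self):
        # Keep unsent local edits instead of overwriting them with a re-fetch.
        if self._dirty:
            return
        # Fetch only when nothing is cached or the server's expiry has passed.
        if self._data is None or time.monotonic() >= self._expires_at:
            data, max_age = self._fetch()
            self._data = dict(data)
            self._expires_at = time.monotonic() + max_age

    def get(self, name):
        self._load()
        return self._data[name]

    def set(self, name, value):
        self._load()
        self._data[name] = value
        self._dirty = True

    def flush(self):
        # Send all accumulated changes as a single request.
        if self._dirty:
            self._put(dict(self._data))
            self._dirty = False


# Stand-ins for the real HTTP calls.
server = {"value": 1, "unit": "kg", "label": "sample"}
calls = {"get": 0, "put": 0}


def fake_fetch():
    calls["get"] += 1
    return server, 60  # representation plus a 60-second lifetime


def fake_put(body):
    calls["put"] += 1
    server.update(body)


item = CachedResource(fake_fetch, fake_put)
item.get("value")
item.get("unit")            # served from the cache, no second GET
item.set("value", 2)
item.set("label", "updated")
item.flush()                # one PUT carries both changes
print(calls)
```

Two reads and two writes cost one GET and one PUT here, instead of four requests against per-attribute resources.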

You should grab a copy of Richardson and Ruby's RESTful Web Services, which is a truly excellent book on how to design REST web services and explains things much more clearly than I could. If you're working in Java I highly recommend the RESTlet framework, which very faithfully models the REST concepts. Roy Fielding's USC dissertation defining the REST principles may also be helpful.

Jim Ferrans
Sorry Jim, this is absolutely incorrect. The media type is exactly what should be used to "convey the details of your data". The media type is part of the contract with the client.
Darrel Miller
Sorry Darrel, you need to read RFC 2046 (http://tools.ietf.org/html/rfc2046) and look at the registered media types over at the IANA registry (www.iana.org). A media type is analogous to a file format, not the file contents. Of course the "vnd" namespace is open to any (mis)use. In a REST design, the representations of a resource each have an associated media type. In the OP's approach, only one representation is possible. "An Internet media type ... is a two-part identifier for file formats on the Internet."
Jim Ferrans
Roy Fielding recently stated "A REST API should spend almost all of its descriptive effort in defining the media type(s) used for representing resources..." The choice between a standard format and a custom one is a difficult choice, but application/xml is pretty much the worst choice you could make. Pretty much the only way to make it work is in a browser with code-download that can interpret the xml format. But that choice makes re-use very difficult.
Darrel Miller
Okay, I see where you're coming from. (Your point that the media type conveys "the details of your data" was misleading, as it names the external standard containing the details.) Dare Obasanjo on 10/24/2008 elaborated: "[S]ticking to defining data payloads which are then made standard MIME types gives maximum reusability of the technology" and uses application/xml+rss as an example. I certainly agree. BTW, an alternate "low REST" approach sees "Accept: application/xml" and infers the "+whatever" from the resource name. Not high REST, of course.
Jim Ferrans
You are right, my statement wasn't clear. For apps on the public web, standard MIME types like atom, rss, xhtml are definitely the way to go. I must admit to being a bit of a bigot when it comes to the hi-rest, lo-rest debate. I'm in the lo-rest isn't REST crowd. Hypermedia is a magical thing when you use it right.
Darrel Miller
REST is not designed for network efficiency. Granularity is at the resource level, not at the media type definition level. From my experience, too many small resources to cover partial updates can become quite a nightmare, both in designing your media types and ensuring the recovery scenarios of HTTP still work.
serialseb
@Darrel: Yes, higher REST seems to always work out better, modulo current gotchas like lack of support in HTML and XHTML (though you can tunnel PUT and DELETE and so on through POST).
Jim Ferrans
@serialseb REST doesn't require granularity at the resource level - and resources aren't even directly exposed via REST, just representations of them, which may or may not be particularly granular. @Jim "higher REST" is a misnomer - there's simply REST, or not REST. Lower/higher suggests a flexibility that does not exist in REST, usually propagated by people who think it makes their RPC API look better to call it "REST."
Wahnfrieden
@Wahnfrieden: There are degrees of RESTfulness, and Richardson and Ruby's RESTful Web Services uses the term "REST/RPC hybrids" to describe services that aren't pure REST and aren't pure RPC (and then go on to show how some of these can be made truly RESTful). But I do think you're right to be a hard-liner since the term does seem to be misused frequently.
Jim Ferrans
@Wahnfrieden: Sam Ruby observes: "Without intending to take anything away from Roy’s (valid) criticism [see @Darrel's link] on labeling, REST isn’t an all or nothing proposition. One can get significant value from partial adoption." See http://www.intertwingly.net/blog/2008/10/21/Progressive-Disclosure
Jim Ferrans
+8  A: 

One approach that would reduce the number of media types required is to use a media type defined to hold lists of other media-types. This could be used for all of your collections. Generally lists tend to have a consistent set of behavior. You could roll your own vnd.mycompany.resourcelist or you could reuse something like an Atom collection.

With regard to the specific resource representations like vnd.mycompany.item, what you can do depends a whole lot on the characteristics of your client. Is it in a browser? Can you do code-download? Is your client a rich UI, or is it a data processing client?

If the client is going to do specific data processing then you pretty much need to stick with the precise media types, and you may end up with a large number of them. But look on the bright side: you will have fewer media types than you would have namespaces if you were using SOAP!

Remember, the media type is your contract; if your application needs to define lots of contracts with the client, then so be it.

However, I would not go as far as defining contracts to exchange single attribute values. If you feel the need to do that, then you are doing something else wrong in your design. Distributed interface design needs to have chunky conversations, not chatty ones.
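To illustrate what "the media type is your contract" can mean for a data processing client, here is a rough sketch in which the client picks a parser from the Content-Type header alone, before touching the body. The media type names come from the question; the handler functions and the assumed XML layout (an item holding a single value element) are invented for illustration.

```python
import xml.etree.ElementTree as ET


def item_from_element(el):
    # Assumed layout: <item><value>...</value></item>
    return {"value": el.findtext("value")}


def read_item(body):
    return item_from_element(ET.fromstring(body))


def read_item_list(body):
    return [item_from_element(el) for el in ET.fromstring(body).findall("item")]


# One entry per contract the client understands.
HANDLERS = {
    "application/vnd.mycompany.item+xml": read_item,
    "application/vnd.mycompany.collection-of-items+xml": read_item_list,
}


def dispatch(content_type, body):
    # Drop parameters such as "; charset=utf-8" before the lookup.
    media_type = content_type.split(";")[0].strip().lower()
    handler = HANDLERS.get(media_type)
    if handler is None:
        raise ValueError("no contract for media type: " + media_type)
    return handler(body)


print(dispatch("application/vnd.mycompany.item+xml; charset=utf-8",
               "<item><value>42</value></item>"))
```

A generic type such as application/xml would fall through to the ValueError branch, because it tells this client nothing about which parser applies.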

Darrel Miller
Darrel, Thanks for your valuable inputs. Currently I do see a need to define contracts to exchange single attribute values in our application. But you are right, maybe I need to take a different approach here...
Suresh Kumar
@Darrel: a very good answer +1
jkp
A: 

Unless you intend to register these media types, you should pick one of the existing MIME types instead of trying to make up your own formats. As Jim mentions, application/xml, text/xml, or application/json works for most of what gets transmitted in a REST design.

In reply to Darrel here is Roy's full post. Aren't you trying to define typed resources by creating your own mime types?

Suresh, why isn't HTTP+POX Restful?

Paul Morgan
Media types need to be standardized based on their target audience. If you are trying to reach the world, then you had better use an existing registered media type. If your application does not reach outside of your enterprise firewall then a custom media type standardized within your enterprise is much better than application/xml. In the words of Roy Fielding "A REST API should never have “typed” resources that are significant to the client. The only types that are significant to a client are the current representation’s media type...". Application/xml is not very helpful.
Darrel Miller
Sorry Paul, I don't consider HTTP+POX RESTful.
Suresh Kumar
HTTP+POX may be RESTful if you don't breach the REST constraints. If you describe a bunch of URIs to devs and spit out objects serialized as text/xml, you're starting to breach a lot of the constraints, as well as disregarding best practices. If you add RPC behavior to the mix, you can't get any further from REST, short of adopting CORBA.
serialseb
Paul asked "Aren't you trying to define typed resources by creating your own mime types?" Yes. That's why Roy said "The only types that are significant to a client are the current representation’s media type". Sending application/xml and then letting the client reach into that XML and deserialize it into a concrete type on the client introduces hidden coupling. The "Self-descriptive" REST constraint in effect says that the client cannot know more about the representation than the media type tells it. Application/xml just says you have elements and attributes!
Darrel Miller
Using HTTP and XML *can* be RESTful, but it's no guarantee. I just looked into a new "RESTful" API this afternoon and all requests are made via HTTP to the same endpoint URI, with the operation and all parameters hidden away in an XML format in the request body. It was written in HTTP and XML, but completely Remote Procedure Call and extremely non-RESTful.
Jim Ferrans
Paul, as Darrel correctly mentions above, HTTP+POX breaks the "self descriptive" constraint of REST. If the representation is not self descriptive, we will have to rely on out-of-band information.
Suresh Kumar
But if the XML included a link to the XML Schema/DTD wouldn't it be self descriptive? I'd focus more on creating an XML Schema/DTD coupled with POX instead of media/mime types.
Paul Morgan
Mark Baker has a number of posts explaining why XML Schema does not replace media types http://www.markbaker.ca/blog/2004/09/why-namespaces-dont-replace-media-types/
Darrel Miller
The other problem is that a client has to parse the entity body before it can decide whether it can process the message.
Darrel Miller
+1  A: 

A media type should seldom be created, and time should be invested in making sure the format can survive change.

As you're relying on xml, there is no particular reason why you couldn't create one media type, provided that media type is described in one source.

Choosing ATOM over having one host media type that supports multiple root elements doesn't necessarily bring you anything: you'll still need to start reading the message within the context of a specific operation before deciding if enough information is present to process the request.

So I would suggest that you could happily have one media type, represented by one root element, and use a schema language to specify which of the elements can be contained.

In other words, a language like xsd can let you type your media type to support one of multiple root elements. There is nothing inherently wrong with application/vnd.acme.humanresources+xml describing an xml document that can take either or as a root element.
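As a rough illustration of one media type covering several root elements, a client could check the media type first and then branch on the root element. The root element names "employee" and "department" are hypothetical, and the ALLOWED_ROOTS set only stands in for what the schema would declare; this sketch does not perform real XSD validation.

```python
import xml.etree.ElementTree as ET

MEDIA_TYPE = "application/vnd.acme.humanresources+xml"

# Hypothetical root elements; in practice the schema published with the
# media type would be the single source for this list.
ALLOWED_ROOTS = {"employee", "department"}


def root_kind(content_type, body):
    # Step 1: the media type tells the client which specification applies.
    if content_type.split(";")[0].strip().lower() != MEDIA_TYPE:
        raise ValueError("unsupported media type")
    # Step 2: within that one media type, the root element selects the shape.
    root = ET.fromstring(body).tag
    if root not in ALLOWED_ROOTS:
        raise ValueError("unexpected root element: " + root)
    return root


print(root_kind(MEDIA_TYPE, "<employee><name>Ann</name></employee>"))
```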

So to answer your question, create as few media types as you can possibly afford, by asking whether what you put in the documentation of the media type will be understandable and implementable by a developer.

serialseb
The only advantage of using an ATOM feed as the media type for your collection is that you could use any old RSS reader to see newly added items in a collection. Maybe it is not a useful requirement, but it could be. The feed reader won't be able to do much with the actual ATOM entry, but the title could be used as a short description.
Darrel Miller
You're quite right, but it implies importing another media type in your solution. I'm generally wary of reusing ATOM too much, I can see a lot of pain coming from abuse of encapsulation formats...
serialseb
+3  A: 

I think I finally got the clarification I sought for the above question from Ian Robinson's presentation and thought I should share it here.

Recently, I came across the statement "media type for helping tune the hypermedia engine, schema for structure" in a blog entry by Jim Webber. I then found this presentation by Ian Robinson of Thoughtworks. This presentation is one of the best that I have come across, and it provides a very clear understanding of the roles and responsibilities of media types and schema languages (the entire presentation is a treat and I highly recommend it to all). Look out especially for the slides titled "You've Chosen application/xml, you b*st*rd." and "Custom media types". Ian clearly explains the different roles of the schemas and the media types. In short, this is my takeaway from Ian's presentation:

A media type description includes the processing model, which identifies hypermedia controls and defines which methods are applicable to resources of that type. Identifying hypermedia controls means answering "How do we identify links?": in XHTML, links are identified based on tag, and RDF has different semantics for the same thing. The next thing that media types help identify is which methods are applicable to resources of a given media type. A good example is the ATOM (application/atom+xml) specification, which gives a very rich description of hypermedia controls: it tells us how the link element is defined and what we can expect to be able to do when we dereference a URI, so it actually tells us something about the methods we can expect to be able to apply to the resource. The structural information of a resource representation is NOT part of the media type description; it is provided by an appropriate schema for the actual representation, i.e. the media type specification won’t necessarily dictate anything about the structure of the representation.
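To make the "How do we identify links?" part concrete, here is a small sketch of a client that finds the hypermedia controls in an Atom entry purely from what the Atom specification defines: link elements in the Atom namespace, with a rel attribute that defaults to "alternate" when absent. The entry content and the example.org URIs are made up.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def links_by_rel(entry_xml):
    """Group the href values of an Atom entry's link elements by rel."""
    entry = ET.fromstring(entry_xml)
    links = {}
    for link in entry.findall(ATOM + "link"):
        # Per the Atom format, a link without rel is treated as "alternate".
        rel = link.get("rel", "alternate")
        links.setdefault(rel, []).append(link.get("href"))
    return links


# Made-up entry for illustration.
entry = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Item 1</title>
  <link href="http://example.org/items/1"/>
  <link rel="edit" href="http://example.org/items/1/edit"/>
</entry>"""

print(links_by_rel(entry))
```

The client needs no knowledge of the item's own structure to find these links; that structure is the schema's job, which is the split between media type and schema described above.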

So what does this mean for us? Simply that we don't need a separate media type for describing each resource, as described above in my original question. We just need one media type for the entire application. This could be a totally new custom media type, a custom media type which reuses existing standard media types or, better still, simply a standard media type that can be reused without change in our application.

Hope this helps.

Suresh Kumar