
This question is more about service architecture strategy. We are building a big web system based on REST services on the back end, and we are currently trying to build some internal standards to follow while developing REST services.

Some queries return a list of entities. For example, let's say we have an image gallery retrieval service, /get_all_galleries, returning the following response:

<galleries>
   <gallery>
      <id>some_gallery_id</id>
      <name>my photos</name>
      <photos>
          <photo>
                <id>123</id>
                <name>my photo</name>
                 <location>http://mysite/photo/show/123</location>
                ......
                <author>
                     <id>some_id</id>
                     <name>some name</name>
                     .......
                 </author>
          </photo>
          <photo> ..... </photo>
          <photo> ..... </photo>
          <photo> ..... </photo>
          <photo> ..... </photo>
    </photos>
  </gallery>
  <gallery> .... </gallery>
  <gallery> .... </gallery>
  <gallery> .... </gallery>
  <gallery> .... </gallery>
</galleries>

As you can see, the response is quite big and heavy, and we don't always need such a deep level of detail. The usual solution is to use link elements (for example, Atom links: http://ru.wikipedia.org/wiki/Atom) for each gallery instead of the full gallery data:

<galleries>
   <gallery>
      <id>some_gallery_id</id>
      <link href="http://mysite/gallery/some_gallery_id"/>
   </gallery>
   <gallery>
      <id>second_gallery_id</id>
      <link href="http://mysite/gallery/second_gallery_id"/>
   </gallery>
  <gallery> .... </gallery>
  <gallery> .... </gallery>
  <gallery> .... </gallery>
  <gallery> .... </gallery>
</galleries>

The first question is this: maybe we shouldn't even use <galleries> and <gallery> types, and should just use a generic <list> and <item> for all resources that return lists of objects:

<list>
  <item><link href="http://mysite/gallery/some_gallery_id"/></item>
  <item><link href="http://mysite/gallery/other_gallery_id"/></item>
  <item>....</item>
</list>

And the second question: when a user tries to retrieve info about a concrete gallery, he'll use, for example, the http://mysite/gallery/some_gallery_id link. What should he see as the result?

Should it be:

   <gallery>
      <id>some_gallery_id</id>
      <name>my photos</name>
      <photos>
          <photo>
                <id>123</id>
                <name>my photo</name>
                 <location>http://mysite/photo/show/123</location>
                 ......
                 <author>
                      <id>some_id</id>
                      <name>some name</name>
                      .......
                 </author>
          </photo>
          <photo> ..... </photo>
          <photo> ..... </photo>
          <photo> ..... </photo>
          <photo> ..... </photo>
    </photos>
  </gallery>

or:

   <gallery>
      <id>some_gallery_id</id>
      <name>my photos</name>
      <photos>
          <photo><link href="http://mysite/photo/11111"/></photo>
          <photo><link href="http://mysite/photo/22222"/></photo>
          <photo><link href="http://mysite/photo/33333"/></photo>
          <photo> ..... </photo>
    </photos>
  </gallery>

or

   <gallery>
      <id>some_gallery_id</id>
      <name>my photos</name>
      <photos>
          <photo>
               <link href="http://mysite/photo/11111"/>
               <author>
                    <link href="http://mysite/author/11111"/>
               </author>
          </photo>
          <photo>
               <link href="http://mysite/photo/22222"/>
               <author>
                    <link href="http://mysite/author/11111"/>
               </author>
          </photo>
          <photo>
               <link href="http://mysite/photo/33333"/>
               <author>
                    <link href="http://mysite/author/11111"/>
               </author>
          </photo>
          <photo> ..... </photo>
    </photos>
  </gallery>

I mean, if we use links instead of full object info, how deep should we go? Should I show the author inside each photo, and so on?

Perhaps my question is ambiguous, but what I'm trying to do is create a general strategy for such cases that all team members can follow in the future.

A: 

You can always use attributes.

  <gallery id = "1" name = "Gallery 1">
      <photos>
          <photo id="1" link="http://mysite/photo/11111" />
          <photo id="2" link="http://mysite/photo/22222" />
          <photo id="3" link="http://mysite/photo/33333" />
      </photos>
  </gallery>

Or you can use JSON. I prefer it, since it's easier and lighter than XML.

{
    "gallery": {
        "id": "1",
        "name": "Gallery 1",
        "photos": [
            {
                "id": "1",
                "link": "http://mysite/photo/11111" 
            },
            {
                "id": "2",
                "link": "http://mysite/photo/22222"
            },
            {
                "id": "3",
                "link": "http://mysite/photo/33333"
            }
        ]
    }
}
JeremySpouken
I'm using Java and Apache CXF to serve up a RESTful service. That has the advantage of being able to serve up *both* XML and JSON for the same resource, depending on what the client says it prefers (i.e., via content negotiation).
Donal Fellows
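To make the content negotiation idea concrete, here is a minimal plain-Java sketch (no CXF or JAX-RS; the class and method names are hypothetical) that picks XML or JSON for the same gallery resource from the Accept header. It deliberately ignores q-values and takes the first supported type listed, which real frameworks handle more carefully.

```java
import java.util.Locale;

// Hypothetical helper: picks one representation of a resource from the
// client's Accept header. q-values are ignored; the first supported type wins.
class RepresentationChooser {

    static final String DEFAULT_TYPE = "application/xml"; // assumed server default

    static String chooseMediaType(String acceptHeader) {
        if (acceptHeader == null || acceptHeader.trim().isEmpty()) {
            return DEFAULT_TYPE;
        }
        for (String part : acceptHeader.split(",")) {
            // Drop parameters such as ";q=0.5" and normalise case.
            String type = part.split(";")[0].trim().toLowerCase(Locale.ROOT);
            if (type.equals("application/json")) {
                return "application/json";
            }
            if (type.equals("application/xml") || type.equals("text/xml")) {
                return "application/xml"; // treat both XML types the same way
            }
        }
        return DEFAULT_TYPE; // e.g. "*/*" or only unsupported types
    }

    // Renders the same gallery summary in the chosen format
    // (values are not escaped; illustration only).
    static String render(String mediaType, String id, String name) {
        if (mediaType.equals("application/json")) {
            return "{\"gallery\":{\"id\":\"" + id + "\",\"name\":\"" + name + "\"}}";
        }
        return "<gallery id=\"" + id + "\" name=\"" + name + "\"/>";
    }

    public static void main(String[] args) {
        String type = chooseMediaType("application/json, text/plain;q=0.5");
        System.out.println(type + " -> " + render(type, "1", "Gallery 1"));
    }
}
```

The point is that the URI stays the same and only the representation changes, so the XML-versus-JSON debate above need not be settled per resource.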
A:

A good thing to consider is how you intend for clients to retrieve the data. If you're intending for a client to grab a whole bunch of information about many photos, then a list of only <photo href="..."/> might not be optimal, since the client would then be forced to perform a GET request for each photo resource they need information about.

I can think of a couple of interesting ways around this off the top of my head.

You could allow a client to specify the fields they'd like to retrieve as query parameters when querying the list, e.g.:

GET http://www.example.com/photos?_fields=author,fileSize

This could then return something like:

<photos href="/photos?_fields=author,fileSize">
    <photo href="/photos/15">
        <author href="/authors/2245"/>
        <fileSize>32MB</fileSize>
    </photo>
    ...
</photos>
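A minimal sketch of the _fields idea in plain Java, assuming each photo's data is available as a map of field name to value and that "author" holds the author's href. The class name FieldSelection is hypothetical. Requested fields are copied into the photo element, related resources stay as links, and an empty selection falls back to the shallow href-only form.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "_fields" selection: the client names the fields it
// wants and the server copies only those into each <photo> element.
class FieldSelection {

    // Parses "author,fileSize" into an ordered set of field names.
    static Set<String> parseFields(String fieldsParam) {
        Set<String> fields = new LinkedHashSet<>();
        if (fieldsParam == null) {
            return fields;
        }
        for (String f : fieldsParam.split(",")) {
            if (!f.trim().isEmpty()) {
                fields.add(f.trim());
            }
        }
        return fields;
    }

    // Renders one photo; an empty field set yields just the href (shallow list).
    static String renderPhoto(String href, Map<String, String> data, Set<String> fields) {
        StringBuilder sb = new StringBuilder("<photo href=\"").append(href).append("\"");
        if (fields.isEmpty()) {
            return sb.append("/>").toString();
        }
        sb.append(">");
        for (String field : fields) {
            String value = data.get(field);
            if (value == null) {
                continue; // unknown or missing fields are silently skipped
            }
            if (field.equals("author")) {
                // Related resources are still returned as links, not inlined.
                sb.append("<author href=\"").append(value).append("\"/>");
            } else {
                sb.append("<").append(field).append(">").append(value)
                  .append("</").append(field).append(">");
            }
        }
        return sb.append("</photo>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> photo = new LinkedHashMap<>();
        photo.put("author", "/authors/2245");
        photo.put("fileSize", "32MB");
        photo.put("name", "my photo");
        System.out.println(renderPhoto("/photos/15", photo, parseFields("author,fileSize")));
    }
}
```

Running main prints the single photo element from the example response above, with "name" omitted because it was not requested.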

Alternatively, you could make it simpler by allowing the client to specify some sort of maximum "depth" property; this is a bit more crude, but could be used effectively. For example, if the client specified a depth of 2, you'd return everything under <gallery>, as well as all child elements of each <photo>.

GET /galleries?depth=2

Might return something like:

<galleries>
  <gallery>
    <id>22</id>
    <name>My Gallery</name>
    <!-- full gallery data -->
    <photos href="/photos?gallery=/galleries/22">
      <photo href="/photos/99">
        <id>99</id>
        <author href="/authors/4381"/><!-- href instead of including nested author data -->
        <fileSize>24MB</fileSize>
        <!-- full photo data -->
      </photo>
      ...
    </photos>
  </gallery>
  ...
</galleries>
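The depth parameter can be sketched as a recursive renderer in which every level of nesting consumes one unit of depth and an element that runs out of depth collapses to an href. This is a minimal plain-Java illustration with a hypothetical generic Node type. It omits the <photos> wrapper element for brevity, so its output is a simplified version of the markup above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a "depth" parameter: each level of nesting consumes
// one unit of depth; once it runs out, the element collapses to an href link.
class DepthRenderer {

    static final class Node {
        final String tag;
        final String href;
        final Map<String, String> fields = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();

        Node(String tag, String href) {
            this.tag = tag;
            this.href = href;
        }
    }

    static String render(Node node, int depth) {
        if (depth <= 0) {
            // Out of depth: return only a link the client can follow later.
            return "<" + node.tag + " href=\"" + node.href + "\"/>";
        }
        StringBuilder sb = new StringBuilder();
        sb.append("<").append(node.tag).append(" href=\"").append(node.href).append("\">");
        for (Map.Entry<String, String> e : node.fields.entrySet()) {
            sb.append("<").append(e.getKey()).append(">").append(e.getValue())
              .append("</").append(e.getKey()).append(">");
        }
        for (Node child : node.children) {
            sb.append(render(child, depth - 1));
        }
        return sb.append("</").append(node.tag).append(">").toString();
    }

    public static void main(String[] args) {
        Node author = new Node("author", "/authors/4381");
        Node photo = new Node("photo", "/photos/99");
        photo.fields.put("id", "99");
        photo.children.add(author);
        Node gallery = new Node("gallery", "/galleries/22");
        gallery.fields.put("name", "My Gallery");
        gallery.children.add(photo);
        System.out.println(render(gallery, 2)); // author appears only as an href
    }
}
```

With depth 2 the gallery and its photos are expanded while the author stays a link; with depth 1 each photo collapses to an href as well.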

Alongside this, if you're concerned about the client querying many, many records at once (e.g. if there are thousands of photos or galleries), you might want to consider some sort of paging for your lists. This might involve setting a hard maximum for results in your code and providing the client with links to next/previous pages:

GET /photos?gallery=/galleries/59

Might return:

<photos href="/photos?gallery=/galleries/59&_max=100&_first=100" next="/photos?gallery=/galleries/59&_max=100&_first=200" prev="/photos?gallery=/galleries/59&_max=100&_first=0" count="100" total="3528">
    ....
</photos>

Clients could control the _first and _max properties, but could never increase the _max over a certain configured threshold. You would return the number of "found" results for the page in the markup as well as the total number of results available. This would help you cut back on the response sizes, which you mentioned might be a concern. This could be done in parallel with the options listed above.
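A minimal plain-Java sketch of that paging scheme, assuming a configured ceiling of 100 results per page (the constant and class name are hypothetical). The server clamps _max, computes how many results actually land on the page, and emits next/prev links only when those pages exist. Unlike the markup above, the links are XML-escaped ("&" becomes "&amp;"), which a well-formed XML attribute requires.

```java
// Hypothetical sketch of the paging scheme above. "_first" and "_max" come
// from the client; the server clamps "_max" to a configured ceiling.
class Pager {

    static final int MAX_PAGE_SIZE = 100; // assumed configured threshold

    static String pageElement(String baseUri, int first, int requestedMax, int total) {
        int max = Math.max(1, Math.min(requestedMax, MAX_PAGE_SIZE));
        int start = Math.max(0, first);
        // Results actually present on this page (0 if past the end).
        int count = Math.max(0, Math.min(max, total - start));

        StringBuilder sb = new StringBuilder("<photos");
        attr(sb, "href", link(baseUri, start, max));
        if (start + max < total) {
            attr(sb, "next", link(baseUri, start + max, max));
        }
        if (start > 0) {
            attr(sb, "prev", link(baseUri, Math.max(0, start - max), max));
        }
        attr(sb, "count", String.valueOf(count));
        attr(sb, "total", String.valueOf(total));
        return sb.append(">").toString();
    }

    static String link(String baseUri, int first, int max) {
        return baseUri + "&_max=" + max + "&_first=" + first;
    }

    // Appends name="value", escaping '&' so the attribute is well-formed XML.
    static void attr(StringBuilder sb, String name, String value) {
        sb.append(' ').append(name).append("=\"")
          .append(value.replace("&", "&amp;")).append('"');
    }

    public static void main(String[] args) {
        // Client asks for 500 per page; the server silently caps it at 100.
        System.out.println(pageElement("/photos?gallery=/galleries/59", 100, 500, 3528));
    }
}
```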

Ultimately it's up to how you want your server to instruct the clients to retrieve data. If you don't want them doing a GET for each photo then you might want to provide them more convenient ways to get deeper data. But if you think your server can handle decent load, and along with that you can make server-side optimizations (caching, using 304 statuses, etc.), then just returning shallow lists with hrefs is a bit more straightforward.

Rob Hruska
I know the idea of letting the client choose which data elements they wish to receive is a popular one; however, you have to consider how seriously it can limit your ability to cache responses. If you have just five properties, imagine the number of different response variations you could have (2^5 = 32 possible subsets). Do you cache them all and bloat your cache, cache none of them and put additional load on your database server, or cache in an intelligent intermediary that can pull out subsets of data from a complete cached copy?
Darrel Miller
@Darrel - A good point. It probably depends on whether or not the server needs to be able to cache results. If not, then it's viable; but if so, I'd probably avoid it.
Rob Hruska
@Rob I think your idea of specifying the field names to be returned makes sense; as I remember, the LinkedIn API works this way. Thanks.
abovesun
@abovesun - It's definitely an option, but I would heed @Darrel's comment and his answer as well. It sounds like you're (to some degree) concerned about performance, which would imply that your API will need to do some sort of caching; using the dynamic fields would make caching that much tougher.
Rob Hruska
@Rob Depends if perf is an issue. Caching is just a perf optimization tool. The strange thing is the "select just the fields you want" is also considered a perf optimization technique. But HTTP 1.1 wasn't built with this approach in mind. It was built with caching in mind. For me, it is just going against the grain.
Darrel Miller
@Darrel, @Rob It depends; caching actually isn't the first priority here. We are using Oracle Coherence on the back end, and there is no big performance difference between selecting all fields and selecting a subset of fields. But we'll have a very large number of different REST resources (services), and we need some common rules. We're thinking more about something useful and elegant from the service client's perspective: on top of the core REST API, several websites will be created by several different development teams. So we need something convenient and intuitive.
abovesun
@abovesun Ahhh, so your REST api is not actually there to serve a real human driven user-agent, it is there to provide a service to other REST services (aka web sites). That I can't help you with. I'm still not really convinced that is a good scenario for REST.
Darrel Miller
A:

There really is no right or wrong answer to "how should I design my media types". However, there are a few very important guidelines when selecting existing and designing new media types.

RESTful systems achieve scalability through careful use of caching. Design your resources to break content down into chunks that have similar data volatility. For example, in your scenario you have a list of galleries that contain photos. My guess would be that you don't add or remove galleries very often, but you do add and remove photos regularly. Therefore it would make sense to ensure that you can get a list of galleries that has no photo information. That would make it easy to cache that response.

Optimizing the size of responses can be important for performance, but caching is way more important. Sending 0 bytes across the wire is always more efficient.
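One way to act on that advice is to give resources of different volatility different freshness lifetimes. The sketch below is plain Java with hypothetical paths and durations: it assumes the bare gallery list is the only stable resource and that anything listing photos changes often. Real values would come from how the data actually changes.

```java
import java.time.Duration;

// Hypothetical sketch: resources grouped by how often their data changes
// get different Cache-Control lifetimes. Paths and durations are assumptions.
class CachePolicy {

    // Assumption: the bare gallery list (no photo data) rarely changes.
    static boolean isLowVolatility(String path) {
        return path.equals("/galleries");
    }

    static String cacheControlFor(String path) {
        Duration maxAge = isLowVolatility(path)
                ? Duration.ofHours(1)      // stable: let caches keep it for a while
                : Duration.ofSeconds(30);  // volatile, e.g. photo lists
        return "public, max-age=" + maxAge.getSeconds();
    }

    public static void main(String[] args) {
        System.out.println("/galleries -> " + cacheControlFor("/galleries"));
        System.out.println("/photos?gallery=/galleries/59 -> "
                + cacheControlFor("/photos?gallery=/galleries/59"));
    }
}
```

Keeping photo data out of the gallery list is what makes the long lifetime safe: adding a photo does not invalidate the cached list.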

Even though the list of photos may change more regularly, you can still use caching effectively. By using the If-Modified-Since header or ETags, you will not save the network round trip, but you can save lots of bandwidth by not transferring representations that are unchanged.
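A minimal plain-Java sketch of an ETag-based conditional GET, assuming the ETag is a SHA-256 hash of the representation (one common choice, not the only one). The class name is hypothetical. The client still makes the request, but a matching If-None-Match gets a 304 with an empty body. In a JAX-RS stack such as Jersey or CXF, Request.evaluatePreconditions(EntityTag) performs this comparison for you.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of an ETag-based conditional GET: the server hashes the
// representation and answers 304 (empty body) when the client's copy matches.
class ConditionalGet {

    static final class Response {
        final int status;
        final String etag;
        final String body;

        Response(int status, String etag, String body) {
            this.status = status;
            this.etag = etag;
            this.body = body;
        }
    }

    // Strong ETag derived from a SHA-256 hash of the representation bytes.
    static String etagFor(String representation) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha.digest(representation.getBytes(StandardCharsets.UTF_8));
            return "\"" + String.format("%064x", new BigInteger(1, digest)) + "\"";
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    // If-None-Match uses weak comparison, so a W/ prefix is ignored.
    static boolean matches(String ifNoneMatch, String etag) {
        for (String candidate : ifNoneMatch.split(",")) {
            String tag = candidate.trim();
            if (tag.startsWith("W/")) {
                tag = tag.substring(2);
            }
            if (tag.equals("*") || tag.equals(etag)) {
                return true;
            }
        }
        return false;
    }

    static Response get(String representation, String ifNoneMatch) {
        String etag = etagFor(representation);
        if (ifNoneMatch != null && matches(ifNoneMatch, etag)) {
            return new Response(304, etag, ""); // body is not re-sent
        }
        return new Response(200, etag, representation);
    }

    public static void main(String[] args) {
        String photos = "<photos><photo href=\"/photos/15\"/></photos>";
        Response first = get(photos, null);
        Response second = get(photos, first.etag);
        System.out.println(first.status + " then " + second.status);
    }
}
```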

It is extremely difficult to design resources that are ideal for all circumstances and because of this I suggest you do not try. Design resources that work well for your particular use cases. If other use cases arise create new resources to handle those.

There is nothing wrong with creating:

/gallery/foo/quickview
/gallery/foo/detailedview
/gallery/foo/justlinks

You want to use a web framework that makes it really easy and cheap to create new resources. Resources will rarely have a one-to-one mapping with your domain entities, so feel free to create as many resources as you need.
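A minimal plain-Java sketch of several resources over one gallery, assuming the /gallery/{id}/{view} URI shape from the list above and a map of photo id to author id as the underlying data. The class name and the contents of each view are hypothetical. It shows that adding a view is cheap when it is just another representation of the same entity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: several resources over one domain entity. The last
// path segment (/gallery/{id}/{view}) selects the representation.
class GalleryViews {

    // photoAuthors maps photo id -> author id (pass a LinkedHashMap to keep order).
    static String render(String path, String id, String name, Map<String, String> photoAuthors) {
        String prefix = "/gallery/" + id + "/";
        if (!path.startsWith(prefix)) {
            throw new IllegalArgumentException("not a view of gallery " + id + ": " + path);
        }
        String view = path.substring(prefix.length());
        StringBuilder sb = new StringBuilder();
        switch (view) {
            case "quickview":
                // Summary only: no per-photo data at all.
                return "<gallery id=\"" + id + "\" name=\"" + name
                        + "\" photoCount=\"" + photoAuthors.size() + "\"/>";
            case "justlinks":
                sb.append("<gallery id=\"").append(id).append("\">");
                for (String photoId : photoAuthors.keySet()) {
                    sb.append("<photo href=\"/photo/").append(photoId).append("\"/>");
                }
                return sb.append("</gallery>").toString();
            case "detailedview":
                sb.append("<gallery id=\"").append(id).append("\" name=\"").append(name).append("\">");
                for (Map.Entry<String, String> e : photoAuthors.entrySet()) {
                    sb.append("<photo href=\"/photo/").append(e.getKey()).append("\">")
                      .append("<author href=\"/author/").append(e.getValue()).append("\"/>")
                      .append("</photo>");
                }
                return sb.append("</gallery>").toString();
            default:
                throw new IllegalArgumentException("unknown view: " + view);
        }
    }

    public static void main(String[] args) {
        Map<String, String> photos = new LinkedHashMap<>();
        photos.put("11111", "11111");
        photos.put("22222", "11111");
        System.out.println(render("/gallery/foo/quickview", "foo", "my photos", photos));
        System.out.println(render("/gallery/foo/justlinks", "foo", "my photos", photos));
    }
}
```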

My last comment is regarding the selection of a media-type. You should really consider using something like Atom for a service like this. Atom is ideal for managing lists of things and it has all the mechanisms in place to handle media elements like photos.

Most people, when they start using REST services, get used to the idea that they can deliver straight application/xml or application/json as a media type. There are some specialized cases where this is completely feasible; however, as you start to implement more of the REST constraints, you will find that these generic media type formats limit the benefits you can achieve in many cases. For the moment, don't worry too much about it; just be aware that it is always safer to pick a "real" media type like application/xhtml+xml, RDF or Atom, and that if you do choose application/xml you may run into difficulties later on.

Darrel Miller
Thanks Darrel. From the Atom schema we only took the Link type definition; we are not using the Entry or Feed types. In case you're interested, we are using the Java Jersey REST implementation. I would probably agree with your statement that there are no common design rules here and that the final design decision depends on the concrete use case.
abovesun
A: 

This really depends on your scenario. You need to know how the client is going to use this to know how to design your Resource Proxies.

I suggest you don't get lost at the "choice crossroads". Just go with one implementation based on what you assume about client usage. See how the whole thing is used and behaves, and fine-tune afterwards if needed. Wash. Rinse. Repeat. Do it the permanent Beta way :)

redben