My design exposes two kinds of resources:

  1. Images
  2. Tags

I would like clients to be able to request random images by their tag(s). For example: Give me random images that are tagged with "New York" and "Winter". What would a RESTful design look like in this case?

+1  A: 

Multi-dimensional resource identification is challenging.

Your resource is an image, so that's your URI. Further, a specific image has a specific URI which never changes.

Your "by tag" is a non-identifying attribute of the resource. For this, a query string can help.

Here's my first thought.

  • http://www.example.com/MyStuff/image/id/ -- specific image by id
  • http://www.example.com/MyStuff/image/?tag=tagname -- random image with a given tag, implicitly, count=1.
  • http://www.example.com/MyStuff/image/?tag=tagname&count=all -- all images with a given tag in a random order (count=1 is the default, which would give you an arbitrary image)
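
A minimal sketch of how a server might interpret these query strings, assuming an in-memory mapping from image id to tags. The names `IMAGES` and `select_images` are hypothetical, and repeating `tag=` to require several tags at once is an assumption that goes beyond the URIs listed above.

```python
import random
from urllib.parse import urlparse, parse_qs

# Hypothetical in-memory store: image id -> set of tags.
IMAGES = {
    "1": {"New York", "Winter"},
    "2": {"New York", "Summer"},
    "3": {"Chicago", "Winter"},
}

def select_images(url, rng=random):
    """Return ids of images carrying every ?tag= value, in random order.

    count defaults to 1; count=all returns every match.
    """
    query = parse_qs(urlparse(url).query)
    tags = set(query.get("tag", []))
    count = query.get("count", ["1"])[0]
    # An image matches when the requested tags are a subset of its own tags.
    matches = [img for img, img_tags in IMAGES.items() if tags <= img_tags]
    rng.shuffle(matches)
    return matches if count == "all" else matches[: int(count)]
```

For example, `select_images("http://www.example.com/MyStuff/image/?tag=New+York&tag=Winter")` can only return image `"1"` with this sample data, while `?tag=Winter&count=all` returns images `"1"` and `"3"` in a shuffled order.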
S.Lott
I think it would be even more RESTful if each tag had a URI (as it is a resource, although not a file). What I don't see so clearly is the random order of the tagged set...
Alex Ati
RESTful rarely involves any specification of order (it could, but it's rare). For SQL queries, the default ordering is random. Same here -- default ordering is random.
S.Lott
Well, I don't know in SQL, but in REST, default order for a set may be "not ordered", but not specifically random (sufficiently random, statistically random, if you may). I mean, it need not be random as in "same opportunity to be the first".
Alex Ati
Again, I suggest explicitly defining this particular RESTful resource collection as "random". And, that's compatible with SQL, so there's a precedent.
S.Lott
Oh, OK, sorry. I understood you saying that it was defined in REST, not "we'll define it so for convenience". My fault!
Alex Ati
Point of clarification: I believe that the default sort order for SQL results is "undefined". I highly suspect it is *not* random. For example, query results may be cached so you are going to get the same result (in the same order) time and again.
Gili
S.Lott, what does "multi-dimensional resource identification" mean?
Gili
@Gili: how is "undefined" different from "random"? A statistical subtlety only. It isn't statistically fair, but it certainly isn't in any well-defined order.
S.Lott
Another question just came up. When I add your design to my existing protocol I get: POST /images to create a new image; POST /images?tag=NewYork to get a random image of New York. JAX-RS doesn't like two different implementations for the same URI path and I must admit it looks a bit confusing...
Gili
@Gili: "multi-dimensional" -- independent keys are independent dimensions. The term comes from data warehousing. Image has its own ID (one dimension) and tags (an independent dimension).
S.Lott
@Gili -- don't use POST to GET an image. Use GET. Use PUT to update an image. Use DELETE to delete an image. All four HTTP methods are used. Not simply POST.
S.Lott
@S.Lott: "undefined" makes no guarantee about the order. "random" does make a guarantee. "undefined" could return the same static list of values every time, "random" is guaranteed to never do that.
Gili
@S.Lott -- I am using POST to retrieve random images because the returned values should not be cached. Do you know something I don't?
Gili
@Gili: You should use GET with Cache-Control: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9
S.Lott
No need for that with my solution. I have posted another response, because this is getting cumbersome. Please review it!
Alex Ati
REST -- by definition -- uses GET to get things and POST to create new things. If you're using POST to get existing data, know that you're breaking a fundamental rule of REST.
S.Lott
Ok, I meant only the Cache Control; of course you should use GET. My fault!
Alex Ati
+2  A: 

I've struggled myself with this issue. What we ended up implementing was an HttpResponseRedirect from, eg:

http://www.example.com/randomNewYorkImage

to a random New York image:

http://www.example.com/images/New_York/1234.

The first resource can be thought of as a dispatcher for random New York images. This solution puts more load on the server, since two resources are requested instead of one, but it is as RESTful as you can get.

Edited: Plus, if you are caching, each image will be in the cache, and your server goes from sending an image to sending only the redirect, because the cache will intercept the second request, which alleviates your server load.
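
A rough sketch of the dispatcher idea, written as a plain function rather than a real web framework view. The image ids, the `respond` helper, and the specific `Cache-Control` values are assumptions chosen for illustration; only the two URI shapes come from the answer above.

```python
import random

# Hypothetical ids of the images tagged New York.
NEW_YORK_IMAGES = ["1234", "5678", "9012"]

def respond(path, rng=random):
    """Return (status, headers) for a GET request to `path`."""
    if path == "/randomNewYorkImage":
        image_id = rng.choice(NEW_YORK_IMAGES)
        return 302, {
            "Location": f"/images/New_York/{image_id}",
            # The target changes on every request, so caches should not store it.
            "Cache-Control": "no-store",
        }
    prefix = "/images/New_York/"
    if path.startswith(prefix) and path[len(prefix):] in NEW_YORK_IMAGES:
        # A fixed image URI always names the same image, so it can be cached.
        return 200, {
            "Content-Type": "image/jpeg",
            "Cache-Control": "public, max-age=86400",
        }
    return 404, {}
```

With this split, only the small redirect response has to come from the server each time; the image behind the `Location` URI can be served by any cache that already holds it.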

Alex Ati
Caching a random image is nuts, because you shouldn't expect it to be requested again any time soon. Because it's random.
Triptych
Suppose a web application used by millions of people that displays one image from a set of, say, three. Just for example, the typical We-Love-Working-Here on a typical corporate web. Wouldn't you cache that?
Alex Ati
Suppose a web application with thousands of users requesting millions of random images. Cache then, and your cache breaks.
Triptych
So, in your opinion, when your web page is too big you don't cache? The images need not be random through all accesses. Perhaps there are other views to get to a concrete image. Perhaps, as in the example commented, there are tags; if the New York tag is heavily visited, you may cache. But the most
Alex Ati
important thing: we are not talking about a particular website. Hence, we need to seek the canonical method. Then in particular scenarios, you may want to drop REST for convenience.
Alex Ati
My opinion is that randomness, by its very nature, goes against the idea of caching frequently-used resources. There is no way to know that a RANDOM resource will be frequently accessed.
Triptych
That's true; what I am proposing is caching the images by their fixed URIs, with the random URI acting as a dispatcher that redirects randomly. Thus, the request will be answered with a redirection, and the second URI will be served by the cache.
Alex Ati
Because, you know, the fact that an image is accessible randomly does not mean that it can't be accessed otherwise; in that case, it makes sense to cache it for the non-random method, and give the random method a chance to use the cache instead of using the dynamic server.
Alex Ati
Correct me if I'm wrong, but I believe what you are trying to say is that: http://www.example.com/randomNewYorkImage should use POST to prevent caching, whereas http://www.example.com/images/New_York/1234 should use GET to allow caching.
Gili
Well, getting the image via POST sounds wrong to me, although I'll confess that I don't know how to prevent caching on the client. But surely there's another way...
Alex Ati
A: 

I'd do something like http://foo.com/image/tagged/sometag/random and stop losing sleep over it.

Triptych
Then you can't cache it...
Alex Ati
How the hell can you cache a random resource anyway?
Triptych
Redirecting to a fixed one!
Alex Ati
Then you'll end up caching every image on your web server with repeated requests.
Triptych
Think of it the other way around: the image may have been cached for another reason, and in that case the cache would use its cached copy. Anyway, it's the duty of the cache, and not yours, to identify what to cache and what not to cache - and at least you should give it the chance.
Alex Ati
A: 

I agree with Triptych on this one. In a way adding random to the end of the URI makes it feel like an operation, but if it is scoped to a tag then you're really just refining the context.

In his example of:

/image/tagged/sometag/random

images resource -> tagging scope (all images with tags) -> specific tag (all images with tag X) -> random (a resource from the scoped list of images with tag X)

sammich
The problem with that solution is that it does not link a URI to a fixed image, so, among other things, you can't cache it.
Alex Ati
Of course you can't cache this URI directly -- you're asking for a random image. If you wanted to consider caching, asking for this resource could vend an HTTP 302 redirect to a cachable URI (like the real authoritative image resource in this case).
sammich
Just what I recommended :P
Alex Ati
@revolution, caching still makes no sense, because you shouldn't expect a randomly-accessed image to be requested again any time soon.
Triptych
Depends on the number of images and the number of requests; being a fixed URI, it can be used by many clients.
Alex Ati
+3  A: 

To sum up all the discussion in the comments, and so as not to change my initial proposal, this is what I finally came up with:

You want to access images via tags; each tag relates to a set of images. As a given tag may be used a lot more than another (say, New York photos used a lot more than Chicago's), you should use a RESTful configuration that allows caching, so you can cache New York photos. IMHO, the solution would be:

  • Each image has a fixed URI:

    http://www.example.com/images/12345
    
  • Each tag has also a URI:

    http://www.example.com/tags/New_York/random
    

    This URI acts as a random dispatcher for the images in the set; it returns a 303 See Other response that redirects to a random image from the set. By definition, the 303 response itself must not be cached, whereas the fixed image URI it points to should be, and the browser will not treat the redirection as permanent, so it's optimal.

  • You could even access the whole set via:

    http://www.example.com/tags/New_York
    

    This access would result in a 300 Multiple Choices response; it returns the whole set (as URIs, not as images!) to the browser, and the browser decides what to do with it.

  • You can also use intersection of various tags:

    http://www.example.com/tags/New_York/Autumn/Manhattan/random
    http://www.example.com/tags/Autumn/Manhattan/New_York/random (equivalent to the previous one)
    http://www.example.com/tags/New_York/girls/Summer/random
    etc.
    

So you have a fixed URI for each image, a fixed URI for each tag and its related set of photos, and a fixed URI for the random dispatcher that each tag has. You don't need any GET parameters, unlike other potential solutions, so this is as RESTful as you can get.
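
The scheme above could be sketched roughly as follows, assuming an in-memory mapping from tag to image ids. `TAGS`, `resolve`, and the sample ids are hypothetical; a real server would put the 303 target in a `Location` header and the 300 list in the response body.

```python
import random

# Hypothetical store: tag -> set of image ids.
TAGS = {
    "New_York": {"12345", "12346", "12347"},
    "Autumn": {"12346", "12347", "20001"},
    "Manhattan": {"12346", "12347", "30001"},
}

def resolve(path, rng=random):
    """Return (status, image URIs) for /tags/<tag>[/<tag>...][/random]."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2 or parts[0] != "tags":
        return 404, []
    want_random = parts[-1] == "random"
    names = parts[1:-1] if want_random else parts[1:]
    if not names or any(n not in TAGS for n in names):
        return 404, []
    # Set intersection is commutative, so tag order in the path is irrelevant.
    ids = set.intersection(*(TAGS[n] for n in names))
    if not ids:
        return 404, []
    if want_random:
        # 303 See Other: redirect to one randomly chosen fixed image URI.
        return 303, [f"/images/{rng.choice(sorted(ids))}"]
    # 300 Multiple Choices: list the URIs of the whole set.
    return 300, [f"/images/{i}" for i in sorted(ids)]
```

One consequence of this sketch is that a tag literally named `random` could not be addressed as the last path segment, which is a limitation of the path-based layout rather than of the sample code.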

Alex Ati
I like your uses of return code 300 and 303 but I see the following problems with your solution: 1) I want New_York/random to return multiple random images. I guess you can solve this by returning HTTP 300 with headers that disable caching.
Gili
2) I don't like /tags/New_York/girls/Summer/random because the slash is supposed to represent hierarchical division. /tags;name=New_York;name=girls;name=Summer/random seems more correct from a technical perspective.
Gili
Gili, the solution does not have the problem you describe. If you want a URI that returns a different image each time, 303 works, and by its definition the image will not be cached against the /random URI, but against the redirected one. If you want the URI to return various images *on a single request*,
Alex Ati
then 300 will deliver their URIs.
Alex Ati
What you propose in 2) is not more technically correct; in fact, it is a workaround to be RESTful in a weak way. If you are choosing that path, you might as well use GET parameters, which exist precisely for that purpose :)
Alex Ati
BTW, the problem with tags is that they are not hierarchical by any means; but if you were to implement them hierarchically, for example with folders, files and hardlinks, the approach would be analogous to the one I proposed: linking the file to various folders. Thus, I feel my solution holds up.
Alex Ati
@Alejandro, yes my #2 uses GET parameters (matrix parameters to be more precise). I still don't understand why /tags/A/B/C is good practice. Aren't you incorrectly implying that B is nested inside A? Doesn't REST favor using parameters to show that there is no relationship between these tags?
Gili
It's more like having three sets, A, B and C. If you were to find the elements inside all three of them, you'd start with set A, then pick from it the elements that are in B, etc. I.e., you are not saying B is in A, but that the set A^B^C is inside A, inside B, and inside C, which is not incorrect.
Alex Ati
@Gili, I don't think that URIs have to express any hierarchical relationships per se. Yes, it's a nice practice if your users can get any benefit from it, but it's not necessary. Alejandro's solution seems pretty sound to me. (Please point me to some reference if I'm wrong about the hierarchy thing.)
Milan Novota
This is RPC, not REST.
Wahnfrieden