Coming from a lot of frustrating times with WSDL/SOAP, I very much like the REST paradigm, but I am trying to solve two basic problems in our application before moving over to REST. The first problem relates to the lack of an interface document. I think I finally see how to handle this situation: one can query one's way down from a top-level "/resources" resource using GET, HEAD, and OPTIONS requests to find the one needed resource in the correct hypermedia format. Is this the idea? If so, the client need only be provided with a top-level resource URI: http://www.mywebservicesite.com/mywebservice/resources. He will then have to do some searching and possibly keep track of what he is discovering, so that he can use the URIs again efficiently in the future to do GETs, POSTs, PUTs, and DELETEs. Are there any thoughts on what should happen here?

The other problem is that we cannot use descriptive URLs like /resources/../customer/Madonna/phonenumber. We do have an implementation of opaque URLs we use in the context of a session, and I'm wondering how opaque URLs might be applied to REST. The general problem is how to keep domain-specific details out of URLs, and still benefit from what REST has to offer.

A: 

I haven't really used web RPC systems (WSDL/SOAP), but I think the 'interface document' is there mostly to allow client libraries to create the service API, right? If so, REST shouldn't need it, because the verbs are already defined and don't really need to be documented again.

AFAIUI, the REST way is to document the structure of each resource (usually encoded in XML or JSON). In that document, you'll also have to document the relationships between those resources. In my case, a resource is often a container of other resources (sometimes of more than one type), so the structure doc specifies which field holds a list of URLs pointing to the contained resources. Ideally, only one unique resource will need a single, fixed (documented) URL. Everything else follows from there.

The URL 'style' is meaningless to the client, since it shouldn't 'construct' a URL. Every URL it needs should already be constructed in a resource field. That lets you change the URL structure without changing the client (that has saved me tons of time). Your URLs can be as opaque or as descriptive as you like. (Personally, I don't like text keys or slugs; my keys are all BIGINTs or UUIDs.)
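As a rough illustration of "everything else follows from there", here is a minimal Python sketch. Everything in it is an assumption made for the example: the in-memory `RESOURCES` table stands in for the server, `get` stands in for an HTTP GET, and the field names `customers`, `name`, and `phone_numbers` are what a hypothetical structure document would define. Only `ROOT_URL` is known to the client in advance; every other URL is read out of a resource field.

```python
# Hypothetical in-memory stand-in for the server: URL -> representation.
# The URLs are deliberately opaque; only ROOT_URL is documented.
RESOURCES = {
    "/r/0": {"customers": ["/r/81496237", "/r/5521"]},
    "/r/81496237": {"name": "Madonna", "phone_numbers": "/r/789"},
    "/r/5521": {"name": "Smith", "phone_numbers": "/r/790"},
    "/r/789": {"phone_number": "01-234-56"},
    "/r/790": {"phone_number": "01-999-00"},
}
ROOT_URL = "/r/0"


def get(url):
    """Stand-in for an HTTP GET that returns the parsed representation."""
    return RESOURCES[url]


def find_phone_number(customer_name):
    """Follow documented fields from the root; never build a URL by hand."""
    root = get(ROOT_URL)
    for customer_url in root["customers"]:
        customer = get(customer_url)
        if customer["name"] == customer_name:
            # The URL of the phone resource comes from the customer resource.
            return get(customer["phone_numbers"])["phone_number"]
    return None


madonna_phone = find_phone_number("Madonna")
```

The client depends on the documented field names, not on the shape of the URLs, so the server could rename every URL except the root without breaking it.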

Javier
I understand the concepts mentioned above, except your last parenthetical comment. How exactly do you use keys and slugs in this context? The main point of my original question is that, in REST, one does not have and cannot have a flat API or flat interface in general to the service. One must drill, and retain state in the client, if one wants to drill again efficiently to the same resources. We have a graph of pages, each containing embedded links to (names of) other resources.
Shaping
You gave an example of a 'descriptive' URL: `/resources/../customer/Madonna/phonenumber`, where the customer is keyed by a text string ('Madonna'). Some frameworks have all the facilities in place to make such text keys easy to use, but in REST the aesthetics of URLs are immaterial because they're for machine consumption only. So I simply use internal numeric IDs to build the URLs.
Javier
Using a numeric ID is how you would make the URL opaque, but REST says that you are to create a self-describing URI-space for all resources first. The machine doesn't care either way--descriptive or opaque--but with a set of opaque URLs how then does a client use the service when he can't read and understand the names of the resources and their semantic value? Also, there still remains the issue of having to explain to your client that he must also write a crawler to look through the resource graph to find specific resources. Can anyone answer this question?
Shaping
In any case, your client has to write the code that handles the resources; that's why you document the resource structure. Part of that structure is which other resources it links to. As for the URL style, it's really meaningless; if you like, the URLs can be very readable. The client shouldn't care.
Javier
No client-side automation is possible with opaque URLs. I'm not sure the question and use-case I present are understood. Requiring the client to write a program is not reasonable in many cases, and greatly limits business value when a service exists in one enterprise and must readily be used by another enterprise, without explanatory documentation, e-mails, or phone calls. Client software should do most of the hard work automatically via a standard all participants can agree on. REST does not seem to offer this ability.
Shaping
AFAIK, there's no standardized way to tell the client which fields in your structure are URLs to follow; but if you use a regular field naming convention (e.g. all 'href' fields are URLs), then the client could spider the whole structure automatically. In any case, I don't see what such a generic non-programmable client could do.
Javier
A: 

I am currently building a REST "agent" that addresses the first part of your question. The agent offers a temporary bookmarking service. The client code that is interacting with the agent can request that a URL be bookmarked using some identifier. If the client code needs to retrieve that representation again, it simply asks the agent for the URL that corresponds to the saved bookmark and then navigates to that bookmark. Currently those bookmarks are not persisted, so they only last for the lifetime of the client application, but I have found it a useful mechanism for accessing commonly used resources. For example, the root representation provides a login link. I bookmark that link, and if the client ever receives a 401 then I can redirect to the "login" bookmark.

To address an issue you mentioned in a comment, the agent also has the ability to store retrieved representations in a dictionary. If it becomes necessary to aggregate and manipulate multiple representations at the same time, then I can simply request that the agent store the current representation in a dictionary under a key and then continue navigating to the next resource. Once the client has accumulated all the necessary representations, it can do what it needs to do.
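To make the idea concrete, here is a small Python sketch of such an agent. It is not the actual agent described above, only a guess at its shape: the class name `Agent`, its method names, the `fetch` callable returning a `(status, representation)` pair, and the fake server table are all hypothetical.

```python
class Agent:
    """Hypothetical agent: bookmarks URLs and stores representations."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable: url -> (status, representation)
        self._bookmarks = {}     # name -> URL, kept in memory only
        self._stored = {}        # key -> previously retrieved representation
        self.current = None

    def bookmark(self, name, url):
        self._bookmarks[name] = url

    def navigate(self, url):
        status, representation = self._fetch(url)
        if status == 401 and "login" in self._bookmarks:
            # Unauthorized: go to the bookmarked login link instead.
            status, representation = self._fetch(self._bookmarks["login"])
        self.current = representation
        return representation

    def navigate_to_bookmark(self, name):
        return self.navigate(self._bookmarks[name])

    def store_current(self, key):
        self._stored[key] = self.current

    def stored(self, key):
        return self._stored[key]


# Hypothetical fake server used only for this demonstration.
FAKE_SERVER = {
    "/": (200, {"login": "/a1", "orders": "/b2"}),
    "/a1": (200, {"form": "login"}),
    "/b2": (401, None),
}


def fake_fetch(url):
    return FAKE_SERVER[url]


agent = Agent(fake_fetch)
root = agent.navigate("/")
agent.bookmark("login", root["login"])     # bookmark a link found in the root
agent.store_current("root")                # keep the root for later aggregation
after_401 = agent.navigate(root["orders"])  # 401, so the agent goes to "login"
```

The bookmark names ("login", "root") are chosen by the client code, while the URLs behind them always come from representations the server returned.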

Darrel Miller
The bookmarking done by your agent is a local private cache for just this client on this machine, right? The cache contains a map between a string and the actual URI you'll need to re-get later. How do you build the strings that become the keys in this map? Is there a general scheme? Probably not. What is the standard algorithm for using a REST service? How about: "Start with this root URI and crawl around the resource graph, using GET, OPTIONS, and HEAD, until you've located all needed resource representations, and remember to cache the ones you think you will need to get again soon."
Shaping
Is the REST community trying to standardize such an algorithm for using a REST service? Not having such a standard and not having a way to use opaque URLs (and still be able to walk the resource graph) prevent me from using REST. So I use WSDL/SOAP. I sense from the reading I've done that most REST proponents don't value this ability of a machine to use a REST service automatically and quickly, as can be done with WSDL/SOAP.
Shaping
The resource aggregation idea above is a good one, but I think someone should publish it as "the best way" or one of "these five best ways" to get your resource representations out of a REST-based resource graph. If these standards exist in rigorous form somewhere, please point me to them. We need more structure and guidance in this REST craft than HTTP verbs and response codes.
Shaping
+1  A: 

The other problem is that we cannot use descriptive URLs like /resources/../customer/Madonna/phonenumber.

I think you've misunderstood the point of opaque URIs. The notion of opaque URIs is with respect to clients: a client shall not decipher a URI to guess anything of semantic meaning from it. So a service may well have URIs like /resources/.../customer/Madonna/phonenumber, and that's quite a good idea. But clients should treat the URIs as opaque: they should not infer from the URI that it represents Madonna's phone number, or that Madonna is a customer of some sort. That knowledge can only be obtained by looking inside the resource itself, or perhaps by remembering where the URI was discovered.

Edit:

A consequence of this is that navigation should happen by links, not by deconstructing the URI. So if you see /resources/customer/Madonna/phonenumber (and it actually represents customer Madonna's phone number), you should have links in that resource that point to the Madonna resource, e.g.

{
  "phone_number" : "01-234-56", 
  "customer_URI": "/resources/customer/Madonna" 
}

That's the only way to navigate from a phone number resource to a customer resource. An important aspect is that the server implementation might or might not have domain-specific information in the URI. The Madonna record might just as well live somewhere else: /resources/customers/byid/81496237. This is why clients should treat URIs as opaque.
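A short Python sketch of what "treating URIs as opaque" means for client code. The two dictionaries are hypothetical stand-ins for two server layouts, and `/resources/phones/789` is an invented address; the `customer_URI` field is the one from the JSON above. The same client function works against both layouts because it only follows the link it finds in the representation.

```python
def customer_for_phone(get, phone_uri):
    """Navigate from a phone resource to its customer by following a link."""
    phone = get(phone_uri)
    # The client never edits or parses phone_uri; it follows customer_URI.
    return get(phone["customer_URI"])


# Hypothetical layout A: descriptive URIs.
LAYOUT_A = {
    "/resources/customer/Madonna/phonenumber": {
        "phone_number": "01-234-56",
        "customer_URI": "/resources/customer/Madonna",
    },
    "/resources/customer/Madonna": {"name": "Madonna"},
}

# Hypothetical layout B: the same data behind ID-based URIs.
LAYOUT_B = {
    "/resources/phones/789": {
        "phone_number": "01-234-56",
        "customer_URI": "/resources/customers/byid/81496237",
    },
    "/resources/customers/byid/81496237": {"name": "Madonna"},
}

customer_a = customer_for_phone(
    LAYOUT_A.__getitem__, "/resources/customer/Madonna/phonenumber"
)
customer_b = customer_for_phone(LAYOUT_B.__getitem__, "/resources/phones/789")
```

A client that instead stripped the last path segment off the phone URI would work only against layout A.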

Edit 2:

Another question you have (in the comments) is how a client that is required to have no knowledge of the server's URIs is supposed to be able to find anything. There are the following possibilities for letting clients find resources:

  1. Provide a search interface. This could be done by providing an OpenSearch description document, which tells clients how to search for items. An OpenSearch template can include several variables, and several endpoints, depending on what you're looking for. So if you have a "customer ID" that's unique, you could have the following template: /customers/byid/{proprietary:customerid}. The customerid element needs to be documented somewhere, inside the proprietary namespace. A client can then know how to use such a template.

  2. Provide a custom form. This implies making a custom media type in which you explicitly define how (based on an instance of the document) a URI to a customer can be forged, e.g. <customers template="/customers/byid/{id}"/>. The documentation (for the media type) would have to state that the template attribute must be interpreted as a relative URI after substituting an actual customer ID for the string "{id}".

  3. Provide links to all resources. Some resources aren't innumerable, so you can simply make a link to each and every one of them, optionally including identifying information along with the links. This could also be done in a custom media type: <customer id="12345" href="/customer/byid/12345"/>.

It should be noted that #1 and #2 are two ways of saying the same thing: clients are allowed to create URIs if

  1. they haven't got the URI structure a priori, and
  2. a media type exists whose documentation states how URIs should be created

This is much the same way a web browser works: it has no idea of any URI structure on the web, except for the rules laid out in the definition of HTML forms, which tell it to add a ? and then all the query parameters separated by &.

In theory, if you have a customer with id 12345, then you could actually dispense with the href, since you could plug the customer id 12345 into #1 or #2. It's more common to actually provide real links between resources, rather than always relying on lookup or search techniques.
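Option #2 can be sketched in a few lines of Python. This assumes the hypothetical custom media type from above, whose documentation says that the template attribute of the customers element becomes a relative URI once "{id}" is replaced by a customer ID. The function name `customer_uri` is invented for the example, and the base URI is the one from the question.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, urljoin


def customer_uri(document_xml, base_uri, customer_id):
    """Build a customer URI from a template the server itself supplied."""
    element = ET.fromstring(document_xml)
    template = element.attrib["template"]
    # Percent-encode the ID so it cannot introduce extra path segments.
    relative = template.replace("{id}", quote(str(customer_id), safe=""))
    # Resolve the result as a relative URI against the document's own URI.
    return urljoin(base_uri, relative)


# The document as it might be served; the client hard-codes none of it.
document = '<customers template="/customers/byid/{id}"/>'
base = "http://www.mywebservicesite.com/mywebservice/resources"

uri = customer_uri(document, base, 12345)
# uri == "http://www.mywebservicesite.com/customers/byid/12345"
```

The client knows the media type's rule, not the URI layout; if the server later served a different template, the same client code would still produce valid URIs.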

mogsie
Yes, the notion of an opaque URI is with respect to the client. I thought that was clear to everyone from the beginning. However, the client app still cannot navigate the resource graph if the URIs are opaque: one cannot build a semantic crawler when building the path segments of an opaque URI. This is one of the problems I'm trying to solve before committing to using REST.
Shaping
I added a few paragraphs to address the navigation issue. Please comment if I still haven't understood your problem :-)
mogsie
Thanks for working into the details. But, no, either I'm missing something basic, or the essential problem is not perceived, perhaps not even valued. First, I'm thinking (perhaps incorrectly), based on hi-REST principles, that the resource representation for Madonna's phone number is /resources/customer/Madonna/phonenumber, which will GET me, the client, a chunk of JSON that looks like { "phone_number" : "01-234-56" }. Must this structure have the redundant Madonna id? I already knew that I was going after Madonna's number from the resource name's last path segment.
Shaping
The bigger problem is that a program client or human client cannot reason about how to navigate a resource graph to a phone number, if the terminal concept is random number "81496237" instead of "phonenumber". Yes, we need to maintain a mapping somewhere between clear and opaque versions of the resource names, but the client must ultimately be given the ability to navigate to any resource in the graph. How can he do that with a bunch of random numbers in his resource names. He is not allowed to see any of the clear URLs, not even the very general "customers" resource.
Shaping
The last sentence in the first of the two comments gives it away: *I already knew that I was going after Madonna's number from the resource name's last path segment.* You really don't know that!
mogsie
So-called hi-REST principles mean that the client **discovers** Madonna's phone number resource from another resource, probably one that describes and identifies Madonna, e.g. `/resources/customer/Madonna`. Say it contains information about Madonna: `{ "name" : "Madonna", "soc-sec" : "12345", "phonenumbers" : "/resources/customer/Madonna/phonenumber" }`. In that case, the fact that you found the link in the Madonna resource probably means that the phone number resource belongs to her. The URIs are unimportant; they might as well have been `/resources/customer/123` and `/resources/phones/789`.
mogsie
If I am told to design a URI space modeling my resources, then clearly I am expected to think of the entity Madonna when I see the URI segment "Madonna". Whether I'm a human client or program client, I need semantic values with which to reason and navigate. How can I know that customer resource name "123" designates Madonna unless I am given a mapping stating so, in which case privacy is also lost? No one has answered my question about how to build a crawler program that has awareness of the resource graph (knows its contents by name).
Shaping
About the discovering: If I don't want to discover Madonna's number, and I know she has one and that its name is "phonenumber", then why can't I just drill right to it, as in the current example? If I choose to discover her phone number once I have her personal representation, then the phone number name ought to be recognizable by a machine (a program), and that program needs to be told how to recognize it by a name like "phonenumber". Ultimately, there must be a mapping between semantic quantities like <phone number> and syntactic ones like "phonenumber" or if you want to be opaque "789".
Shaping
Also, if I retrieve a representation of a resource named /customer/madonna and this is a unique name, then I definitely get a JSON chunk describing Madonna. If the name is /customers/smith, then do I get a collection of representations of all known Smiths? What if I want the Smith with SS# 124-456-7890? The situation reduces to: how do you model a predicate-string query using REST resource names?
Shaping
When you say "semantic values" yes, you do need that. But the different values of a URI's path segment (or indeed query parameters) don't constitute *semantic* values. Semantics must be had from representations. Humans might have the brain capacity to understand that the URI /customers/madonna is about Madonna, but programs aren't allowed to make such assumptions just like that.
mogsie
REST implies that you *ought not* look at the URI to try to understand what the URI identifies. The URI `/lolcats` might as well represent my bank account. How you know it's about my bank account is up to the representation itself, or the links you find that point to it.
mogsie
You wrote: "*I retrieve a representation of a resource named /customer/madonna and this is a unique name, then I definitely get a JSON chunk describing Madonna*" -- That's a failed assumption. Based on the URI alone, you can't make assumptions about what it is.
mogsie
I understand, but the constraints you impose above do not allow me to do what I need to do with a client program. "REST implies that you ought not look at the URI to try to understand what the URI identifies." This part is confusing because identifying resources is Task #1 in REST. Now you say the names are arbitrary, opaque even. We still need a specified way of naming resources. The program must be aware of these names, or aware of an algorithm by which the names can be synthesized, in order to search the graph. Please explain how a client program searches the resource graph.
Shaping
The server is free to name resources in any fashion it wishes, e.g. using an internal database primary key, a social security number, or a random number, e.g. a UUID. So if you see a URI there are two ways of inferring what that URI is about. (1) determine it by looking at the context in which you found the URI, or (2) determine it by interacting with the resource, e.g. by GET'ing it.
mogsie
If a client must synthesize URIs it must in fact be told how to do so by *another resource*. OpenSearch is a great example of this; it tells clients how to forge URIs by using a URI template. HTML forms are another, simpler way to tell clients how to forge a URI based on (user) input. So the solution to "search the resource graph" is either (1) make a form, an OpenSearch description, or a similar proprietary resource (in which the resource type definition instructs the client how to create URIs within a server and/or problem domain), or (2) provide search facilities with the possibility to search for unique attributes.
mogsie
Thanks for the additional details. Things are becoming clearer. "If a client must synthesize URIs it must in fact be told how to do so by another resource." That bit is important. What's important for our application is that the metadata resource for creating the target resource be automatically usable by the client program. I will look further at OpenSearch. Do you prefer it to HTML forms? What are the pros/cons of each? Strategy (1) above seems more efficient than (2), which seems to involve lo-REST, namely the RPC-style name-value parameters.
Shaping
"In theory, if you have a customer with id 12345, then you could actually dispense with the href, since you could plug the customer id 12345 into #1 or #2." This seems to be the best way. You don't want to exhaustively list resource names. That really doesn't help you find anything, except by brute-force searching. The template is the way to go. However, if you have a need for opaque URIs, then the input parameters themselves are the only bits of domain-specific data in the URI. What happens if I search for a customer by name instead of by UUID? Do I get a collection of Smiths?
Shaping
..Or do I get no result because the query was under-constrained? Or can I choose how to do it? In the first case I have to do one or more additional searches through the returned representations.
Shaping
If you search for Smiths, you can choose how to do it, as long as you explicitly document what happens in the media type where you define where you plug in the search constraints. I would prefer OpenSearch extensions to forms, since OpenSearch allows for extensions. A third option is XForms, but they might be a bit generic. OpenSearch is sort of "type strong" in that you explicitly define what each extension is.
mogsie
Providing inline links is preferable to providing lookup templates. Imagine searching for a book in Amazon, e.g. amazon.com/search?q=rest+book and it returns lots of books with ASIN numbers. But to open a search result you had to make a note of the ASIN number, click "back" and then enter the ASIN number into a special "ASIN LOOKUP" form. It's tedious, and requires a lot of logic to happen on the client. The web way is the hypermedia way, which is embedding links into the responses. It moves complexity away from the client.
mogsie
You also have to remember that the two aren't mutually exclusive. Sometimes you have a customer ID and want to look it up. Other times you have the name "Smith" and you need to wade through the Smiths to find the one you're looking for. Just like you can browse books by following links _or_ by searching.
mogsie
I most often want to input multiple attributes by which to locate a specific record (name and phone number, or name and city and phone number). If I'm not willing to have my client program suffer the inefficiency of searching through intermediate sets of returned representations, then the look-up template seems much more efficient, especially when large amounts of data are involved (millions of Smiths and you must do a second search to locate the one with a certain SS#). Yes, using links works like the Web, but the iterative searches could be inefficient.
Shaping
Sure. Fielding even mentions the efficiency trade-off in his thesis. It's a valid complaint. Iterative searches could be fuelled by links in search results. If you search for "Smith" at a directory lookup, you get page 1 of the search results (millions) but you could also get links to breakdowns by state or by age or first names. See for example http://www.gulesider.no/gs/categoryList.c?q=smith the links on the left hand side are canned searches. This can be done programmatically too.
mogsie
Shaping
Also, the presumption of an optional {startPage?} parameter is confusing, because, in general, I may not be dealing with "pages" (HTML documents). For example, I don't foresee my current service having an HTML-document root node in the resource graph, but instead an XML element representation containing resource names, whose matching representations are the stuff of that root node. I know that this fragment above is just an example, but it has a strong HTML bias.
Shaping
...Maybe {startDocument?} would generalize the idea more correctly. In any case, no matter the designated, returned hypermedia type, we need a place to start the search, and this seems to be one of two key ideas in the example. The other is the returned hypermedia type.
Shaping
Concerning the efficiency: Yes, I see now that the categorical decomposition will be necessary. Our technology already lets us decompose a query string to extract its name-and-value parameters and then very quickly return any relevant records (this is not a conventional database table structure). So, in this case I would be doing more work to compute all of those categorically separated groups of links, place them in XML documents, and serve them to the client, so that he can walk the categories until he reaches a leaf of interest.
Shaping
With a large complex dataset, there are explosively many ways to search and form categories. Does this seem like a workable architecture to you, given the fact that I can already locally process the parameters of the query very efficiently?
Shaping
This is dragging on. Perhaps a new question or two is in order? It might allow others to voice their opinion... (I actually believe these are important concepts to explain, and comment #30 in a question isn't the place to explain them!)
mogsie
Do we agree then that hi-REST cannot be efficiently applied (does not scale well) to accessing large amounts of multidimensional data and that the lo-REST strategy using an HTTP template would in fact be much more efficient? The cost of generating the needed resource names definitely seems prohibitive in the first case.
Shaping
An OpenSearch template allows access to large amounts of multidimensional data in a declarative, general fashion. And I wouldn't call an OpenSearch template "lo-REST", so I would have to disagree. It's as RESTful and efficient as can be. But I feel I can't answer the additional OpenSearch questions in the confines of this comment box :-)
mogsie
Where shall we take the discussion?
Shaping
How about if you ask a question about how to expose multidimensional data in a RESTful fashion, or perhaps even specifically how OpenSearch could be used to do it. ;-)
mogsie
We were already discussing this in the context of hi-REST. The combinatorics don't scale. OpenSearch or similar parameters-in-HTTP-request must be used. You were saying that hi-REST (the pure URLs, one per resource, with drilling required) is the preferred way. I was only pointing out that it doesn't work for data on any significant scale. So how shall we do this with lo-REST or with OpenSearch? These little boxes are very cramped.
Shaping