Let's say I've designed a media type which is a strict subset of another media type. For example, my media type is application/vnd.example.foo+xml (hereafter abbreviated foo+xml). This media type is a strict subset of the application/xhtml+xml (hereafter abbreviated xhtml) media type. Basically, my media type definition adds processing instructions to certain constructs within the xhtml media type, or replaces the existing ones completely. For the sake of example, it could say that in any foo+xml document, the elements matched by the XPath expression //ul[@class='foo']/li[a] shall be shown in a specific way by clients, and the rest of the document shall be ignored, a processing model which is quite different from that of the original xhtml media type.

Armed with this information, the server can now start creating representations of that type, and my clients can pass Accept headers and happily consume this type of document, both of them honoring the processing instructions laid out in my type definition. However, it's a custom media type which I can't assume anyone will know how to process.

An option that I have is this:

  • When a client prefers the foo+xml media type, I serve the document with a Content-Type set to that media type.
  • When a client prefers the xhtml media type, I serve the same document with an xhtml Content-Type header.

This means that generic clients that don't know what foo+xml is, but likely understand xhtml, can still process my document, follow links to other resources, present it to users in a generic fashion and so on. Likewise, a client that knows the semantics of foo+xml gets confirmation that the document really is of that type, instead of having to guess or introspect the document to see whether it looks like something it can process (e.g. via HTML profiling, microformats, etc.).
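The selection rule described in the two bullets above can be sketched roughly as follows. This is only an illustration, not a full implementation of HTTP content negotiation: the function names (`parse_accept`, `quality`, `choose_content_type`) are made up for this example, media-range parameters other than `q` are ignored, and the tie-break (serving xhtml when the client expresses no preference, e.g. `*/*`) is an assumption on my part rather than something the scheme requires.

```python
FOO = "application/vnd.example.foo+xml"
XHTML = "application/xhtml+xml"


def parse_accept(header):
    """Return a list of (media_range, q) pairs from an Accept header value."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when the client does not state it
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def quality(media_type, ranges):
    """Return the q of the most specific range matching media_type (0 if none)."""
    main_type = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2
        elif media_range == main_type + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose_content_type(accept_header, offered=(XHTML, FOO)):
    """Pick the Content-Type to label the (identical) document with.

    Assumption: on a tie the earlier entry in `offered` wins, so clients
    with no stated preference get xhtml. Returns None if nothing is
    acceptable (a server would typically answer 406 in that case).
    """
    ranges = parse_accept(accept_header or "*/*")
    best, best_q = None, 0.0
    for candidate in offered:
        q = quality(candidate, ranges)
        if q > best_q:
            best, best_q = candidate, q
    return best
```

A foo+xml-aware client sending `Accept: application/vnd.example.foo+xml` would get the document labelled as foo+xml, while a typical browser-style Accept header that lists `application/xhtml+xml` explicitly would get the same bytes labelled as xhtml.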

  1. What are the pros and cons of doing this?
  2. Is there prior art that echoes this technique?
+1  A: 

Although I have never seen any authoritative discussion on this particular issue, I would say it seems perfectly valid. It is similar to the idea of requesting text/plain for an HTML document to effectively do a view source operation.

From the client's perspective, it has no idea that the bytes are identical to another representation, so I cannot see how there could be pros or cons for the client.

I guess the tricky issue comes with caching. Are you going to use the same URI to return both representations, or are you going to redirect from the generic URI to ones specific to the representation? Will there be one or two copies of the bytes in the cache? Do you need to set the Vary header so that caches distinguish responses by the request's Accept header, which is what determines the Content-Type? Can you live with two copies in the cache? Do you need to do updates on one or both of these representations? If so, will the cache invalidate both copies if they exist?

Darrel Miller
Vary would need to be set (to Accept, the request header that selects the content type), and they should probably have different ETags, otherwise caches might muddle up cached responses (since some headers are part of what's cached by intermediaries). I can live with two copies in the cache, and yes they'll probably both accept updates, and yes I guess both should be invalidated. Good points.
mogsie
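
The caching points raised in the answer and the comment above (a Vary header plus distinct ETags for the two labellings of the same bytes) might look something like this. It is a minimal sketch under assumptions: `representation_headers` is a hypothetical helper, and deriving the ETag from a hash of the content type plus the body is just one way to keep the two validators different. Note that Vary names request headers, so the value here is `Accept` rather than `Content-Type`.

```python
import hashlib

FOO = "application/vnd.example.foo+xml"
XHTML = "application/xhtml+xml"


def representation_headers(body: bytes, content_type: str) -> dict:
    """Build cache-related response headers for one representation.

    The content type is mixed into the hash so that the two
    representations get different ETags even though the bytes match.
    """
    digest = hashlib.sha256(content_type.encode("ascii") + b"\n" + body).hexdigest()[:16]
    return {
        "Content-Type": content_type,
        "Vary": "Accept",         # caches must key on the request's Accept header
        "ETag": f'"{digest}"',    # strong validator, unique per representation
    }


document = b"<html xmlns='http://www.w3.org/1999/xhtml'>...</html>"
foo_headers = representation_headers(document, FOO)
xhtml_headers = representation_headers(document, XHTML)
```

With this, a shared cache holding both responses under the same URI can tell them apart, and a conditional request with one ETag will not be satisfied by the other representation.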