I have a resource at a URL that both humans and machines should be able to read:

http://example.com/foo-collection/foo001

What is the best way to distinguish between human browsers and machines, and return either HTML or a domain-specific XML response?

(1) The Accept type field in the request?

(2) An additional bit of URL? eg:

http://example.com/foo-collection/foo001 -> returns HTML
http://example.com/foo-collection/foo001?xml -> returns, er, XML

I do not wish to oblige machines reading the resource to parse HTML (or XHTML, for that matter). Crawlers like Googlebot, however, should receive the HTML response.

It is reasonable to assume I control the machine readers.

A: 

I would say adding a query-string parameter is your best bet. The only way to detect automatically whether your client is a browser (human) or an application is to read the User-Agent string from the HTTP request, but since any application can easily set that string to mimic a browser, you're not guaranteed this will work.
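A minimal sketch of the query-string scheme, using only the standard library. It follows the bare `?xml` convention from the question; the function name is illustrative, not from the answer:

```python
from urllib.parse import urlparse, parse_qs

def choose_format(url):
    """Return 'xml' when the query string asks for it, else 'html'.

    Hypothetical helper for the query-parameter approach: a bare
    '?xml' in the URL (as in the question) selects the XML response.
    """
    # keep_blank_values=True so a valueless '?xml' still shows up as a key
    query = parse_qs(urlparse(url).query, keep_blank_values=True)
    return "xml" if "xml" in query else "html"

choose_format("http://example.com/foo-collection/foo001")      # HTML by default
choose_format("http://example.com/foo-collection/foo001?xml")  # XML on request
```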

Steve
+7  A: 

If this is under your control, rather than adding a query parameter why not add a file extension:

http://example.com/foo-collection/foo001.html - return HTML
http://example.com/foo-collection/foo001.xml - return XML

Apart from anything else, that means if someone fetches it with wget or saves it from their browser, it'll have an appropriate filename without any fuss.
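The extension scheme above can be sketched as a simple lookup from the path's suffix to a response content type; the mapping and default here are assumptions for illustration:

```python
import os

# Illustrative mapping; extend as needed for other representations.
CONTENT_TYPES = {".html": "text/html", ".xml": "application/xml"}

def type_for_path(path, default="text/html"):
    """Map the URL path's file extension to a response content type.

    A minimal sketch of the extension-based scheme: unknown or missing
    extensions fall back to the default (HTML, for browsers).
    """
    _, ext = os.path.splitext(path)
    return CONTENT_TYPES.get(ext, default)
```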

Jon Skeet
+3  A: 

My preference is to make it a first-class part of the URI. This is debatable, since there are, in a sense, multiple URIs for the same resource. And is "format" really part of the URI?

http://example.com/foo-collection/html/foo001
http://example.com/foo-collection/xml/foo001

These are very easy to deal with in a web framework that has URI parsing to direct the request to the proper application.
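Dispatching on a format segment like this is a one-line split in most frameworks; a framework-free sketch, assuming the `/foo-collection/<format>/<id>` shape from the answer:

```python
def route(path):
    """Parse '/foo-collection/<format>/<id>' into (format, item_id).

    Sketch of the first-class format segment; the URL shape is assumed
    from the answer, and a real framework would do this in its router.
    """
    parts = [p for p in path.split("/") if p]
    # expected: ['foo-collection', 'html' or 'xml', 'foo001']
    collection, fmt, item_id = parts
    return fmt, item_id
```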

S.Lott
+3  A: 

If this is indeed the same resource with two different representations, HTTP invites you to use the Accept header, as you suggest. This is probably a very reliable way to distinguish between the two scenarios. You can be quite sure that user agents (including search-engine spiders) send the Accept header properly.

As for the machine agents you are going to give XML: are they under your control? In that case you can be doubly sure that Accept will work. If they do not set the header properly, you can serve XML as the default; user agents DO set the header properly.

I would try to use the Accept header for this, because this is exactly what the Accept header is there for.
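A simplified sketch of Accept-header negotiation between the two representations. It compares q-weights for `application/xml` against `text/html` and defaults to HTML; a production server should use a full Accept parser rather than this illustrative one:

```python
def negotiate(accept_header):
    """Return 'xml' or 'html' based on q-weighted Accept preferences.

    Simplified content negotiation: whichever of application/xml or
    text/html carries the higher q-value wins; ties and absent types
    fall back to HTML, which suits browsers and crawlers.
    """
    def q_for(target):
        best = 0.0
        for part in accept_header.split(","):
            fields = [f.strip() for f in part.split(";")]
            if fields[0] == target:
                q = 1.0  # q defaults to 1.0 when omitted
                for f in fields[1:]:
                    if f.startswith("q="):
                        q = float(f[2:])
                best = max(best, q)
        return best

    return "xml" if q_for("application/xml") > q_for("text/html") else "html"

# A typical browser Accept header prefers text/html:
negotiate("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
# A machine client you control can simply send:
negotiate("application/xml")
```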


The problem with having two different URLs is that it is not automatically apparent that they represent the same underlying resource. This can be bad if a user finds a URL in one program, which renders HTML, and pastes it into another, which needs XML. At that point a smart user could probably change the URL appropriately, but this is just a source of error that you don't need.

Magnus Hoff
Given that any program needing the XML is a program I control, can't they always modify any URL pasted in? (I'm playing devil's advocate a bit - I do like the sound of this scheme.)
John McAleely
Sure, you could always tack on additional conventions, but that would always be "John's special deal". By using the Accept header you clearly communicate that it is one resource, and you differentiate by the only relevant criterion: which format do you need at any given time?
Magnus Hoff