Inspired by a thought while looking at the question "Correct HTTP status code when resource is available but not accessible because of permissions", I will use the same scenario to illustrate my hypothetical question.

Imagine I am building a carpooling web service.

Suppose the following request

GET /api/persons/angela/location

retrieves the current position of user "angela". Only angela herself and a possible driver who is going to pick her up should be able to know her location, so if the request is not authenticated as an appropriate user, a 401 Unauthorized response is returned.

Also consider the request

GET /api/persons/john/location

when no user called john has registered with the system. There is no john resource let alone a resource for john's location, so this obviously returns a 404 Not Found. Or does it?

What if I don't want to reveal whether or not john is registered with the system?

(Perhaps the usernames are drawn from a small pool of university logins, and there is a very militant cycling group on campus that takes a very dim view of car usage, even if you are pooling. They could make requests to the URL for every user and, if they receive a 401 instead of a 404, infer that the user is a carpooler.)

Does it make sense to return a 401 Unauthorized for this request, even though the resource does not exist and there is no possible set of credentials that could be supplied in a request to have the server return a 200?

+10  A: 

Actually, RFC 2616 (§10.4.4, 403 Forbidden), which is an IETF document rather than a W3C one, suggests doing the opposite. If someone attempts to access a resource they are not permitted to see, return 404 rather than 403 (Forbidden). This still solves the information disclosure issue. The RFC says:

"If the server does not wish to make this information available to the client, the status code 404 (Not Found) can be used instead."

Thus, you would never return 403 (or 401). However, I think your solution is also reasonable.

EDIT: I think Gabe's on the right track. You would have to reconsider part of the design, but why not:

  • Not found - 404
  • User-specific insufficient permission - 404
  • General insufficient permission (no one can access) - 403
  • Not logged in - 401
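
To make the four cases above concrete, here is a minimal sketch in Python. The function name `choose_status` and its boolean parameters are made up for illustration, and the order in which the checks run is my own assumption (the list does not specify one); it is not taken from any particular framework.

```python
from http import HTTPStatus


def choose_status(authenticated: bool, exists: bool,
                  blocked_for_everyone: bool, caller_allowed: bool) -> HTTPStatus:
    """Pick a status code following the four cases in the list.

    All parameter names are hypothetical; a real service would derive
    these flags from its own session and permission checks.
    """
    if not authenticated:
        # Not logged in: challenge for credentials, whether or not the
        # resource exists, so the response reveals nothing about it.
        return HTTPStatus.UNAUTHORIZED
    if not exists:
        return HTTPStatus.NOT_FOUND
    if blocked_for_everyone:
        # General insufficient permission: no credentials would help.
        return HTTPStatus.FORBIDDEN
    if not caller_allowed:
        # User-specific insufficient permission: look the same as "missing".
        return HTTPStatus.NOT_FOUND
    return HTTPStatus.OK
```

With this ordering, an authenticated stranger asking for angela's location and for john's location gets 404 both times, so the two cases cannot be told apart.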
Matthew Flaschen
For 403 that's true, but my hypothetical API is returning 401 in order to challenge the client for credentials. 403 is used for resources that cannot be viewed by anyone, regardless of credentials supplied.
Day
Day: You return 401 if the client is not authenticated; 404 if the user is authenticated but doesn't have access to the resource.
Gabe
@Gabe: I see, that should work. I think the answer needs editing to make things clearer though.
Day
@Day, feel free to ask questions about any unclear parts.
Matthew Flaschen
+1  A: 

I think it's fine if you want to return a 401 Unauthorized if the request is made by a client that is not a user. However, if a user makes the request and is authenticated, then I don't think that a 401 is the best solution. If you feel that returning a 404 would compromise the security of some users, then you may want to consider returning a 403 Forbidden or perhaps a 200 OK, but just don't specify a location. If I query for user bob and get a response and query for user sam and get an error response, be it 401, 403, 404, etc, then I can probably come to the conclusion that it means that user sam doesn't exist.

200 OK with no location specified may be the most disguised solution.

Edit: Just to illustrate what I am proposing. Return a 401 if the client isn't authenticated. Otherwise, always return a 200 OK.

<user-location for="bob">
    <location>geo-coordinates here</location>
</user-location>

<user-location for="sam">
    <location/>
</user-location>

This doesn't really indicate if sam exists or not, or perhaps there just isn't any location data for him currently.
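
A rough sketch of how the server side of this could look, in Python with the standard-library `xml.etree.ElementTree` module. The `LOCATIONS` dictionary and the function name `render_user_location` are placeholders I made up; the point is only that an unknown user and a known user without location data go through the same code path and produce the same shape of document.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory store: username -> coordinates, or None when a
# registered user currently has no location. "sam" is not registered at all.
LOCATIONS = {"bob": "51.5072,-0.1276", "carol": None}


def render_user_location(username: str) -> str:
    """Build the 200 OK body for an authenticated request.

    Unknown users and users without location data both end up with an
    empty <location> element, so the body does not reveal which is which.
    """
    root = ET.Element("user-location", {"for": username})
    location = ET.SubElement(root, "location")
    coords = LOCATIONS.get(username)  # None for unknown users too
    if coords is not None:
        location.text = coords
    return ET.tostring(root, encoding="unicode")
```

As discussed in the comments, this only works for resources where an "empty" representation is plausible for a real user.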

Steven Benitez
But he is concerned with even revealing the existence of a certain user via this method; returning anything different at all would give that information away.
Andrew Barber
Not necessarily. It could just indicate that there is no current locational data for the user. I have clarified this in my answer.
Steven Benitez
@Steven. Interesting solution, but something doesn't feel quite right about this. I think we get problems if we act as though requests are authenticated when they really aren't. Returning a faked empty location if the requester is not authenticated instead of a 401 means that e.g. eric who is registered and has permission to see sam's location (because sam has accepted eric as a potential driver) may mistype his credentials and not realise because he gets a 200 response, and so gives up thinking sam currently has no location.
Day
@Day: I am not advocating that he returns that response if the user is unauthenticated. I am advocating that he return that response if the user is authenticated but the user does not exist (or there is no current location data for the user, etc.). If the user is not authenticated, then he should return a 401 Unauthorized. I edited again to better clarify that.
Steven Benitez
@Steven. Ah I see. In that case, this would work with location where an existing user and a non-existing user can both have an "empty" location, so we can't tell them apart. But in the more general case, where it is not possible to have the same representation for an existing and a non-existing user, we couldn't do this. e.g. I'm authorized, what does `GET /api/persons/john/photo` give me if all registered users have photos? Always return a 200 OK with a photo, but if the user doesn't exist... what photo?
Day
+1 for some lateral thinking here too, even if it can't quite work for all cases.
Day
+1  A: 

I think the best solution would be to return 403 (forbidden) for every (potential) page in a class, if the user is not authenticated to see any of them. If the user is, return 404 for stuff that's not there and 200 for stuff that is.

SamB
But most likely it won't be particularly difficult or costly to register as a carpooler. So you're still effectively giving the information to everyone.
Matthew Flaschen
@Matthew Flaschen exactly right, that's the problem with this solution. Thanks for the suggestion though
Day
True. I guess the way to avoid that would be to not take the usernames from the email addresses? And apparently I didn't get the distinction between 403 and 401. Oops...
SamB
A: 

Return 401 Unauthorized in any case in which the user is not allowed to see a particular page, whether it exists or not.

From RFC 2616: "If the request already included Authorization credentials, then the 401 response indicates that authorization has been refused for those credentials."

Consider HTTP servers that use separate lists of credentials for authentication to different URLs. Obviously, a server should not check every list when a URL is requested. Because HTTP requests are completely independent of each other, it makes sense to return 401 Unauthorized whenever the credentials are not in the one list that applies to that particular URL.

Furthermore, the description of 403 Forbidden includes: "Authorization will not help and the request SHOULD NOT be repeated." In contrast, if the user chooses to log in using the correct credentials, Authorization will help.

idealmachine
I'm not sure I follow the part about .htpasswd files here. Probably best to try and explain in generic HTTP terms without discussing server implementation details?
Day
.htpasswd files are lists of valid credentials for a particular URL or URLs. They are used by the Apache Web Server and referenced by specific configuration directives that instruct the server to require authentication for particular URLs.
idealmachine
+1  A: 

If usernames are sensitive information, then don't put them directly in the URI. If you use hypermedia within your representations, then you can make it just as easy for authorized client applications to navigate your API without leaking information in your URLs.

Hackable URLs are great for information that you want everyone to be able to access easily. However, for a RESTful client, there is no problem using URIs that are completely opaque.

Once you have removed the direct correlation between the user and the URI, it becomes difficult to infer any information from a 401 response code.
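
As a minimal sketch of that idea in Python: the class `HandleRegistry` and its methods are hypothetical names, and using random tokens from the standard-library `secrets` module is just one assumed way of allocating opaque handles. The `/api/persons/.../location` path shape is borrowed from the question.

```python
import secrets
from typing import Dict, Optional


class HandleRegistry:
    """Keeps a private mapping between usernames and opaque URL handles."""

    def __init__(self) -> None:
        self._by_handle: Dict[str, str] = {}
        self._by_username: Dict[str, str] = {}

    def register(self, username: str) -> str:
        """Allocate a handle on first registration; reuse it afterwards."""
        if username in self._by_username:
            return self._by_username[username]
        handle = secrets.token_urlsafe(16)  # random, not derived from the name
        self._by_handle[handle] = username
        self._by_username[username] = handle
        return handle

    def location_url(self, username: str) -> str:
        """URL the server would embed in representations for authorized clients."""
        return f"/api/persons/{self._by_username[username]}/location"

    def resolve(self, handle: str) -> Optional[str]:
        """Map a handle from an incoming URL back to a username, if any."""
        return self._by_handle.get(handle)
```

A client never builds these URLs itself; it follows the link the server hands out, and a 401 on some handle tells an outsider nothing about which username sits behind it.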

Darrel Miller
+1 for some lateral thinking. URLs don't even have to be opaque, just hide the username, e.g. `/api/persons/<id>/location`. Still readable: `<id>` is allocated to a username when they register and this mapping is kept private by the system, but usernames are still used in resources and can be seen by authenticated users that have been given the appropriate permissions. It's ok to return 401 for `/api/persons/1/location` when the requester is not authenticated, and 404 for `/api/persons/3/location`, because that doesn't reveal the username of the registered person with id=1.
Day
I guess the downside could be that this adds a layer of indirection, which adds complexity and could have significant storage implications if you have a large set of resources that need to be assigned opaque handles just for use in URLs.
Day
@Day There is an extra layer of indirection, that is for sure. However, that's the way REST services are supposed to work. You only ever start with the root URL; all the other URLs that a client uses should be discovered from the server. As for the storage, I don't really buy that. Storing an extra int, or even a big int if you have grand visions, per user account is trivial.
Darrel Miller
Accepting this answer as it strikes me as the cleanest and most RESTful solution, allowing me to use the correct status codes without revealing any sensitive info. It's a shame this answer only has a single vote. I think the top voted answer needs some editing to remove talk of 403 and make that solution clearer.
Day