Most sites that use an auto-increment primary key display it openly in the URL.

For example:

example.org/?id=5

This makes it very easy for anyone to spider a site and collect all the information by simply incrementing the value of id. I can understand that in some cases this is a bad thing, if permissions/authentication are not set up correctly and anyone can view anything by simply guessing the id. But is hiding the id behind a hash ever a good thing?

example.org/?id=e4da3b7fbbce2345d7772b0674a318d5

Is there ever a situation where hashing the id to prevent crawling is bad practice (besides the time it takes to set up this functionality)? Or is this all a moot topic, because by putting something on the web you accept the risk of it being stolen/mined?

+2  A: 

I think hashing publicly accessible IDs is not a bad thing, but showing sequential IDs will in some cases be a bad thing. Even better, use GUIDs/UUIDs for all your IDs. Many technologies also support sequential GUIDs, which are faster at insert time (though not as good in a distributed environment).

Luke Schafer
+3  A: 

Using a hash like MD5 or SHA on the ID is not a good idea:

  • There is always the possibility of collisions, that is, two different IDs hashing to the same value.
  • A hash is one-way, so how are you going to map it back to the actual ID?

A better approach if you're set on avoiding incrementing IDs would be to use a GUID, or just a random value when you create the ID.
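A minimal sketch of the "random value when you create the ID" idea, using Python's standard `uuid` and `secrets` modules. The function names and the in-memory mapping are hypothetical stand-ins for illustration; in a real application the mapping would be a column in the database.

```python
import secrets
import uuid


def new_public_id() -> str:
    """Return a random 128-bit identifier as 32 hex characters."""
    return uuid.uuid4().hex


def new_url_token(num_bytes: int = 16) -> str:
    """Return a URL-safe random token built from num_bytes of OS randomness."""
    return secrets.token_urlsafe(num_bytes)


# Hypothetical stand-in for a database column: public ID -> internal ID.
public_to_internal = {}


def register(internal_id: int) -> str:
    """Create a random public ID for an existing auto-increment ID."""
    public_id = new_public_id()
    public_to_internal[public_id] = internal_id
    return public_id
```

Because the public ID is generated rather than derived from the integer, there is nothing to "unhash": the lookup table is the mapping back to the real ID.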

That said, if your application security relies on people not guessing an ID, that shows some flaws elsewhere in the system. My advice: stick to the plain and easy auto-incrementing ID and apply some proper access control.

nickf
Yeah, using MD5s as integer IDs is just the same as using an integer ID, since there's a nice one-to-one correspondence between the two. A sufficiently motivated user can come up with a rainbow table (http://www.freerainbowtables.com/en/tables/md5/) and hack your URL using the MD5s instead of integers.
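To illustrate the point, here is a rough sketch that precomputes the MD5 of every small integer and uses it to reverse a hashed id. It uses a plain dictionary, which is simpler than a real rainbow table, and it assumes the site hashes the decimal string of the id with no salt; the upper bound of 100,000 is arbitrary.

```python
import hashlib


def md5_hex(n: int) -> str:
    """MD5 of the decimal string form of n, as lowercase hex."""
    return hashlib.md5(str(n).encode("ascii")).hexdigest()


def build_lookup(max_id: int) -> dict:
    """Map md5_hex(n) -> n for every n from 0 to max_id inclusive."""
    return {md5_hex(n): n for n in range(max_id + 1)}


lookup = build_lookup(100_000)

# The hash from the question's example URL maps straight back to id 5.
print(lookup["e4da3b7fbbce2345d7772b0674a318d5"])  # prints 5
```

Once the table exists, walking the site is as easy as before: hash 6, 7, 8, and so on, and request those URLs.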
Seth
+1  A: 

My opinion is that if something is on the web, and is served without requiring authorization, it was put there with the intention that it should be publicly accessible. Actively trying to make it more difficult to access seems counter-intuitive.

recursive
A: 

My general rule is to use a GUID if I'm showing something that has to be displayed in a URL and also requires credentials to access or is unique to a particular user (like an order ID), for example http://site.com/orders?id=e4da3b7fbbce2345d7772b0674a318d5

That way another user won't be able to "peek" at the next order by hacking the URL. They may be denied access to someone else's order anyway, but throwing a zillion letters and numbers at them is a pretty clear way to say "don't mess with this".

If I'm showing something that's public and not tied to a particular user, then I may use the integer key. For example, for displaying pictures, you might wish to allow your users to hack the URL to see the next picture.

http://example.org/pictures?id=4, http://example.org/pictures?id=5, etc.

(I actually wouldn't do either as a simple GET parameter. I'd use mod_rewrite (or something similar) to make readable URLs, something like http://example.org/pictures/4 -> /pictures.php?picture_id=4, etc.)
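The mapping described above can be sketched in a few lines. This is not mod_rewrite configuration, just an illustration of the same path-to-query translation; the `rewrite` function and its pattern are assumptions made for the example.

```python
import re
from typing import Optional

# Matches /pictures/<digits>, with an optional trailing slash.
PRETTY_PICTURE_URL = re.compile(r"/pictures/(\d+)/?", re.ASCII)


def rewrite(path: str) -> Optional[str]:
    """Translate a readable picture path to the internal script URL.

    Returns None when the path is not a picture URL.
    """
    match = PRETTY_PICTURE_URL.fullmatch(path)
    if match is None:
        return None
    return f"/pictures.php?picture_id={match.group(1)}"
```

For instance, `rewrite("/pictures/4")` yields `/pictures.php?picture_id=4`, while a non-numeric path such as `/pictures/abc` is left unhandled.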

Seth
IMO, if another user can "peek" at another order with the right URL, even an obscured URL, then the software has a **major** security hole.
Chip Uni
A: 

Hashing an integer is a poor implementation of security by obscurity, so if that's the goal, a true GUID or even a "sequential" GUID (whether via NEWSEQUENTIALID() or the COMB algorithm) is much better.
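As a rough illustration of what a time-ordered identifier looks like, the sketch below puts a millisecond timestamp in the first 6 bytes and random data in the remaining 10. This is neither NEWSEQUENTIALID() nor the exact COMB byte layout; it is an assumed layout chosen only to show the idea, and the function name is hypothetical.

```python
import os
import time
import uuid


def time_ordered_id() -> uuid.UUID:
    """Build a 128-bit ID: 48-bit millisecond timestamp + 80 random bits."""
    millis = time.time_ns() // 1_000_000
    # Keep the low 48 bits so the timestamp always fits in 6 bytes.
    prefix = (millis & ((1 << 48) - 1)).to_bytes(6, "big")
    return uuid.UUID(bytes=prefix + os.urandom(10))
```

IDs created later sort after earlier ones, which is what helps index inserts. The trade-off is that the prefix reveals roughly when each record was created, so such IDs are less opaque than fully random ones.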

Either way, no one types URLs anymore, so I don't see much sense in worrying about the difference in length.

richardtallent
+1  A: 

Often, spidering a site is a Good Thing. If you want your information available as much as possible, you want sites like Google to gather data on your site, so that others can find it.

If you don't want people to read through your site, use authentication, and deny access to people who don't have access.

Random-looking URLs only give the impression of security, without giving the reality. If you put (hidden) account information in a URL, any web spider that finds that URL, and everyone who reads what the spider indexed, will have access to that account.

Chip Uni
+3  A: 

Generally with websites you're trying to make them easy to crawl so that you get good search rankings and drive traffic to your site. Good web developers design their HTML with search engines in mind, and often also provide things like RSS feeds and site maps to make it easier to crawl content. So if you're trying to make crawling more difficult by not using sequential identifiers, then (a) you aren't making it more difficult, because crawlers work by following links, not by guessing URLs, and (b) you're trying to make something more difficult that you also spend time trying to make easier, which makes no sense.

If you need security then use actual security. Use checks of the principal to authorize or deny access to resources. Obfuscating URLs is no security at all.
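A minimal sketch of such a check, assuming a hypothetical `Order` record with an `owner` field and an in-memory `ORDERS` table standing in for the database. The caller's identity is assumed to come from an already authenticated session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: int
    owner: str


class Forbidden(Exception):
    """Raised when the caller may not see the requested order."""


# Hypothetical data store keyed by plain sequential IDs.
ORDERS = {5: Order(5, "alice"), 6: Order(6, "bob")}


def get_order(current_user: str, order_id: int) -> Order:
    """Return the order only if it belongs to the authenticated user."""
    order = ORDERS.get(order_id)
    # Same error for "missing" and "not yours", so IDs cannot be probed.
    if order is None or order.owner != current_user:
        raise Forbidden(f"order {order_id} is not available")
    return order
```

With a check like this in place, guessing `id=6` gains nothing for a user who does not own order 6, regardless of whether the ID is sequential or random.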

So I don't see any problem with using numeric identifiers, or any value in trying to obfuscate them.

Greg Beech
+1  A: 

Hashing or randomizing identifiers or other URL components can be a good practice when you don't want your URLs to be traversable. This is not security, but it will discourage the use (or abuse) of your server resources by crawlers, and can help you to identify when it does happen.
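One way the "identify when it does happen" part might work: with random IDs, legitimate links should almost always resolve, so a client that keeps requesting IDs that do not exist is probably guessing. The sketch below counts misses per client; the threshold and the names are assumptions, not a recommendation.

```python
from collections import Counter

# Assumed cutoff: this many failed lookups marks a client as suspicious.
MISS_THRESHOLD = 20

misses = Counter()


def record_lookup(client: str, found: bool) -> bool:
    """Record one ID lookup and report whether the client looks like a guesser."""
    if not found:
        misses[client] += 1
    return misses[client] >= MISS_THRESHOLD
```

With sequential IDs the same signal is much weaker, because a crawler incrementing the id mostly hits records that exist.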

In general, you don't want to expose application state, such as which IDs will be allocated in the future, since it may allow an attacker to use a prediction in ways that you didn't foresee. For example, BIND's sequential transaction IDs were a security flaw.

If you do want to encourage crawling or other traversal, a more rigorous way is to provide links, rather than relying on an implementation detail which may change in the future.

Using sequential integers as IDs can make many things cheaper on your end, and might be a reasonable tradeoff to make.

Karl Anderson