views:

637

answers:

5

What are the pros and cons of using your databases primary key as a URL identifier? As an example, http://localhost/post/view/13 - 13 being my primary key for my posts table.

Some sites like reddit use what I assume is a unique id that is not the primary key but still unique to help identify the link:

http://www.reddit.com/r/funny/comments/7ynin/the_mystery_of_irelands_worst_driver/

You can change the last part of the URL to whatever you want as long as the /7ynin/ is the same.


Digg seems to use a slug of the links title to id a link:

http://digg.com/space/Liquid_Water_Recently_Seen_on_Mars


While if i recall correctly a default WordPress install uses index.php?p=# as their id until fancy urls are enabled.


I can see why for SEO's sake you would want to have the most informative url possible but I am just trying to see if using the primary key is a security risk or simply just bad form.

+1  A: 

A con: any visitor can easily try and guess other IDs, which may not be what you want.

pantulis
For the most part I don't really see what is wrong in accessing things on a site using different ID's in the URL. If you are talking about things like personal user pages and things which should be limited, you should already have other methods of authentication in place to prevent access.
TheTXI
+1  A: 

It isn't inherently a security risk, though it does tell external entities things about your system, which is generally good practice to avoid.

chaos
I sacrifice? I figure if I used relevant keywords I would be promoting SEO. Maybe I just read that wrong.
In that edit (which I discarded because I hadn't read the last paragraph of your question when I wrote it), I meant you sacrifice SEO value if you use primary keys instead of slugs. (You also sacrifice SEO value if you use them in addition to slugs, but not as much.)
chaos
+2  A: 

As you said, the point of putting titles directly in the URL is SEO. Having keywords in the URL has a significant effect on search engine results.

However, a few other thoughts related to your examples:

  • I'm not sure why you assume that the reddit alphanumeric key is not the primary, there's nothing that forces primary keys to be numeric. If it's a unique identifier for the post, there's no reason not to use it as the primary key (or at least part of it).
  • Digg actually enforces uniqueness of titles (perhaps just inside a particular category, I haven't been to Digg in years, so I can't recall). I used to see this fairly often with a duplicate story having a URL like:

    http://digg.com/space/Liquid_Water_Recently_Seen_on_Mars_2
    

    This implies that the title is at least part of the primary key, since that's the only way to identify which story the link was intending to be aimed at.

There isn't really any significant security risk with using the primary key in the URL, other than the ability for people to guess/predict other ones, as pantulis mentioned. But you shouldn't be relying on "nobody will guess this" as a security measure anyway.

Chad Birch
Very good points. I suppose my idea of what constitutes a primary key is a little wrong. I like the cleanliness of digg's urls but the ability to allow duplicate slugs in urls of reddit. I think I'll try to use a combination of the two. Thanks!
+1  A: 

If you don't include the primary key(s) in the URL/link, then you have to make some kind of temporary synthetic key, AND, then, you have to save the mapping of that key in the session for the user. This adds more state / memory usage / something to break to your application.

If the value is truly sensitive, this might be worth the cost of hiding it. However, obscuring the key doesn't really make it secure, does it? You need to check user roles in whatever "controller" (servlet, code-behind, whatever) before granting access to the item.

Roboprog
+5  A: 

You always want to present the user with a nice URL-not some nasty auto-generated ID. But I dont think you should make said "friendly url" the primary key. You should still use a "classic" auto-incremented, numeric PK and have a second column that is a unique "friendly url". Why?

  1. All the comment tables, ratings tables, whatever tables that have a relationship with your content table can use the numeric primary key. This means smaller indexes and lower memory use.
  2. Somebody will want to change the friendly url. If you've got a numeric primary key, you don't have to update any of your dependent tables (or have the DB do it via a cascading update).
  3. In the future, you can abstract out the URL bits into another table. Said table can then store "legacy" URL mappings that issue redirects to the primary "real" URL map. Then when the user wants to change the friendly URL, you don't have to break the all the inbound legacy URL's. Couldn't do this if your primary key was the "friendly URL".
  4. I'd still be inclined to use the numeric primary key in all my AJAX goo (for example, a post_new_comment() javascript function would take the primary key, not some friendly URL). The only time I'd use the friendly URL is in any user-facing URL structure.
  5. As for security? If your content is access controlled, you are gonna have to check access no matter if it is the primary key or some friendly URL.
  6. If you allow ways to get to the content via the primary key, people might try plugging in random ID's. If your requirement for not only limited access to content, but the denial said content exists, it is a matter of the phrasing of your errors. It is the same as with login failures--you dont say "username not found" you say "bad username or password". Plugging in random values to find content is gonna be a problem for any approach you take, it is just that with numeric keys there is way fewer values to try.

Bottom line: Friendly URL's? Hell yeah. Using them as the primary key? Hell no.

Cory R. King