Hello!

Problem:

I have dynamic pages in PHP where the content is shown according to the given id. The id is always submitted via a GET parameter: page.php?id=X. This causes a problem: site visitors can enumerate the ids and simply walk through all the different content pages. This shouldn't be possible, of course.

How could this be solved?

My approach is to encode all ids in links and forms which are used as a GET parameter later. At the beginning of every page, the given id is decoded into the "real" id which is used in the database. Is this a good approach? Would you choose another way?

Possible solution of my approach:

I would convert the integer id to a base 38 integer and replace the digits by characters of a given list. I would use these characters for the encoded string id:

a-z 0-9 - _

Would you use other characters as well? For these characters my script would be this:

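function id2secure($old_number) {
 // Maps each base-38 digit (array key) to its substitute character.
 $alphabet_en = array(0=>'1', 1=>'3', 2=>'5', 3=>'7', 4=>'9', 5=>'0', 6=>'2', 7=>'4', 8=>'6', 9=>'8', 10=>'a', 11=>'c', 12=>'e', 13=>'g', 14=>'i', 15=>'k', 16=>'m', 17=>'o', 18=>'q', 19=>'s', 20=>'u', 21=>'w', 22=>'y', 23=>'b', 24=>'d', 25=>'f', 26=>'h', 27=>'j', 28=>'l', 29=>'n', 30=>'p', 31=>'r', 32=>'t', 33=>'v', 34=>'x', 35=>'z', 36=>'-', 37=>'_');
 $old_number = (int) $old_number;
 // Negative ids cannot be encoded.
 if ($old_number < 0) { return FALSE; }
 // Zero needs one digit; the loop below would otherwise return ''.
 if ($old_number === 0) { return $alphabet_en[0]; }
 $new_number = '';
 while ($old_number > 0) {
  $rest = $old_number % 38;
  $new_number .= $alphabet_en[$rest];
  // Integer division (PHP 7+) avoids the float that floor() returns.
  $old_number = intdiv($old_number, 38);
 }
 // Digits were collected least significant first, so reverse them.
 return strrev($new_number);
}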

Additional question:

What would be the reverse function for my function?
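
For reference, the inverse could look roughly like the sketch below. The name secure2id is only an assumption; the alphabet string lists the same 38 characters as $alphabet_en, ordered by digit value. Each character is turned back into its digit, and the digits are accumulated most significant first.

```php
<?php
// Hypothetical inverse of id2secure(); the function name is an assumption.
function secure2id(string $encoded) {
    // Same characters as $alphabet_en, at positions 0..37.
    $alphabet = '1357902468acegikmoqsuwybdfhjlnprtvxz-_';
    if ($encoded === '') {
        return false; // nothing to decode
    }
    $number = 0;
    foreach (str_split($encoded) as $char) {
        $digit = strpos($alphabet, $char);
        if ($digit === false) {
            return false; // character is not part of the alphabet
        }
        // Shift the previous digits one base-38 place and add the new one.
        $number = $number * 38 + $digit;
    }
    return $number;
}
```

Very long inputs would overflow PHP's integer range, so a length limit on the encoded string is probably sensible.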

I hope you can help me. Thank you!

+8  A: 

Can the users get to the pages via the Website? If the answer is yes then you should ask yourself if this is really a problem or not.

If not then the problem is that you're not securing your pages or to put it another way: you're relying on obscurity for security, which is never a good move.

My advice? Either secure your pages so only the right users can access them or don't worry about it.

If you really must worry about it, just pass an extra field that must be correct for the given page. I wouldn't construct this from the ID. Perhaps generate another number or a GUID when you create the page entry in the database. If both fields aren't correct then don't display the page.
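
A minimal sketch of that extra-field check, assuming a hypothetical token column stored next to the id and a $row array holding the record fetched by id (both names are illustrative, not from any particular schema):

```php
<?php
// Hypothetical helper: create the random value stored with a new page entry.
function new_page_token(): string {
    // 16 bytes from the system CSPRNG, hex-encoded to 32 characters.
    return bin2hex(random_bytes(16));
}

// Hypothetical helper: $row is the record looked up by id, or null if none exists.
function page_is_accessible(?array $row, string $givenToken): bool {
    if ($row === null) {
        return false; // unknown id
    }
    // hash_equals() compares in constant time.
    return hash_equals($row['token'], $givenToken);
}
```

A link would then carry both values, for example page.php?id=X&token=Y, and the page is only shown when page_is_accessible() returns true.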

Forget the simple character substitution and other naive obfuscation techniques. They're a waste of your time.

Edit: if you're after non-sequential IDs that are the same length, consider using UUIDs instead of auto-increment primary keys. Basically this is done at application level:

  • Change your primary key to char(36);
  • In your insert statement you have to set the key and populate it with the MySQL UUID() function.

Take a look at "To UUID or not to UUID?" and "UUID as a primary key". There is some performance degradation from this (specifically because you're using characters rather than integers for lookups), but unless you have a large table (1 million+ rows) it probably won't be an issue in practice.
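
If you would rather create the key in PHP than call UUID() in the insert statement, a random (version 4) UUID can be built by hand. This is a different technique from MySQL's UUID(), which produces version 1 values; the function name uuid_v4 is an assumption. A sketch:

```php
<?php
// Hypothetical helper: returns a random version 4 UUID as a 36-character string.
function uuid_v4(): string {
    $bytes = random_bytes(16);
    // Set the four version bits to 0100 (version 4).
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    // Set the two variant bits to 10 (RFC 4122 variant).
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);
    // 32 hex characters, split into 8 groups of 4 and joined as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}
```

The result fits the char(36) key mentioned above.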

cletus
Now, every page is secured on its own. But I thought it would look nicer if every id has the same length. :D And the users don't know how many users my site has and which one was the first etc.
You can achieve both of those things by using a GUID as an ID instead of an auto increment field.
cletus
Do you think of something like "936DA01F-9ABD-4d9d-80C7-02AF85C822A8"? How should I create it?
A: 

It will still be possible to walk through your pages sequentially, although it would be harder to guess the pattern. As long as the root pattern is sequential you'll have a problem eventually (assuming it's actually a problem in the first place, and not just something you don't like the idea of).

You could use random numbers for the IDs. That would prevent easy guessing of page IDs and page order (again, if that matters).
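
A rough sketch of random IDs with a retry on collision. The $exists callback is a hypothetical stand-in for a database lookup, and the upper bound assumes a signed 32-bit INT column:

```php
<?php
// Hypothetical helper: draws random ids until one is not taken yet.
function random_page_id(callable $exists, int $maxAttempts = 10): int {
    for ($i = 0; $i < $maxAttempts; $i++) {
        // random_int() uses the system CSPRNG; both bounds are inclusive.
        $candidate = random_int(1, 2147483647);
        if (!$exists($candidate)) {
            return $candidate;
        }
    }
    throw new RuntimeException('Could not find a free id');
}

// Usage with an in-memory array standing in for the pages table.
$taken = [5 => true, 42 => true];
$id = random_page_id(function (int $candidate) use ($taken): bool {
    return isset($taken[$candidate]);
});
```

A unique index on the id column would still be needed to catch two inserts that pick the same value at the same time.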

acrosman
But with random numbers I would have lots of collisions, wouldn't I?
It depends on how many pages you plan to have, and how big a data type you use for storage. Again, it depends somewhat on your system and your goal.
acrosman
A: 

Site visitors can enumerate the ids and simply walk through all the different content pages. This shouldn't be possible, of course.

I'm not sure why this should be a problem - people can view a list of all the (public, Googlebot-indexed) pages on a website just by typing site:domain.com into Google, and loop through them should they wish. Changing the unique index you use won't change that.

But if you really don't want visitors to access your pages directly, a simple quick-fix is to use POST instead of GET.

Waggers
+1 for POST instead of GET.
cpharmston
But if I use post there is an action needed before I can display the right content, correct?
Yes, it essentially means changing every link into a form and using <a href="javascript: submitform()"> for each link, with appropriate <noscript> tags for anyone not running javascript. It may seem longwinded but it's relatively simple and effective.
Waggers
But then it doesn't work for users without javascript (even if that's only 1%). And: That's not what forms are for, right?
It does work for users without javascript if you include a standard submit button in the <noscript> tags, although this could destroy the look and feel somewhat. Agreed, this is not really what forms are for, but I don't think GET was originally intended for page navigation either.
Waggers
oh wow - change all your links into forms which use javascript to submit a POST variable? that's... that's awful. goodbye usability, accessibility, SEO, ability-to-bookmark, ability-to-refresh-the-page-without-annoying-messages, etc, etc...
nickf
I didn't say it was elegant...
Waggers
A: 

I wouldn't worry about this "problem", but anyway, I used the following method on one of my projects:

After saving a new page to the DB, I generated the md5 of (record_id + page_title) and stored it in a dedicated field, pagecode. Then I accessed the pages by that page code instead of the id. It's also a good idea to index the pagecode field in the database.
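
Roughly like this, assuming "+" means plain string concatenation (the function name make_pagecode is made up for the example):

```php
<?php
// Hypothetical helper: builds the value stored in the pagecode field.
function make_pagecode(int $recordId, string $pageTitle): string {
    // md5() returns a 32-character lowercase hex string.
    return md5($recordId . $pageTitle);
}
```

Keep in mind that both inputs are visible to visitors, so anyone who knows the recipe can recompute the code for a given id and title.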

Sergei
In another question here, they told me that md5 hashes aren't adequate primary keys. I should rather use integers since they're faster and carry less risk of collisions.
Hashes are not primary keys, just indexes.
Sergei
But an ID should be the primary key since it should also be unique, shouldn't it?
ID is a primary key with autoincrement. The hash is just a unique key. I don't see a problem here; that site has been working for two years already. Registered users access those pages by id, and anonymous users by page code.
Sergei
+1  A: 

Use a checksum algorithm like Luhn:

$id = 1337;

$_GET['id'] = Luhn($id, 3); // 1337518, adds 3 checkdigits
$_GET['id'] = Luhn_Verify($_GET['id'], 3); // 1337, returns the original number or false if validation fails

echo $_GET['id']; // 1337

EDIT: I forgot to mention that with this method you can check whether an ID is valid without even having to query the database. For example:

$id = Luhn_Verify($_GET['id'], 3);

if ($id === false)
{
    // someone is trying to guess the ID
}

else
{
    // $id is valid, do the DB stuff here
}
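
Luhn() and Luhn_Verify() are not built into PHP. A minimal sketch of how such helpers might look, assuming each iteration appends one standard Luhn check digit to the number so far (the lowercase function names are my own for this example):

```php
<?php
// Computes the standard Luhn check digit for a string of decimal digits.
function luhn_check_digit(string $payload): int {
    $sum = 0;
    $double = true; // the digit next to the check digit is doubled first
    for ($i = strlen($payload) - 1; $i >= 0; $i--) {
        $digit = (int) $payload[$i];
        if ($double) {
            $digit *= 2;
            if ($digit > 9) {
                $digit -= 9; // same as adding the two digits of the product
            }
        }
        $sum += $digit;
        $double = !$double;
    }
    return (10 - $sum % 10) % 10;
}

// Appends $iterations check digits, each computed over everything before it.
function luhn_append($number, int $iterations = 1): string {
    $result = (string) $number;
    for ($i = 0; $i < $iterations; $i++) {
        $result .= luhn_check_digit($result);
    }
    return $result;
}

// Strips and checks $iterations check digits; returns the payload or false.
function luhn_verify($number, int $iterations = 1) {
    $number = (string) $number;
    if (!ctype_digit($number) || strlen($number) <= $iterations) {
        return false;
    }
    for ($i = 0; $i < $iterations; $i++) {
        $payload = substr($number, 0, -1);
        if (luhn_check_digit($payload) !== (int) substr($number, -1)) {
            return false;
        }
        $number = $payload;
    }
    return $number;
}
```

With these definitions luhn_append(1337, 3) gives "1337518", matching the example above. Since the check digits follow from the id alone, a random guess passes about one time in a thousand, and anyone who knows the algorithm can compute them.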
Alix Axel
What about collisions, are they possible?
If you always use the same number of iterations (in my example 3), no.
Alix Axel
Nice algorithm and good idea. Thank you!