The site I am working on wants to generate its own shortened URLs rather than rely on a third party like tinyurl or bit.ly.

Obviously I could keep a running count of new URLs as they are added to the site and use that to generate the short URLs. But I am trying to avoid that if possible, since it seems like a lot of work just to make this one thing work.

As the things that need short URLs are all real physical files on the webserver, my current solution is to use their inode numbers, as those are already generated for me, ready to use, and guaranteed to be unique.

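function short_name($file) {
   // fileinode() returns false on failure (e.g. the file does not exist)
   $ino = @fileinode($file);
   if ($ino === false) {
      return false;
   }
   // base 36 uses the digits 0-9 and the letters a-z
   return base_convert((string) $ino, 10, 36);
}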

This seems to work. Question is, what can I do to make the short URL even shorter?

On the system where this is being used, the inodes for newly added files are in a range that makes the function above return a string 7 characters long.

Can I safely throw away some (half?) of the bits of the inode? And if so, should it be the high bits or the low bits?

I thought of using the crc32 of the filename, but that actually makes my short names longer than using the inode.

Would something like this have any risk of collisions? I've been able to get down to single digits by picking the right value of "$referencefile".

function short_name($file, $referencefile) {
   $ino = @fileinode($file);
   // $referencefile: arbitrarily selected pre-existing file,
   // as all newer files will have higher inodes
   $ino = $ino - @fileinode($referencefile);
   $s = base_convert($ino, 10, 36);
   return $s;
}
+11  A: 

Not sure this is a good idea: if you have to change servers, or replace or reformat the disk, the inode numbers of your files will most probably change, and all your short URLs will be broken or lost!

Same thing if, for any reason, you need to move your files to another partition of your disk, btw.


Another idea might be to calculate some crc/md5/whatever of the file's name, like you suggested, and use some algorithm to "shorten" it.
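A minimal sketch of that idea, assuming the short name is derived from the file's base name with md5() and re-encoded in base 36. The function name `short_name_from_hash` and the default length of 6 are hypothetical choices for illustration, not something from the question:

```php
<?php
// Hypothetical helper: derive a short name from the file's name instead of its inode.
function short_name_from_hash(string $file, int $length = 6): string
{
    // md5() returns 32 hex characters; keep the first 12 (48 bits) so the
    // value stays well within the precision base_convert() can handle.
    $hex = substr(md5(basename($file)), 0, 12);
    // Re-encode the hex value in base 36 (0-9, a-z).
    $short = base_convert($hex, 16, 36);
    // Left-pad in case the value is small, then keep the low-order characters.
    $padded = str_pad($short, $length, '0', STR_PAD_LEFT);
    return substr($padded, -$length);
}
```

Unlike an inode, this value survives moving the files to another disk or server, as long as the file names stay the same. Truncating a hash does make collisions possible, so each new name would still need to be checked against the ones already in use.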

Here are a couple of articles about that:

Pascal MARTIN
Good point. One key aspect of URIs is that they should never change - http://www.w3.org/Provider/Style/URI - and this'd violate it.
ceejayoz
Another risk would be unintentionally allowing access to data that you don't expect to allow. For example, let's say that the user requests inode 17, and that happens to be /etc/shadow (or they request 1111, which happens to be a link to /etc/shadow). You'll have to do additional checking to make sure that the file is in the directory where you expect it, and it may not be completely trivial...
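One way to do that check, as a rough sketch: resolve the path with realpath() and confirm that the result sits under the directory you serve from. The function name `is_inside_dir` is hypothetical, and this only covers symbolic links and ".." segments; hard links would need separate handling.

```php
<?php
// Hypothetical helper: true only if $path resolves to a regular file inside $baseDir.
function is_inside_dir(string $path, string $baseDir): bool
{
    // realpath() resolves symlinks and ".." segments, and returns false
    // if the path does not exist.
    $real = realpath($path);
    $base = realpath($baseDir);
    if ($real === false || $base === false) {
        return false;
    }
    // Compare against the base plus a trailing separator, so that a sibling
    // directory such as "files-old" does not pass as "files".
    $prefix = rtrim($base, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    return is_file($real) && strncmp($real, $prefix, strlen($prefix)) === 0;
}
```

The lookup code would call this on whatever file the short name resolves to, before serving it.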
atk
A: 

Check out Lessn by Shaun Inman. I haven't played with it yet, but it's a self-hosted, roll-your-own URL shortening solution.

Alex Mcp
+1  A: 

Rather clever use of the filesystem there. If you are guaranteed that inode ids are unique, it's a quick way of generating the unique numbers. I wonder if this could work consistently over NFS, because obviously different machines will have different inode numbers. You'd then just serialize the link info in the file you create there.

To shorten the urls a bit, you might take case sensitivity into account, and do one of the safe encodings (you'll get about base62 out of it - 10 [0-9] + 26 (a-z) + 26 (A-Z), or less if you remove some of the 'conflict' letters like I vs l vs 1... there are plenty of examples/libraries out there).
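PHP's base_convert() stops at base 36, so base 62 needs a small hand-written encoder. Here is a minimal sketch; the function name `base62_encode` and the ordering of the alphabet (digits, lowercase, uppercase) are my own assumptions:

```php
<?php
// Hypothetical helper: encode a non-negative integer in base 62.
function base62_encode(int $n): string
{
    if ($n < 0) {
        throw new InvalidArgumentException('Expected a non-negative integer');
    }
    // Digits first, then lowercase, then uppercase (an arbitrary ordering).
    $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    if ($n === 0) {
        return '0';
    }
    $out = '';
    while ($n > 0) {
        // Take the lowest base-62 digit and prepend its character.
        $out = $alphabet[$n % 62] . $out;
        $n = intdiv($n, 62);
    }
    return $out;
}
```

For scale: six base-36 characters cover values below 36^6 = 2,176,782,336, while six base-62 characters cover values below 62^6 = 56,800,235,584, so the saving over base 36 is typically about one character for inode-sized numbers. Removing look-alike characters is just a matter of deleting them from `$alphabet` and replacing 62 with the new alphabet length.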

You'll also want to 'home' your ids with an offset, like you said. You will also need to figure out how to keep the creation of temp files, log files, etc. from eating up your keyspace.

Justin