I'm creating a link-shortening service and I'm base64-encoding an incremented ID field to create my URLs. A URL with the ID "6" would be: http://mysite.com/Ng==

I also need to allow users to create a custom URL name, like http://mysite.com/music

Here's my (possibly faulty) approach so far. Help in fixing it would be appreciated.

When someone creates a new link:

  • I get the largest link ID from the database (it's not auto incremented)
  • Increment the ID by 1
  • Generate a short URL code (http://website.com/[short url name]) by base64_encoding that ID
  • Insert into links table: id, short_url_code, destination_url

When someone creates a new link and passes a custom short URL:

  • My plan was to base64_decode their custom string and use the result as the link ID, but I didn't realize that you can't just base64_decode any alphanumeric string and turn it into a number.

Is there a better encoding method that will let me turn any number into a short string, and any string into a number, so I can always look up short URLs (whether custom or autogenerated) by turning the name into a number and querying for a link with an ID equal to that number?
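To illustrate the problem described above, here is a minimal Python sketch (the thread itself uses PHP's base64_encode/base64_decode; the helper names `id_to_code` and `custom_name_to_id` are hypothetical). It encodes the decimal text of the ID, as in the "6" example, and shows that an arbitrary custom name does not decode back to a number.

```python
import base64


def id_to_code(link_id):
    """Base64-encode the decimal text of the ID, e.g. 6 -> 'Ng=='."""
    return base64.b64encode(str(link_id).encode("ascii")).decode("ascii")


def custom_name_to_id(name):
    """Try to read a short code as a base64-encoded decimal ID.

    Returns the integer ID, or None when the name is not valid base64
    or does not decode to ASCII digits.
    """
    try:
        raw = base64.b64decode(name, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None
    return int(raw) if raw.isdigit() else None
```

An autogenerated code such as "Ng==" round-trips to 6, while a custom name such as "music" yields None because it is not a valid base64 encoding of a decimal number.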

+5  A: 

First and foremost, make sure you have uniqueness constraints in place on the ID and short_url_code columns.

When someone creates a new link:

  1. Get the next link ID from the database (for performance reasons you should really REALLY use autoincrement or SEQUENCE, depending on what your RDBMS offers; otherwise go ahead and select MAX(ID)+1)
  2. Generate a short URL code (http://website.com/[short url name]) from ID using base64_encode or any other custom or standard encoding scheme
  3. Insert into the links table: ID, short_url_code, destination_url
  4. If the insert fails because of a constraint violation go back to step 1 to try a new ID; you may have had a violation because:

    1. the same ID has already been used (i.e. inserted) in parallel by another thread/process etc. (this will not happen if you used autoincrement or SEQUENCE, and may happen quite often otherwise), and/or
    2. the same short_url_code has already been used as a custom URL (this will happen very rarely unless someone is trying to cause trouble on your site)
  5. If the insert succeeded, commit and return the short URL to the user
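The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python with an in-memory SQLite database, the table layout and the names `open_db`, `encode_id` and `create_link` are hypothetical, and it takes the MAX(ID)+1 route rather than autoincrement. One deliberate variation: on a violation it bumps the candidate ID instead of re-reading MAX(ID)+1, because re-reading would return the same ID again when the clash is with a custom short_url_code.

```python
import base64
import sqlite3


def open_db():
    """Create an in-memory table with unique id and short_url_code."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE links ("
        " id INTEGER PRIMARY KEY,"
        " short_url_code TEXT NOT NULL UNIQUE,"
        " destination_url TEXT NOT NULL)"
    )
    return conn


def encode_id(link_id):
    """Step 2: derive the short code from the ID (base64 of its decimal text)."""
    return base64.urlsafe_b64encode(str(link_id).encode("ascii")).decode("ascii")


def create_link(conn, destination_url, max_attempts=100):
    # Step 1: pick the next candidate ID.
    row = conn.execute("SELECT COALESCE(MAX(id), 0) + 1 FROM links").fetchone()
    candidate = row[0]
    for _ in range(max_attempts):
        code = encode_id(candidate)
        try:
            # Step 3: insert; the unique constraints guard both columns.
            conn.execute(
                "INSERT INTO links (id, short_url_code, destination_url)"
                " VALUES (?, ?, ?)",
                (candidate, code, destination_url),
            )
        except sqlite3.IntegrityError:
            # Step 4: ID or code already taken, so try the next ID.
            candidate += 1
            continue
        # Step 5: commit and hand the short code back.
        conn.commit()
        return code
    raise RuntimeError("could not find a free ID")
```

With a real autoincrement column or SEQUENCE the ID clash in step 4 disappears, and only a clash with a custom code remains.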

When someone creates a new link and passes a custom short URL:

  1. Perform the same step 1 as above
  2. Instead of generating the short URL part from ID as in step 2 above, use the custom short_url_code provided by the user
  3. Perform the same step 3 as above
  4. If the insert failed because of:
    1. a constraint violation on ID: go back to step 1 to try a new ID
    2. a constraint violation on short_url_code: return an error to the user asking them to pick a different custom URL, as the short URL they provided has already been used
  5. Perform the same step 5 as above
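A minimal sketch of the custom-URL flow, again in Python with in-memory SQLite. It assumes the database assigns the ID itself (the autoincrement route recommended above), so the only constraint a caller can hit is the one on short_url_code. The names `open_db`, `create_custom_link` and `ShortCodeTaken` are hypothetical.

```python
import sqlite3


class ShortCodeTaken(Exception):
    """Raised when the requested custom short URL is already in use."""


def open_db():
    """Create an in-memory table; SQLite assigns INTEGER PRIMARY KEY ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE links ("
        " id INTEGER PRIMARY KEY,"
        " short_url_code TEXT NOT NULL UNIQUE,"
        " destination_url TEXT NOT NULL)"
    )
    return conn


def create_custom_link(conn, custom_code, destination_url):
    try:
        # The ID is left out so the database picks it; the user's
        # custom code is stored as-is instead of being derived from the ID.
        conn.execute(
            "INSERT INTO links (short_url_code, destination_url) VALUES (?, ?)",
            (custom_code, destination_url),
        )
    except sqlite3.IntegrityError as exc:
        # Unique violation on short_url_code: ask the user for another name.
        raise ShortCodeTaken(custom_code) from exc
    conn.commit()
    return custom_code
```

A caller would catch `ShortCodeTaken` and show the "already used" error to the user.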
vladr
Thanks Vlad. I should have mentioned that I was already handling constraint violations. I've switched to base32 which lets me convert a custom url into a number and insert that as the ID. This makes it easy because I only have to have ID as the primary key. If there is a constraint violation for the base32 representation of a custom name it tells them that name is already taken. If there is a constraint violation for a non-custom url, it just keeps incrementing the ID until it can insert. Does that sound like a decent solution?
makeee
That depends on what tradeoffs you are willing to make. Most native database `int` types are at most 64 bits long (the `bigint` or equivalent type), and each base32 character carries 5 bits (5 = log2(32)), so if I give you a custom short URL longer than floor(64/5) = 12 characters you will not be able to accommodate me. Would it be acceptable to not allow users to provide custom URLs longer than 12 characters?
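The arithmetic can be checked with a few lines of Python (a sketch; `max_chars` is a hypothetical helper name):

```python
BITS_PER_BASE32_CHAR = 5  # log2(32)


def max_chars(int_bits):
    """Longest base32 string that always fits in an integer of int_bits bits."""
    return int_bits // BITS_PER_BASE32_CHAR


# 12 characters need 60 bits; 13 characters would need 65.
twelve_char_values = 32 ** 12   # equals 2 ** 60
thirteen_char_values = 32 ** 13  # equals 2 ** 65, too large for 64 bits
```

A signed `bigint` only offers 63 usable bits, which still leaves the limit at 12 characters.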
vladr
Good point. I do need more than 12 characters. How about this: when creating the link, if the custom name is more than 12 characters, I just use the next biggest ID (excluding IDs of custom-name links). If it is 12 characters or fewer, I encode the custom name in the ID. Then, when looking up a link by its short name, if that ID is not found in the DB, that means the name was more than 12 chars, so I just look it up by its short name.
makeee
While a bit complex, this would maintain the incrementing system (good for keeping URLs short) and still allow me to take advantage of quick selects for custom names of 12 chars or fewer.
makeee
Never mind, I just decided to look up links by their link name (instead of ID) and ditch the whole base encoding thing.
makeee
Personally, instead of base64 encoding, I'd prefer PHP's base_convert (http://nl2.php.net/manual/en/function.base-convert.php). You can convert from base10 to base36 and back again without problems. For a higher base (i.e. case-sensitive A-Za-z0-9), you'd need a custom function, though I think base36 will do just fine.
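For comparison, a rough Python equivalent of that base 10 to base 36 round trip might look like this (a sketch; `to_base36` and `from_base36` are hypothetical names, and Python's built-in `int(text, 36)` handles the reverse direction):

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(number):
    """Convert a non-negative integer to a lowercase base36 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return "0"
    out = []
    while number:
        number, remainder = divmod(number, 36)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


def from_base36(text):
    """Convert a base36 string (case-insensitive) back to an integer."""
    return int(text, 36)
```

One caveat with using this for custom names: leading zeros are lost, so "0music" and "music" map to the same number.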
Duroth
+1  A: 

base64 can be used to make short URLs, but it can also make the URL longer. For instance, the base64_encode of the number 1 is 'MQ==', which is 4 times the size. Base64 output is always padded with '=' to a multiple of 4 characters and is about a third longer than its input, which is not ideal for short URLs.
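The length overhead is easy to see in a small Python sketch (`base64_length` is a hypothetical helper; PHP's base64_encode produces the same standard encoding):

```python
import base64


def base64_length(raw):
    """Length in characters of the padded base64 encoding of raw bytes."""
    return len(base64.b64encode(raw))


# Every group of up to 3 input bytes becomes 4 output characters.
lengths = {n: base64_length(b"1" * n) for n in range(1, 7)}
```

Here `lengths` maps input sizes 1 to 3 to an output of 4 characters, and sizes 4 to 6 to an output of 8.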

If size is the most important factor, then you may be able to produce the shortest URLs by relying on internationalization, i.e. using non-ASCII Unicode characters in the URL.

Percent-encoding the UTF-8 bytes can make a URI rather long (up to 9 ASCII characters for a single Unicode character in the Basic Multilingual Plane, and 12 for characters outside it), but the intention is that browsers only need to display the decoded form, and many protocols can send UTF-8 without the %HH escaping.

Keep in mind that browsers work quite well with UTF-8, and Twitter will have no trouble with these URLs.
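As a quick check of the escaped lengths, here is a Python sketch using the standard library's `urllib.parse.quote` (`escaped_length` is a hypothetical helper name):

```python
from urllib.parse import quote


def escaped_length(character):
    """Number of ASCII characters after %HH-escaping the UTF-8 bytes."""
    return len(quote(character, safe=""))


# One displayed character each, but very different escaped sizes.
samples = {ch: quote(ch, safe="") for ch in ("a", "é", "€")}
```

So a short code that looks like one character in the address bar may still travel as 6 or 9 ASCII characters when it is escaped.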

Rook