Hey,

I am working on a project that involves a url "forwarder" (like bit.ly or tinyurl.com, but we don't really need it to be short).

For that, I need to "generate" alphanumeric strings (I explicitly want alphanumeric) to map to each URL. One option would be to generate a random string and store it somewhere. However, I'd like to avoid using a database since we don't use any in our app. I want to actually "encode" the URL so that it can be decoded later.

Any tips on how to do that?

+2  A: 

Can't be done. An arbitrary URL contains many characters -- let's say 100. A shortened URL contains maybe 5. You can't use 5 characters to reconstruct 100 without a lookup table of some kind; there's simply not enough information available to do it.

EDIT 1: Well, if you don't actually need a URL shortener (then why did you write that?), there are plenty of options. I'd go for plain Base64 encoding, perhaps after a pass through zlib or another compressor (that might make URLs longer; you'll have to measure if it helps or not).
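To measure whether the compression pass helps, something along these lines might do. It is only a sketch: `pack_url` and `unpack_url` are names made up for this example, and the output still contains Base64's non-alphanumeric characters, so it only answers the length question.

```ruby
require 'zlib'
require 'base64'

# Deflate the URL, then Base64-encode the compressed bytes (no line breaks).
def pack_url(url)
  Base64.strict_encode64(Zlib::Deflate.deflate(url))
end

# Reverse of pack_url: Base64-decode, then inflate.
def unpack_url(token)
  Zlib::Inflate.inflate(Base64.strict_decode64(token)).force_encoding(Encoding::UTF_8)
end

url    = "http://stackoverflow.com/questions/960658/crypto-in-ruby-and-alphanumeric"
plain  = Base64.strict_encode64(url)
packed = pack_url(url)

# Compare the two lengths for your own URLs; short URLs may well get longer.
puts "plain Base64:     #{plain.length} characters"
puts "deflate + Base64: #{packed.length} characters"
```

Running this over a sample of real URLs from the app would show whether the extra step pays off.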

EDIT 2: Standard Base64 does use three non-alphanumeric characters: +, /, and = (the padding character). If these are unacceptable, you have a couple of options:

  1. Modified Base64. Wikipedia suggests "modified Base64 for URL", which drops all = and replaces + and / with - and _ respectively. But those still aren't alphanumeric, which doesn't help you.

  2. Some ad-hoc scheme, like Base32 or Base36. This is really easy to implement if you know how Base64 is done (see above link). (Edit 3: I guess Base32 is actually standardized. Looks like RFC 4648 Base32 with 8 padding instead of = padding would work just fine for you).

  3. Some semi-standard approach. There are plenty of possibilities. Unfortunately, most of them rely on a couple of special non-alphanumeric characters, simply because by using as few as one or two more characters you can get far superior performance. Take a look at Binary-to-text encoding for a better survey than I can give.
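As an illustration of the ad-hoc route, here is a rough Base36 sketch in Ruby. It treats the URL's bytes as one big integer and prints that integer in base 36 using the built-in `Integer#to_s`. The helper names `encode36` and `decode36` are invented for this example, and it assumes a non-empty URL that does not start with a NUL byte (leading zero bytes would be lost).

```ruby
# Encode: URL bytes -> hex string -> big integer -> base-36 digits (0-9, a-z).
def encode36(url)
  url.unpack1("H*").to_i(16).to_s(36)
end

# Decode: base-36 digits -> big integer -> hex string -> original bytes.
def decode36(token)
  hex = token.to_i(36).to_s(16)
  hex = "0#{hex}" if hex.length.odd?   # restore a dropped leading zero nibble
  [hex].pack("H*").force_encoding(Encoding::UTF_8)
end

url   = "http://stackoverflow.com/questions/960658/crypto-in-ruby-and-alphanumeric"
token = encode36(url)
puts token
puts decode36(token) == url
```

The result is purely alphanumeric and case-insensitive, at the cost of being somewhat longer than Base64.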

kquinn
Well... I don't actually care if it's short or not... the important thing is that it needs to "hold" the original URL in itself. We're mainly using it to track the number of clicks.
Julien Genestoux
Sorry for not being clear enough from the beginning... I have tried to use Base64... the problem is that it adds non-alphanumeric characters, like %. Is there any way to get rid of them? Thanks for the great help!
Julien Genestoux
Awesome! Thx for all of these pointers... I'll let you know which path I take!
Julien Genestoux
You could also use some custom charset with let’s say 62 characters (0-9, a-z, A-Z) or 56 characters (same as base 62 but without the similar looking 0, o, O and 1, l, I).
Gumbo
A: 

A simple way to do that would be to list all the symbols allowed in a URL that aren't alphanumeric (the ones I came up with after a quick Internet search are $-_.+!*'();/?:@=&) and just encode those somehow. My list has 17 symbols, and the simplest way I can think of to encode them without surrendering legibility would be to pick one alphanumeric symbol, say s, to act as a shift code:

$ ⇒ s0    - ⇒ s1    _ ⇒ s2    . ⇒ s3    + ⇒ s4    ! ⇒ s5
* ⇒ s6    ' ⇒ s7    ( ⇒ s8    ) ⇒ s9    ; ⇒ sa    / ⇒ sb
? ⇒ sc    : ⇒ sd    @ ⇒ se    = ⇒ sf    & ⇒ sg    s ⇒ ss
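A possible Ruby rendering of that table, as a sketch only: `shift_encode` and `shift_decode` are names invented here, and any character that is neither alphanumeric nor in the table (for example % or #) makes `fetch` raise a `KeyError` rather than being passed through silently.

```ruby
# The 17 symbols from the table, in order, paired with codes 0-9 and a-g.
SYMBOLS   = "$-_.+!*'();/?:@=&".chars.freeze
CODES     = [*"0".."9", *"a".."g"].freeze
# "s" itself must be escaped too, since it is the shift character.
ESCAPES   = SYMBOLS.zip(CODES).to_h.merge("s" => "s").freeze
UNESCAPES = ESCAPES.invert.freeze

# Replace every non-alphanumeric character, and every literal "s",
# with "s" followed by its code.
def shift_encode(url)
  url.gsub(/[^0-9A-Za-z]|s/) { |ch| "s#{ESCAPES.fetch(ch)}" }
end

# Read the string left to right: each "s" announces a one-character code.
def shift_decode(token)
  token.gsub(/s(.)/) { UNESCAPES.fetch(Regexp.last_match(1)) }
end

puts shift_encode("a/s")      # prints "asbss"
puts shift_decode("asbss")    # prints "a/s"
```

The encoded form stays fairly readable, which is the point of this variant.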

Another approach would be to transform the original URL into a bitstream, preferably with some compression algorithm since you have forfeited legibility already, and then assign an alphanumeric symbol to each possible 5-bit sequence (6-bit sequences would need 64 symbols, and there are only 62 alphanumerics). With the 36 case-insensitive alphanumerics, this leaves 4 symbols you never use; you could reclaim them if you really cared about length, but it hardly seems worth the complication.
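A sketch of the bitstream idea in Ruby, using 5-bit groups so that a 32-symbol alphabet is enough (a 6-bit group would need 64 symbols, more than the 62 alphanumerics available). The alphabet a-z plus 2-7 is an arbitrary choice for this example, and `bits_encode` and `bits_decode` are made-up names.

```ruby
require 'zlib'

# 32 alphanumeric symbols, one per possible 5-bit value (arbitrary choice).
ALPHABET32 = [*"a".."z", *"2".."7"].freeze

# Compress, view the bytes as a string of "0"/"1", cut it into 5-bit groups
# (zero-padding the last one), and map each group to a symbol.
def bits_encode(url)
  bits = Zlib::Deflate.deflate(url).unpack1("B*")
  bits.scan(/.{1,5}/).map { |chunk| ALPHABET32[chunk.ljust(5, "0").to_i(2)] }.join
end

# Map each symbol back to 5 bits, drop the padding bits so the length is a
# multiple of 8 again, rebuild the bytes, and decompress.
def bits_decode(token)
  bits = token.each_char.map { |c| ALPHABET32.index(c).to_s(2).rjust(5, "0") }.join
  bits = bits[0, bits.length - bits.length % 8]
  Zlib::Inflate.inflate([bits].pack("B*")).force_encoding(Encoding::UTF_8)
end

url   = "http://stackoverflow.com/questions/960658/crypto-in-ruby-and-alphanumeric"
token = bits_encode(url)
puts token
puts bits_decode(token) == url
```

Whether the compression step actually shortens the result depends on the URLs, so it is worth comparing against the uncompressed variant.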

I'll ignore the "crypto" word in the topic, since you don't seem all that interested in making the scheme difficult to uncover.

Pete
+2  A: 

I think I actually found a better solution (at least more suitable and easy to implement in my case)

It is somewhat of a hack, which consists of unpacking the string with the H* directive (hex encoding). Here is a sample of the code:

url = "http://stackoverflow.com/questions/960658/crypto-in-ruby-and-alphanumeric"
# unpack returns a one-element array, so take its first element to get the hex string
hex = url.unpack("H*").first  # => "687474703a2f2f737461636b6f766572666c6f772e636f6d2f7175657374696f6e732f3936303635382f63727970746f2d696e2d727562792d616e642d616c7068616e756d65726963"
# pack is an Array method, so wrap the hex string in an array to decode it
[hex].pack("H*")  # => "http://stackoverflow.com/questions/960658/crypto-in-ruby-and-alphanumeric"

I will not mark this as the answer (not even sure I can...), but I'd like to let the readers know that it actually did the trick for me ;)

Julien Genestoux
A: 

As long as you don't mind ugly urls you could do a quick one with base64 and url escape:

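require 'base64'
require 'cgi'

# Base64-encode the URL, then percent-escape the characters Base64 emits
# that are not URL-safe (+, /, = and the line breaks added by encode64)
def encode_url(url)
  CGI.escape(Base64.encode64(url))
end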

And back again:

def decode_url(encoded_url)
  Base64.decode64(CGI.unescape(encoded_url))
end

Big ugly urls, but it would get the job done:

>> u = encode_url("http://railsruby.blogspot.com/2006/07/url-escape-and-url-unescape.html")
=> "aHR0cDovL3JhaWxzcnVieS5ibG9nc3BvdC5jb20vMjAwNi8wNy91cmwtZXNj%0AYXBlLWFuZC11cmwtdW5lc2NhcGUuaHRtbA%3D%3D%0A"
>> decode_url u
=> "http://railsruby.blogspot.com/2006/07/url-escape-and-url-unescape.html"
csexton