Hi, this is my first question here. I have been testing myself by designing and analysing different architectures. While doing this, I came across the http://tinyurl.com/ site.

Could anyone share ideas on what the design considerations for this site might be and what its architecture might look like? Which algorithm would you use to generate a tiny ID so that it is easier to remember than a long URL?

Thanks.

+1  A: 

@Kbrimington has the idea - most tinyurls or bit.ly links aren't about memorability but are more about shortness.

All you really need to do is keep track of the short links you generate and map them to the longer links that are submitted. A database works well, of course.
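
As a rough illustration of that mapping, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `links`, its column names, and the helpers `save_link` and `resolve` are made up for this example; a real service would pick its own schema and database.

```python
import sqlite3

# Hypothetical schema: one row per short code, pointing at the submitted long URL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (code TEXT PRIMARY KEY, long_url TEXT NOT NULL)")

def save_link(code, long_url):
    """Store a short code -> long URL mapping; raises IntegrityError if the code is taken."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO links (code, long_url) VALUES (?, ?)", (code, long_url))

def resolve(code):
    """Return the long URL for a code, or None if the code is unknown."""
    row = conn.execute("SELECT long_url FROM links WHERE code = ?", (code,)).fetchone()
    return row[0] if row else None

save_link("abc123", "http://example.com/some/very/long/path?with=query")
print(resolve("abc123"))
```

Making the short code the primary key means a lookup is a single indexed read, and the database itself rejects duplicate codes.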

Beyond that, I know that some of those websites gather statistics on the generated link, such as how many times it has been visited. It makes people who post a link feel special when it gets viewed a lot of times, I suppose.

The general idea for the site is very simple, but you can add a lot of interesting features. I'd suggest starting off with basic shortening functionality and adding features later on.

Edit - Another feature some of the url shorteners have is the ability for the user to select the shorter url themselves, if it's available. For example, I might want to choose tinyurl.com/overflow if I wanted to provide a link to something on SO. It's memorable to the user and it doesn't involve any sort of algorithm for "memorable" url generation.

Dave McClelland
A: 

I agree with other answers here that it is about shortness, not memorability.

A question that would have to be answered in design: Do you want these links to last forever? is.gd, for example, operates on the principle that a link lasts forever. This means that anyone using the service now can no longer get urls as short as when I started using it -- they've run out of short urls less than five characters long. This has the benefit that if you come across a link a few years from now, it points to the same url (which may or may not be the same page). Personally, I use services like this generally because I want to share a link, not save it, so I'd prefer to re-use URLs.

is.gd also creates a new short url for an address every time somebody asks for it -- it doesn't check to see if there is already a short url for this address. I would guess this boosts performance, but again, at the expense of using up short URLs faster than necessary.
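
To make the trade-off concrete, here is a small sketch of both behaviours. It is not how is.gd works internally; the in-memory dicts, the hex counter used for codes, and the `reuse` flag are all assumptions for illustration.

```python
import itertools

code_to_url = {}   # short code -> long URL
url_to_code = {}   # reverse index, only needed when reusing codes
_counter = itertools.count(1)

def shorten(long_url, reuse=True):
    """Return a short code for long_url.

    With reuse=True an existing code for the same URL is handed back,
    which costs an extra lookup but does not consume a new code.
    With reuse=False a fresh code is issued on every call.
    """
    if reuse and long_url in url_to_code:
        return url_to_code[long_url]
    code = format(next(_counter), "x")  # placeholder code scheme: hex counter
    code_to_url[code] = long_url
    url_to_code.setdefault(long_url, code)
    return code
```

The reverse index is the price of reuse: it needs its own storage and its own lookup on every submission.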

Statistics -- you can see how many times the url has been used. You could conceivably track other statistics, too... user agent strings, IPs, etc. Worth it?
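
A bare-bones version of that tracking might look like the sketch below. The `links` data, the `record_visit` helper, and the choice to count user agents per code are assumptions for the example, not a description of any particular service.

```python
from collections import Counter

links = {"overflow": "http://stackoverflow.com/"}  # example data
hits = Counter()   # total visits per short code
agents = {}        # short code -> Counter of user agent strings

def record_visit(code, user_agent="unknown"):
    """Count one visit for a known code and return the URL to redirect to."""
    long_url = links[code]  # raises KeyError for unknown codes, nothing is recorded
    hits[code] += 1
    agents.setdefault(code, Counter())[user_agent] += 1
    return long_url

record_visit("overflow", "Mozilla/5.0")
record_visit("overflow", "curl/7.0")
print(hits["overflow"])  # 2
```

Every extra statistic adds a write to each redirect, so it is worth deciding up front which ones you will actually look at.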

Custom short URLs? Allowing people to pick URLs helps them be more memorable, but the "good" URLs will be gone quickly. If you don't also give at least an option for a random/next-in-line URL generation, you're forcing your users to come up with something when many probably want a fast, short URL and to move on. If you do custom URLs, at least give an option for random. And don't expect many choice URLs to be left after a week of heavy use.
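
Something like the following sketch covers both cases: the user may ask for a specific alias, and otherwise gets a random code. The function names, the 6-character length, and the lowercase-plus-digits alphabet are assumptions for illustration.

```python
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits
taken = {}  # short code -> long URL

def random_code(length=6):
    """Pick a random code; the caller still has to check that it is free."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def create_link(long_url, alias=None):
    """Use the requested alias if it is free, otherwise fall back to a random code."""
    if alias is not None:
        if alias in taken:
            raise ValueError(f"alias {alias!r} is already taken")
        code = alias
    else:
        code = random_code()
        while code in taken:  # retry until an unused code turns up
            code = random_code()
    taken[code] = long_url
    return code
```

Calling `create_link(url)` with no alias is the "fast, short URL and move on" path; passing `alias="overflow"` is the custom path and fails loudly when the name is gone.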

Peter Leppert
A: 

Thanks Dave and Peter. Everything discussed so far is good to have, and I do understand that. My real question is which algorithm you would use to generate the tiny URL and how you would store it in the database. How is it retrieved, and how is the user redirected to the original site? I can definitely see a performance impact here, because when a user types a tiny URL, the request goes to the tinyurl server first and is then redirected to the original server.

I can think of two ways to generate a tiny URL from the original URL.

If we use a hash table and a fixed-length tiny URL of 6 characters drawn from [a-z, 0-9, and _], we end up with 37^6 (about 2.57 billion) different tiny URLs. Hash collisions are also possible in this process. How should they be handled? Is it better to rehash or to use chaining?

I also thought of using the @@IDENTITY value, which keeps increasing up to the maximum value supported by SQL Server: convert the last+1 value to hexadecimal and return it as the tiny URL. Is there any issue with this approach? Yes: once it passes 8 or 10 digits, the tiny URL may become more complicated than the original URL.

Could anyone think of a way to mix these approaches, or suggest another idea that handles tiny URL generation better?
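
For reference, the retrieve-and-redirect step I have in mind is roughly the sketch below: look up the code from the request path and answer with an HTTP redirect whose `Location` header is the original URL. The `links` dict and `handle_request` are hypothetical names, and a real server would sit behind an actual HTTP framework.

```python
from http import HTTPStatus

links = {"abc123": "http://example.com/original/page"}  # example data

def handle_request(path):
    """Return (status, headers) for a request path such as '/abc123'."""
    code = path.lstrip("/")
    long_url = links.get(code)
    if long_url is None:
        return HTTPStatus.NOT_FOUND, {}
    # The browser follows the Location header to the original server.
    return HTTPStatus.MOVED_PERMANENTLY, {"Location": long_url}

print(handle_request("/abc123"))
```

So the extra cost per visit is one key lookup plus one additional round trip before the browser reaches the original server.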
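
Here is a sketch of the rehash variant, assuming SHA-256 as the hash and an attempt counter mixed into the input as the rehash step. Those choices, and the names `to_code` and `shorten`, are just for illustration.

```python
import hashlib
import string

ALPHABET = string.ascii_lowercase + string.digits + "_"  # 26 + 10 + 1 = 37 symbols
CODE_LEN = 6
SPACE = len(ALPHABET) ** CODE_LEN  # 37**6 = 2,565,726,409 possible codes

table = {}  # short code -> long URL

def to_code(n):
    """Turn a number in [0, SPACE) into a fixed-length 6-character code."""
    chars = []
    for _ in range(CODE_LEN):
        n, r = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[r])
    return "".join(chars)

def shorten(long_url):
    """Hash the URL to a code; on a collision, rehash with the next attempt number."""
    attempt = 0
    while True:
        digest = hashlib.sha256(f"{attempt}:{long_url}".encode()).digest()
        code = to_code(int.from_bytes(digest[:8], "big") % SPACE)
        existing = table.get(code)
        if existing is None:
            table[code] = long_url
            return code
        if existing == long_url:  # same URL submitted again: reuse its code
            return code
        attempt += 1  # slot holds a different URL: try the next hash
```

As far as I can tell, chaining does not fit here, because a chained bucket would hold several URLs under one code, while a redirect needs each code to point at exactly one URL.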
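
One variation on this, sketched below, is to encode the counter in base 62 (digits plus lower- and upper-case letters) instead of hexadecimal, so the code grows more slowly. This assumes the short URLs are treated as case-sensitive; `encode` and `decode` are hypothetical helper names.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 symbols
BASE = len(ALPHABET)

def encode(n):
    """Encode a non-negative counter value as a base-62 string."""
    if n < 0:
        raise ValueError("counter value must be non-negative")
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, BASE)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

def decode(code):
    """Turn a base-62 string back into the counter value (the row's identity)."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n

print(len(format(2**31 - 1, "x")), len(encode(2**31 - 1)))  # 8 6
```

With this scheme no hash and no collision handling is needed, since each identity value is unique, and the largest 32-bit signed value still fits in 6 characters compared with 8 in hexadecimal.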

Thanks.