views:

1259

answers:

11

I'm implementing a URL shortening feature in my application in order to provide my users shorter alternative URLs that can be used in Twitter. The point is to be independent from the shortening services that offer this same service and include it as a feature of my web app.

What's the best way to create an unique random sequence of characters of about 6 chars? I plan to use that as an index for the items in my database that will have the alternative URLs.

Edited:

This feature will be used in a job board website, where every new job ad will get a custom URL with the title plus the shorter one to be used in Twitter. That said, the total number of unique 6 char combinations will be more than enough for a long time.

+2  A: 

The simplest way to make unique sequences is to do this sequentially, ie: aaaaaa aaaaab aaaaac ... These aren't necessarily the prettiest, but will guarantee uniqueness for the first 12230590463 sequences (provided you used a-z and A-Z as unique characters). If you need more URLs than that, you'd need to add a seventh char.

They aren't random sequences, though. If you make random ones, just pick a random char of the 48, 6 times. You'll need to check your existing DB for "used" sequences, though, as you'll be more likely to get collisions.

Reed Copsey
It does cause more work, and possible timeouts the more values you end up using.
IPX Ares
Yes. Sequential avoids all of that, but you end up with sequential values (which may or may not be fine). For a twitter URL, I'd just use seq. values, since it really doesn't matter how "random" they truly are, and you'd have a huge sequence of unique values, with dead-simple generation
Reed Copsey
If you include 0-9 as part of the character set, then you have 56800235584 unique values (~4.6x just a-z,A-Z). I know he just mentioned "characters" but that could include digits. If you also include some other characters such as *$%(), etc, that would also bump up a bit more (at least ~9x the original set).
Erich Mirabal
A: 

Thinking about this more here is an idea.

You can start with a key table, incrementing chars AAAAAA - ZZZZZZ.

Then do a random select from that table each time you insert a new URL, and delete from the available keys.

Thoughts?

For random select try this link

Select a random row with MySQL:

SELECT column FROM table
ORDER BY RAND()
LIMIT 1
Select a random row with PostgreSQL:

SELECT column FROM table
ORDER BY RANDOM()
LIMIT 1
Select a random row with Microsoft SQL Server:

SELECT TOP 1 column FROM table
ORDER BY NEWID()
Select a random row with IBM DB2

SELECT column, RAND() as IDX 
FROM table 
ORDER BY IDX FETCH FIRST 1 ROWS ONLY
Thanks Tim

Select a random record with Oracle:

SELECT column FROM
( SELECT column FROM table
ORDER BY dbms_random.value )
WHERE rownum = 1
IPX Ares
+2  A: 

I would use an autonumber system, and create an algorithm to generate the keys. ie 1 = a, 2 = b, 27 = aa etc.

You can use the database autonumber to guarantee that your URL is unique, and you can calculate the URL possibly in a sproc in the DB or in your business layer?

Additionally you can now index on the incrementing number which is cheap and DB's are optimised for these to be used and hashed as primary/foreign keys as opposed to a variable length random string.

Spence
+1  A: 

The usefulness of a random generator is limited to preventing users from plugging random URLs in to find things they shouldn't have a link to. If this is not your goal then sequential IDs should work just fine. If you just don't want to give users the impression that they are using "infant" technology (when they see that their job ad is #000001), why not start the sequence at some arbitrary value?

Tullo
A: 

I'd say hash it! http://www.codinghorror.com/blog/archives/000935.html

DmitryK
A: 

Instead of keeping a table of all possible values, just keep a table of the values you've used. Use the random function to generate 6 random values, 1 to 26, make the string from that and save it in an array or table. If it already exists, you can (a) generate another string, or (b) move through the table to the next available (missing) 6-letter string and use that value. (b) will be more efficient as the table fills.

xpda
A: 

Do you really need 'random', or would 'unique' be sufficient?

Unique is extremely simple - just insert the URL into a database, and convert the sequential id for that record to a base-n number which is represented by your chosen characterset.

For example, if you want to only use [A-Z] in your sequence, you convert the id of the record to a base 26 number, where A=1, B=2,... Z=26. The algothithm is a recursive div26/mod26, where the quotient is the required character and the remainder is used to calculate the next character.

Then when retrieving URL, you perform the inverse function, which is to convert the base-26 number back to decimal. Perform SELECT URL WHERE ID = decimal, and you're done!

EDIT:

private string alphabet = "abcdefghijklmnopqrstuvwxyz"; 
   // or whatever you want.  Include more characters 
   // for more combinations and shorter URLs

public string Encode(int databaseId)
{
    string encodedValue = String.Empty;

    while (databaseId > encodingBase)
    {
        int remainder;
        encodedValue += alphabet[Math.DivRem(databaseId, alphabet.Length, 
            out remainder)-1].ToString();
        databaseId = remainder;
    }
    return encodedValue;
}

public int Decode(string code)
{
    int returnValue;

    for (int thisPosition = 0; thisPosition < code.Length; thisPosition++)
    {
        char thisCharacter = code[thisPosition];

        returnValue += alphabet.IndexOf(thisCharacter) * 
            Math.Pow(alphabet.Length, code.Length - thisPosition - 1);
    }
    return returnValue;
}
Kirk Broadhurst
A: 

Following the idea of Reed Copsey's answer, i present the following code:

class IDGetter
{
    private StringID ID = new StringID();
    public string GetCurrentID()
    {
     string retStr = "";
     if (ID.char1 > 51)
      id.char1 = 0;
     if (ID.char2 > 51)
      id.char2 = 0;
     if (ID.char3 > 51)
      id.char3 = 0;
     if (ID.char4 > 51)
      id.char4 = 0;
     if (ID.char5 > 51)
      id.char5 = 0;
     if (ID.char6 > 51)
      throw new Exception("the maximum number of id's has been reached");
     return ToIDChar(ID.char1) + ToIDChar(ID.char2) + ToIDChar(ID.char3) + ToIDChar(ID.char4) + ToIDChar(ID.char5) + ToIDChar(ID.char6)
     id.char1++;
    }
    public void SetCurrentID(StringID id) //for setting the current ID from storage or resetting it or something
    {
     this.ID = id;
    }
    private const string alphabet = "abcdefghijklmopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private static string ToIDChar(int number)
    {
     if (number > 51 || number < 0)
     {
      throw new InvalidArgumentException("the number passed in (" + number + ") must be between the range 0-51");
     }
     return alphabet[number];
    }
}
public struct StringID 
{
    public int char1 = 0;
    public int char2 = 0;
    public int char3 = 0;
    public int char4 = 0;
    public int char5 = 0;
    public int char6 = 0;
}

You might want to come up with a method of storing the current ID but that ought to work.

RCIX
+1  A: 

When you state "total number of unique 6 char combinations will be more than enough for a long time" for your random generation have you factored the birthday paradox into your calculations? This is generally the bane of any attempt to create random IDs within a range that is only 1 order of magnitude or less then the expected range that will be needed.

To create truly random IDs, you would need to create a loop that generates a new random value, checks to see if that value has already been used, and then repeats the loop if needed. The birthday paradox means that you quickly get to the point where many of the values generated are already in use (despite only a fraction of the total range being consumed), which causes the program to get slower and slower over time until it is taking thousands of attempts (and database lookups) to generate each ID.

I would suggest you go with the idea of encoding sequential IDs. To avoid the problem of users being able to simply increment/decrement the value in the URL to "explore", you can use a combination bit shifting and an alternate ordered list of letters (instead of 1=a, 2=b use 1=t, 2=j, etc).

David
A: 

I used this to do something very similar. I was not to worried about the speed of it as it was going to be a rarely used event and table. But it's possible to then increase the string as needed.

/// Generates a string and checks for existance
/// <returns>Non-existant string as ID</returns>
public static string GetRandomNumbers(int numChars, string Type)
{
    string result = string.Empty;
    bool isUnique = false;
    while (!isUnique)
    {
        //Build the string
        result = MakeID(numChars);
        //Check if unsued
        isUnique = GetValueExists(result, Type);
    }
    return result;
}
/// Builds the string
 public static string MakeID(int numChars)
{
    string random = string.Empty;
    string[] chars = { "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z" };
    Random rnd = new Random();
    for (int i = 0; i < numChars; i++)
    {
        random += chars[rnd.Next(0, 35)];
    }
    return random;
}
/// Checks database tables based on type for existance, if exists then retry
/// <returns>true or false</returns>
private static bool GetValueExists(string value, string Type)
{
    bool result = false;
    string sql = "";
    if (Type == "URL")
    {
        sql = string.Format(@"IF EXISTS (SELECT COUNT(1) FROM myTable WHERE uniqueString = '{0}')
         BEGIN
             SELECT 1
         END
          ELSE
          BEGIN
             SELECT 0
         END ", value);
    }
    //query the DB to see if it's in use
    result = //ExecuteSQL
    return result;
}
Tim Meers
A: 

You could try making a stored proc that generates a GUID, then truncate it to use just the last X characters. After it's generated and chopped, the proc can check it against the database to make sure it's not already taken.

ajh1138