ansaurus

Question

Most Efficient Way to... Unique Random String

Answer 1

+1 A:

You could generate a GUID and only use the first 5 characters?

Mark Redman 2009-10-03 14:45:53

That's just another way of generating a random string, you still have to check for duplicates.

Guffa 2009-10-03 14:51:29

Was my first thought too, although he'll have to generate 5 additional bits for a case sensitive string.

schnaader 2009-10-03 14:51:44

Answer 2

+1 A:

Is randomness more important, or is uniqueness more important? -- note that I said "more" important; I get the fact that you need both.

If randomness is more important, then you're going to need some way to track historical values. The database itself (with an appropriate index) is going to be the best way to do this.

If uniqueness is more important, then simply use a counter and zero-pad it to five digits. This will, of course, limit you to 100,000 rows, so you could alternatively use a counter and a transformation into character space (eg, 1 = "A", 2 = "B", 27 = "AA", and so on).

kdgregory 2009-10-03 14:46:33

Answer 3

+8 A:

Create a table with a big pool of 5-character strings that are added in sequence (so they are unique), and have a GUID as their primary key. Add a column to indicate whether they are used or not.

When you need a new number, you select top 1 from the pool, order by the guid (so it becomes random), and set the result as "spent".

Console 2009-10-03 14:52:05

This creates an additional table, but will be unique, random and use the most possible values without having to continually search the current values. The OP's original solution will take longer and longer as the number of rows increase.

Edward Leno 2009-10-03 15:02:29

So I'm assuming that there will be a significant amount of work up front in generating the initial random strings.

rockinthesixstring 2009-10-03 15:11:45

Instead of adding a column to indicate whether they are used, why not just delete them as they are used? Makes the query faster and easier to write.

JohnFx 2009-10-03 15:12:17

I will be continually using them... they never expire.

rockinthesixstring 2009-10-03 15:13:04

For a base 54 random string the table size is 459 million and all your doing is punting the same problem from one table to another unless a predictable sequence is used which is then not "random".

Einstein 2009-10-03 16:14:05

FYI. My final solution is to create a unique db field (specified here: http://stackoverflow.com/questions/1514253/sql-server-2008-unique-column-that-is-case-sensitive) and then write a simple console app that inserts a generated random string via a "try catch" block. If it fails the try... it just loops to a new insert. I'm storing the values in a separate table and then will attach to them via a JOIN referencing the ID (int) field.

rockinthesixstring 2009-10-03 19:27:42

@Einstein: You don't need to pre-generate all the possible strings, just enough so that it seems like the numbers are random. You can have the procedure that "picks" numbers generate new ones as needed. (Obviously, a sequence of 5 characters is only random in a very limited sense of the word. As you point out, if you have 459 million of them you know which ones they are).

Console 2009-10-03 23:30:19

Answer 4

+1 A:

There is a method to pick unused unique words by random, but it's probably not going to be any better than what you are doing now.

The principle is that you determine which permutations of the words that are unused, generate a random number based on how many unused permuations there are, and pick that one.

If you for example would use a word with three characters, and only the character 0 and 1, there are eight possible permutations. If you already used the combinations "010" and "100", you would get something that looks like this:

PI = permutation index
UI = unused permutation index

No. PI UI
----------
000 0  0
001 1  1
010 2  -
011 3  2
100 4  -
101 5  3
110 6  4
111 7  5

To pick an unused permutation, you simply generate a random number from 0 to 5, and pick the corresponding permutation.

Keeping a list of all possible permuations is of course not practical, so you would need a function that can determine the permutation index from the string, and one function that can determine the string from the permutation index.

Also, to determine which permutations are unused, you have to check which are used, so you still have to query the table at some point.

Guffa 2009-10-03 15:03:12

Answer 5

A:

If you are inserting the string to an existing, populated, table then you will always need to check if the string doesn't exist there (it doesn't have to be an explicit SELECT). You can either to it manually, or have an UNIQUE constraint on the column and let the database to do it. So if the database returns an error because the string is already there, generate another one.

Note that if you have an empty table and want to populate it with multiple random strings, it's a different problem.

Lukáš Lalinský 2009-10-03 15:04:38

Answer 6

A:

I think you should stick to your origional idea. Putting a unique constraint on the index and letting the database check/report dupes for you would be fairly effecient method of dupe checking but this assumption depends on some information not provided like the number of rows and likelyhood of encountering dupes with randomly selected data.

Fully pre-populating a unique sequence pool with your parameters requires a 459 million row table.

You could use a bloom filter to load managable statistics into a database or main memory and avoid dupes but depending on the row count and filter configuration this may lead to saturation of the filter when the row count is an appreciable percentage of the 459 million limit.. Since the filter can report false positives you should work to ensure that you don't get into a situation where your system is stuck trying permutations that pass the filter approaching forever.

Einstein 2009-10-03 16:12:06

Answer 7

A:

As you know how long your word shall be, why not a tree-based approach? (Let's call it randomised tree walk)

Say your word has n characters. Generate a list of all symbols s in S and relate a counter for each symbol and possible position in the string, essentially a matrix M of dimensions s times n. Now roll your dice and choose the first letter and lookup M(s,1). If M(s,1) is larger or equal the number of possible words starting with s, roll again. Otherwise increment M(s,1).

Repeat this for every letter 1 till n.

Should be pretty fast until you have used up to many words.

Don Johe 2010-07-13 11:32:55

ansaurus

tags:

views:

answers:

Most Efficient Way to... Unique Random String

related questions