views:

152

answers:

3

I faced a following problem: generate N unique alphanumeric strings from a restricted alphabet. Here's my solution in C#:

string Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
Random generator = new Random();
const int ToGenerate = 10000;
const int CharactersCount = 4;
ArrayList generatedStrings = new ArrayList();
while( generatedStrings.Count < ToGenerate ) {
   string newString = "Prefix";
   for( int i = 0; i < CharactersCount; i++ ) {
      int index = generator.Next( Alphabet.Length );
      char character = Alphabet[index];
      newString += character;
   }
   if( !generatedStrings.Contains( newString ) ) {
      generatedStrings.Add( newString );
   }                
}
for( int i = 0; i < generatedStrings.Count; i++ ) {
    System.Console.Out.WriteLine( generatedStrings[i] );
}

it generates 10K strings starting with "Prefix" and otherwise consisting of capital letters and numbers. The output looks good.

Now I see the following problem. The produced strings are for a scenario where they should be unlikely to be predicted by anyone. In my program the seed is time-dependent. Once someone knows the seed value he can run the same code and get the exact same strings. If he knows any two strings he can easily figure out my algorithm (since it is really naive) and attempt to brute-force the seed value - just enumerate all possible seed values until he sees the two known strings in the output.

Is there some simple change that could be done to my code to make the described attack less possible?

+5  A: 

Well, it depends what you consider "simple".

You can "solve" your problem by using a "true" source of random numbers. You can try the free ones (random.org, fourmilab hotbits, etc), or buy one, depending on the sort of operation you're running.

Alternatively (and perhaps better) is to not generate in advance, and instead generate on demand. But this may be a significant change to your business process/model.

Noon Silk
Yeah, random.org and other similar services are a good option since it is much more truly random, random.org uses atmospheric noise, I believe, to get their results, so you could just use the API to this to preload a bunch of randomized data for use, combining this with basic programming "random" algorithms would, I think, yield very good results.
Rick
+5  A: 

Well, how would he know the seed? Unless he knew the exact time you ran the code, that is very hard to do. But if you need stronger, you can also create cryptographically strong random numbers via System.Security.Cryptography.RandomNumberGenerator.Create - something like:

        var rng = System.Security.Cryptography.RandomNumberGenerator.Create();
        byte[] buffer = new byte[4];
        char[] chars = new char[CharactersCount];
        for(int i = 0 ; i < chars.Length ; i++)
        {
            rng.GetBytes(buffer);
            int nxt = BitConverter.ToInt32(buffer, 0);
            int index = nxt % Alphabet.Length;
            if(index < 0) index += Alphabet.Length;
            chars[i] = Alphabet[index];
        }
        string s = new string(chars);
Marc Gravell
If you have a series of numbers all generated at a very close time, it might be possible to statistically determine the seed by the properties of the numbers. I think it's a wise concern.
Noon Silk
I suppose he can brute-force the seed passing different values into `Random` constuctor, can't he?
sharptooth
@sharptooth - but not with `RandomNumberGenerator` - see the demo code added
Marc Gravell
@sharptooth: Yes. You are right. Given that he could guess the year to within, say, +- 10 years, it leaves only 315,360,000 numbers to try. Which really isn't that many, if you know what you're looking for, as you suggest.
Noon Silk
Btw why BitConverter.ToUInt32() is not used?
sharptooth
@sharptooth - I guess that would be fine too.
Marc Gravell
Okay, this solution looks good. Thank you.
sharptooth
@LukeH: That's good point, thanks for noting this.
sharptooth
Marc: Indeed, it is not *at all* hard to work backwards from even a small sample of stuff generated by a pseudo-RNG to determine what the seed was and hence what all the other output was. The number of seeds is *tiny* - typically only a few billion, or even a few million.
Eric Lippert
+2  A: 

Use something which uses unpredictable amount of time in the middle and change the seed by time (different seed for every value). You could for example ping some servers or calculate something in a loop with a lot of random values or write something to disk etc.

Cloudanger
A good thought but actually this makes the problem *worse*, not *better*. Now the attacker knows that the algorithm is generating strings by the algorithm "give me the first number generated by this seed... now the next seed ... now the next seed". where the seed numbers are monotonically increasing and based on the current time. That *massively* cuts down on the number of possible generated strings, so an attacker can very easily generate strings that have similar properties, until one of them works.
Eric Lippert
You are missing the forest for the trees here. If your supposition is that you have a source of *entropy* - truly random timings - then don't use a pseudo-rng seeded from that entropy. *Use the entropy*. Of course, what is even better is *simply call the code that already does that*. The crypto random number generator takes entropy from processor timings, keystroke timings, and so on, and serves up that entropy in the form of random numbers.
Eric Lippert
@Eric Lippert1: It will take less time to brute-force the string (considering the OP's setup) than guessing the seed. So if the string is only **4 chars wide**, there's not much point in building a secure generator when bruteforcing it takes a fraction of the time. The point is the size of the value matters too. There will be only 1.5M combinations with the OP's setup anyway. I hope he's not using such small string in the final version. @Eric Lippert2: Very true.
Jaroslav Jandek