tags:

views:

151

answers:

2

Hey.

I have a text file with few hundred lines, the structure is pretty simple.

firstname lastname

I need to pick out a random firstname & listname from the file.

Thanks.

+6  A: 
string[] lines = File.ReadAllLines(...); //i hope that the file is not too big
Random rand = new Random();
return lines[rand.Next(lines.Length)];

Another (and maybe better) option is to have the first line of the file contain the number of records in it and then you don't have to read all the file.

Itay
+1: I hate it when people answer the questions I can actually answer before I can, especially when they use exactly the same code I would :)
Callum Rogers
Thanks, i just figured it out. But your way seems better.
crap
That works as long as the file is relatively small. I've provided an alternative that allow you to not have to keep the entire file in memory.
tvanfosson
@tvanfosson: The second version will work also if file is very large (by adding # of records in first line) and the advantage of it over you solution (which i really like from math perspective) is that on average only half of the file will be read (and only one line in memory each time).
Itay
+8  A: 

Read each line keeping a count, N, of the lines you have seen so far. Select each line with probability 1/N, i.e., the first line will always be chosen, the second line will be chosen 1/2 times to replace the first, the third 1/3 times, ... This way each line has a 1/N probability of being the selected line, you only have to read the file once, and you don't need to store all of the file in memory at any given time.

Here's an implementation that can be adapted for your needs.

public string RandomLine( StreamReader reader )
{
    string chosen = null;
    int numberSeen = 0;
    var rng = new Random();
    while ((string line = reader.ReadLine()) != null)
    {
        if (rng.NextInt(++numberSeen) == 0)
        {
            chosen = line;
        }
    }
    return chosen;
}

Based on a C implementation for selecting a node from an arbitrarily long linked list.

tvanfosson
the math looks wrong to me. Isn't the probability of selecting the 1st line 1/n! by the time you get to the end?
GregS
The math here is fine and nice :)
Itay
@Itay: only if you believe the comments: I don't. But, I've been fooled by probability before.
GregS
@Itay: Ahh, I think I am getting a clue now. In step n you *keep* the previous selection with probability (n-1)/n. And by induction, the probability of the previous selection was 1/(n-1), so multiplying through gives 1/n. Clever.
GregS
@GregS - use inductive reasoning. Assume that at the Nth step, each element has 1/n probability of being chosen. For N = 1, this is clear. For N = 2, the first was chosen at step 1 and has a 1/2 probability of being replaced at step 2 so our assumption holds. Now at step, n+1 each of the preceding elements, by our hypothesis, had 1/n chance of being selected randomly. With probability 1/n+1 we choose to replace it. Clearly the current element has a 1/n+1 probability of being selected, but that element was selected randomly so all preceding elements have a 1/n probability of being selected
tvanfosson
actually the probability of line n to be chosen is p(n) = (1/n)*(1 - sum where i > n of p(i))
Itay