How do I take an efficient simple random sample in SQL? The database in question is running MySQL; my table is at least 200,000 rows, and I want a simple random sample of about 10,000.
The "obvious" answer is to:
SELECT * FROM table ORDER BY RAND() LIMIT 10000
For large tables, that's too slow: it calls RAND() for every row (which al...
I've been killing some time with a little puzzle project -- writing the algorithm described in Knuth for random sampling of a population without knowing the size of the population in advance. So far I've written it in JavaScript Rhino, Clojure, Groovy, Python, and Haskell. The basic parameters are:
the function takes two arguments, a s...
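For the puzzle above, sampling from a population of unknown size is commonly done with reservoir sampling (often called Algorithm R). The excerpt does not name the algorithm, so that identification is an assumption. A minimal Python sketch follows; the function name `reservoir_sample` and its parameters are hypothetical, not taken from the question.

```python
import random

def reservoir_sample(items, k, rng=None):
    """Return up to k items chosen uniformly at random from an iterable
    whose length is not known in advance (hypothetical helper)."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item survives with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

The sketch makes a single pass and keeps only `k` items in memory, so it also applies to reading lines from a file or stdin. If the input has fewer than `k` items, it returns all of them.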
Recently I needed to do weighted random selection of elements from a list, both with and without replacement. While there are well-known and good algorithms for unweighted selection, and some for weighted selection without replacement (such as modifications of the reservoir algorithm), I couldn't find any good algorithms for weighted sele...
I’m thinking through some database design concepts and believe that creating sample data simulating real-world volume of my application will help solidify some design decisions.
Does anyone know of a tool to create sample data? I’m looking for something that’s database and platform neutral if possible (from MySQL to DB/2 and Wind...
Hi all,
I have a file of 10^7 lines, from which I want to choose 1/100 of the lines
at random. This is the AWK code I have, but it slurps the whole file content
beforehand. My PC's memory cannot handle such slurps. Is there another approach?
awk 'BEGIN{srand()}
!/^$/{ a[c++]=$0}
END {
for ( i=1;i<=c ;i++ ) {
num=int(r...
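One way to avoid holding the whole file in memory is to decide line by line whether to keep each line. The following is a minimal Python sketch of that idea, not a repair of the truncated AWK script above; the name `sample_lines` and the `fraction` parameter are hypothetical. It yields roughly 1/100 of the lines, not exactly 1/100.

```python
import random

def sample_lines(lines, fraction=0.01, rng=None):
    """Yield each non-empty line independently with probability `fraction`."""
    rng = rng or random.Random()
    for line in lines:
        # Skip empty lines, mirroring the !/^$/ filter in the AWK script.
        if line.rstrip("\n") == "":
            continue
        # rng.random() is uniform on [0, 1), so this keeps about `fraction` of lines.
        if rng.random() < fraction:
            yield line
```

Because it is a generator over any iterable of lines, it could be fed an open file object or `sys.stdin` and would hold only one line at a time.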
Following up on this question, I need to get exactly n lines at random out of a file (or stdin). This would be similar to head or tail, except I want some from the middle.
Now, other than looping over the file with the solutions to the linked question, what's the best way to get exactly n lines in one run?
For reference, I tried this:...
I found a Perl script that manages randomizing the Wikipedia articles in Wikipedia here. The code seems to be slightly computer generated. Due to my present interest in MySQL, I thought you could possibly have the links and related data in a database.
I know that MySQL is good at maintaining relations between tables, while it seems you ...
Hi SO,
This is a kind of newbie question, but I couldn't find a solution. I read a list of strings from a file and try to get a random 5-element sample with random.sample, but the resulting list only contains characters. Why is that? How can I get a random sample list of strings?
This is what I do:
names = random.sample( open('n...
Here is a fun one: I need to generate random x/y pairs that are correlated at a given value of Pearson product moment correlation coefficient, or Pearson r. You can imagine this as two arrays, array X and array Y, where the values of array X and array Y must be re-generated, re-ordered or transformed until they are correlated with each...
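A standard construction for this is to mix two independent standard normal draws: with x and z independent, y = r*x + sqrt(1 - r^2)*z has population correlation r with x. The sketch below assumes normally distributed values are acceptable, which the excerpt does not state; the name `correlated_pairs` is hypothetical. The sample correlation of the output will be close to r for large n, not exactly r.

```python
import math
import random

def correlated_pairs(n, r, rng=None):
    """Generate n (x, y) pairs whose population Pearson correlation is r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    rng = rng or random.Random()
    scale = math.sqrt(1.0 - r * r)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)  # independent noise term
        pairs.append((x, r * x + scale * z))
    return pairs
```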
I have a method which returns a generic list collection (List) from the database. This collection holds order details, i.e., order ID, order name, product details, etc.
Also, the method returns a collection containing only the top 5 orders, sorted by order date descending.
My requirement is that each time the client calls this method, I...
I want to write a map/reduce job to select a number of random samples from a large dataset based on a row level condition. I want to minimize the number of intermediate keys.
Pseudocode:
for each row
    if row matches condition
        put the row.id in the bucket if the bucket is not already large enough
Have you done something like th...
Is there a random sampling function for the iPhone?
For example, say you want to flip a coin that returns heads 25% of the time, and you want to flip it and see whether you get heads this time. I googled for random sampling probability for iPhone and couldn't find anything.
...
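The usual approach to a biased coin is language-neutral: draw a uniform number in [0, 1) and compare it with the desired probability. Below is a minimal sketch in Python rather than an iPhone API; on the iPhone, the platform's own uniform random source would stand in for `random.random()`. The name `flip` is hypothetical.

```python
import random

def flip(p_heads=0.25, rng=None):
    """Return True (heads) with probability p_heads, otherwise False."""
    rng = rng or random.Random()
    # rng.random() is uniform on [0, 1), so it falls below p_heads
    # with probability p_heads.
    return rng.random() < p_heads
```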
I am working on some statistical code and exploring different ways of creating samples from random distributions, starting from a random number generator that generates uniform floating point values from 0 to 1.
I know it is possible to generate approximate samples from a normal distribution by adding together a sufficiently large numbe...
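Assuming the truncated sentence refers to summing independent uniform values, the idea can be sketched as follows. The sum of n uniform values on [0, 1) has mean n/2 and variance n/12, so shifting and scaling gives an approximately standard normal value. The name `approx_standard_normal` and the default n = 12 are choices made for this illustration.

```python
import random

def approx_standard_normal(rng=None, n=12):
    """Approximate a standard normal draw by summing n uniform [0, 1) values."""
    rng = rng or random.Random()
    total = sum(rng.random() for _ in range(n))
    # Subtract the mean n/2 and divide by the standard deviation sqrt(n/12).
    return (total - n / 2) / (n / 12) ** 0.5
```

With n = 12 the scaling factor is 1, so the result is simply the sum minus 6. The tails are truncated: values can never fall outside [-6, 6].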