random-sample

Simple Random Samples from a (My)Sql database

How do I take an efficient simple random sample in SQL? The database in question is running MySQL; my table is at least 200,000 rows, and I want a simple random sample of about 10,000. The "obvious" answer is: SELECT * FROM table ORDER BY RAND() LIMIT 10000; For large tables, that's too slow: it calls RAND() for every row (which al...

Rosetta Stone: reservoir random sampling algorithm

I've been killing some time with a little puzzle project -- writing the algorithm described in Knuth for random sampling of a population without knowing the size of the population in advance. So far I've written it in JavaScript Rhino, Clojure, Groovy, Python, and Haskell. The basic parameters are: the function takes two arguments, a s...
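
For readers unfamiliar with the technique, here is a minimal Python sketch of reservoir sampling (the variant usually called Algorithm R). The function name `reservoir_sample` and its parameters are hypothetical, and the excerpt above is truncated, so this may not match the asker's exact signature.

```python
import random


def reservoir_sample(items, k, rng=None):
    """Return up to k items chosen uniformly at random from an iterable.

    The iterable is consumed once and its length is never needed.
    `rng` is an optional random.Random instance (an assumption of this
    sketch, added so that runs can be made reproducible).
    """
    if rng is None:
        rng = random.Random()
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

If the input has fewer than k items, the sketch simply returns all of them.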

Weighted random selection with and without replacement

Recently I needed to do weighted random selection of elements from a list, both with and without replacement. While there are well-known and good algorithms for unweighted selection, and some for weighted selection without replacement (such as modifications of the reservoir algorithm), I couldn't find any good algorithms for weighted sele...
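
As an illustration of weighted selection without replacement, the sketch below uses the random-key approach (often attributed to Efraimidis and Spirakis): each item gets the key u^(1/w) for a uniform u, and the k largest keys win. This is one possible technique, not necessarily the one the asker had in mind; the function name and parameters are hypothetical.

```python
import heapq
import random


def weighted_sample_without_replacement(items, weights, k, rng=None):
    """Pick k distinct items, favouring items with larger weights.

    Assumes every weight is strictly positive. `rng` is an optional
    random.Random instance (an assumption of this sketch).
    """
    if rng is None:
        rng = random.Random()
    keyed = []
    for item, w in zip(items, weights):
        if w <= 0:
            raise ValueError("weights must be strictly positive")
        u = 1.0 - rng.random()  # uniform in (0, 1]
        keyed.append((u ** (1.0 / w), item))
    # The k items with the largest keys form the sample.
    best = heapq.nlargest(k, keyed, key=lambda pair: pair[0])
    return [item for _, item in best]
```

For selection with replacement, Python's standard `random.choices(items, weights=weights, k=k)` already covers the weighted case.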

Sample Data Creation Tool (mainly for Databases)

I’m thinking through some database design concepts and believe that creating sample data simulating real-world volume of my application will help solidify some design decisions. Does anyone know of a tool to create sample data? I’m looking for something that’s database and platform neutral if possible (from MySQL to DB/2 and Wind...

Randomly Pick Lines From a File Without Slurping It With Unix

Hi all, I have a file of 10^7 lines, from which I want to choose 1/100 of the lines at random. This is the AWK code I have, but it slurps the whole file content beforehand. My PC's memory cannot handle such slurps. Is there another approach? awk 'BEGIN{srand()} !/^$/{ a[c++]=$0} END { for ( i=1;i<=c ;i++ ) { num=int(r...
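
One way to avoid holding the file in memory is to decide line by line, keeping each non-empty line with probability 1/100. The Python sketch below illustrates that idea; the function name `sample_lines` and its parameters are hypothetical. It yields roughly, not exactly, 1/100 of the lines.

```python
import random


def sample_lines(lines, rate=0.01, rng=None):
    """Yield each non-empty line independently with probability `rate`.

    `lines` can be any iterable of strings, such as an open file object,
    so only one line is held in memory at a time.
    """
    if rng is None:
        rng = random.Random()
    for line in lines:
        # Skip empty lines, as the AWK pattern !/^$/ does.
        if line.rstrip("\r\n") and rng.random() < rate:
            yield line
```

It could be used as `for line in sample_lines(open(path)): ...`, where `path` is a placeholder for the real file name.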

How can I get exactly n random lines from a file with Perl?

Following up on this question, I need to get exactly n lines at random out of a file (or stdin). This would be similar to head or tail, except I want some from the middle. Now, other than looping over the file with the solutions to the linked question, what's the best way to get exactly n lines in one run? For reference, I tried this:...

Randomizing pages in Wikipedia with MySQL and Perl?

I found a Perl script here that handles randomizing Wikipedia articles. The code seems to be slightly computer generated. Due to my present interest in MySQL, I thought you could possibly have the links and related data in a database. I know that MySQL is good in maintaining relations between tables, while it seems you ...

random.sample return only characters instead of strings

Hi SO, This is a kind of newbie question, but I couldn't find a solution. I read a list of strings from a file, and try to get a random 5-element sample with random.sample, but the resulting list only contains characters. Why is that? How can I get a random sample list of strings? This is what I do: names = random.sample( open('n...
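
The asker's code is truncated above, so the cause can only be guessed. One common way to get this symptom is passing a single string (for example, the whole file content) to `random.sample`, which then samples its characters. The sketch below contrasts that with sampling from a list of lines; the names in it are made up, and `io.StringIO` stands in for the real file.

```python
import io
import random

# Stand-in for the real file; the contents are invented for illustration.
text = "alice\nbob\ncarol\ndave\nerin\nfrank\n"

# Sampling from one big string picks individual characters.
chars = random.sample(text, 5)

# Sampling from a list of stripped lines picks whole strings.
fake_file = io.StringIO(text)
names = [line.strip() for line in fake_file if line.strip()]
sample = random.sample(names, 5)
```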

Generating correlated numbers

Here is a fun one: I need to generate random x/y pairs that are correlated at a given value of Pearson product moment correlation coefficient, or Pearson r. You can imagine this as two arrays, array X and array Y, where the values of array X and array Y must be re-generated, re-ordered or transformed until they are correlated with each...
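
A standard construction for this, sketched here in Python under the assumption that normally distributed values are acceptable, is to draw independent standard normals x and z and set y = r*x + sqrt(1 - r^2)*z. The population correlation of (x, y) is then r; any finite sample will only be close to r. The function names are hypothetical.

```python
import math
import random


def correlated_pairs(n, r, rng=None):
    """Return n (x, y) pairs whose population Pearson correlation is r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    if rng is None:
        rng = random.Random()
    scale = math.sqrt(1.0 - r * r)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)  # independent of x
        pairs.append((x, r * x + scale * z))
    return pairs


def pearson_r(pairs):
    """Sample Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)
```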

Returning items randomly from a collection

I have a method which returns a generic list collection (List) from the database. This collection holds order details, i.e., order ID, order name, product details, etc. Also, the method returns a collection having only the top 5 orders sorted by order date descending. My requirement is that each time the client calls this method, I...

How to pick random (small) data samples using Map/Reduce?

I want to write a map/reduce job to select a number of random samples from a large dataset based on a row level condition. I want to minimize the number of intermediate keys. Pseudocode:

    for each row
      if row matches condition
        put the row.id in the bucket if the bucket is not already large enough

Have you done something like th...

Is there a random sampling function for iPhone development that can simulate a coin toss?

Is there a random sampling function for the iPhone? For example, suppose you have a coin that comes up heads 25% of the time, and you want to flip it once and see whether you get heads this time. I googled for random sampling probability for iPhone and couldn't find anything. ...
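
The underlying idea is platform independent: draw a uniform number in [0, 1) and report heads when it falls below the desired probability. The sketch below shows it in Python rather than an iPhone API; the function name `flip` is hypothetical.

```python
import random


def flip(p_heads=0.25, rng=None):
    """Return True (heads) with probability p_heads, otherwise False."""
    if rng is None:
        rng = random.Random()
    return rng.random() < p_heads
```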

Generating samples from the logistic distribution

I am working on some statistical code and exploring different ways of creating samples from random distributions, starting from a random number generator that generates uniform floating point values from 0 to 1. I know it is possible to generate approximate samples from a normal distribution by adding together a sufficiently large numbe...
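
For the logistic distribution specifically, the inverse CDF has a closed form, so a uniform u in (0, 1) maps directly to x = mu + s * ln(u / (1 - u)). The Python sketch below shows that transform; the function names and the location/scale parameters `mu` and `s` are hypothetical names chosen for illustration.

```python
import math
import random


def logistic_from_uniform(u, mu=0.0, s=1.0):
    """Map a uniform value u in (0, 1) to a logistic(mu, s) value."""
    return mu + s * math.log(u / (1.0 - u))


def logistic_samples(n, mu=0.0, s=1.0, rng=None):
    """Draw n samples from logistic(mu, s) by inverse-transform sampling."""
    if rng is None:
        rng = random.Random()
    samples = []
    while len(samples) < n:
        u = rng.random()  # uniform in [0, 1)
        if u > 0.0:  # u == 0 would give log(0), so skip it
            samples.append(logistic_from_uniform(u, mu, s))
    return samples
```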