How do I take an efficient simple random sample in SQL? The database in question is running MySQL; my table is at least 200,000 rows, and I want a simple random sample of about 10,000.
The "obvious" answer is to:
SELECT * FROM table ORDER BY RAND() LIMIT 10000
For large tables, that's too slow: it calls RAND() for every row (which already puts it at O(n)), and sorts them, making it O(n lg n) at best. Is there a way to do this faster than O(n)?