I have a datastore model with around 1,000,000 entities, and I want to fetch 10 of them at random.

I'm not sure how to do this. Can someone help?

+7  A: 

Assign each entity a random number and store it in the entity. Then query, ordered by that number, for the first ten records whose random number is greater than (or less than) a freshly chosen random number.

This isn't truly random, however, since entities with nearby random numbers will tend to show up together. To avoid that, run ten queries around ten different random numbers, at the cost of some efficiency.
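As a rough illustration of both variants, here is a minimal sketch that simulates the idea in plain Python rather than against the real datastore. The sorted list stands in for an index on the stored random number, and `make_index`, `fetch_random`, and `fetch_independent` are hypothetical helper names, not part of any App Engine API. It assumes the number of entities requested is no larger than the number stored.

```python
import bisect
import random

def make_index(n, rng):
    """Simulate n entities, each stored with a random key in [0, 1), sorted by key."""
    return sorted((rng.random(), entity_id) for entity_id in range(n))

def fetch_random(index, k, rng):
    """Return k entity ids with keys at or after a random pivot, wrapping at the end."""
    pivot = rng.random()
    start = bisect.bisect_left(index, (pivot,))
    picked = index[start:start + k]
    if len(picked) < k:
        # Ran off the end of the key range: continue from the smallest keys.
        picked += index[:k - len(picked)]
    return [entity_id for _, entity_id in picked]

def fetch_independent(index, k, rng):
    """One single-entity lookup per pivot: more queries, but no correlated neighbours."""
    chosen = set()
    while len(chosen) < k:
        chosen.update(fetch_random(index, 1, rng))
    return list(chosen)

rng = random.Random()
index = make_index(1000, rng)
one_query = fetch_random(index, 10, rng)        # ten neighbours from one query
ten_queries = fetch_independent(index, 10, rng) # ten separate lookups
```

With the real datastore, the single-query version would presumably be an inequality filter on the stored random property combined with an ordering on that same property and a fetch limit of ten.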

Jason Hall
Exactly right. Might want to mention the range (0..1 is standard) for the random numbers.
Nick Johnson
One possibility to increase randomness without hurting read-time efficiency would be to enqueue a task to assign new random numbers to the entities you fetched, so if you hit one of them again you won't get the same neighbors with it.
Wooble
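The re-randomizing idea from the last comment could look roughly like the sketch below. It is again a plain-Python simulation: the `keys` dict stands in for the stored random numbers, `fetch_and_rerandomize` is a hypothetical name, and the reassignment that happens inline here would be deferred to a background task in a real application.

```python
import random

def fetch_and_rerandomize(keys, k, rng):
    """keys maps entity id -> stored random number; assumes k <= len(keys)."""
    pivot = rng.random()
    ordered = sorted(keys, key=keys.get)
    at_or_after = [e for e in ordered if keys[e] >= pivot]
    # Take from the pivot onward, wrapping around to the smallest keys if needed.
    picked = (at_or_after + ordered)[:k]
    # Stand-in for the enqueued task: give the fetched entities new random numbers
    # so they will not come back with the same neighbours next time.
    for entity_id in picked:
        keys[entity_id] = rng.random()
    return picked

rng = random.Random()
keys = {entity_id: rng.random() for entity_id in range(1000)}
before = dict(keys)
picked = fetch_and_rerandomize(keys, 10, rng)
```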
+1  A: 

Here's another not-so-random approach:
1. fetch some records, and
2. pull a random sample from the records fetched.

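import random
from google.appengine.ext.webapp import template

N = 3          # size of the final random sample
TO_FETCH = 10  # number of entities to read from the datastore

# Inside a request handler; MyModel and path are defined elsewhere.
results = MyModel.all().fetch(TO_FETCH)  # first TO_FETCH entities in default order
results = random.sample(results, N)      # N of them, chosen without replacement
self.response.out.write(template.render(path, {'results': results}))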
Adam Bernier
+1 That's a pretty good approach too, and one that you can combine with my solution as well: fetch 30 random results, then pick 10 random results from that set.
Jason Hall
Won't this be biased towards whatever records the datastore tends to fetch first?
Peter Recore
@Peter: you're exactly right. Thanks for pointing that out. As Jason mentions, this approach would probably be best combined with his idea of assigning random numbers to your entities.
Adam Bernier
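A minimal sketch of the combined approach discussed in these comments (fetch 30 entities starting at a random point, then sample 10 of them), simulated in plain Python. The sorted `index` list is a hypothetical stand-in for a datastore index on a stored random number, and `fetch_then_sample` is an illustrative name, not an existing API.

```python
import bisect
import random

def fetch_then_sample(index, to_fetch, n, rng):
    """Read to_fetch entities starting at a random pivot, then sample n of them."""
    pivot = rng.random()
    start = bisect.bisect_left(index, (pivot,))
    window = index[start:start + to_fetch]
    if len(window) < to_fetch:
        # Wrap around to the smallest keys when the pivot is near the top of the range.
        window += index[:to_fetch - len(window)]
    return [entity_id for _, entity_id in rng.sample(window, n)]

rng = random.Random()
# (random_key, entity_id) pairs sorted by key, simulating 1000 stored entities.
index = sorted((rng.random(), entity_id) for entity_id in range(1000))
picked = fetch_then_sample(index, 30, 10, rng)
```

Sampling from a larger window weakens the neighbour correlation without adding queries, though entities with nearby random numbers can still appear together.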