views:

519

answers:

2

I have a small project I am doing in Python using web.py. It's a name generator, using 4 "parts" of a name (firstname, middlename, anothername, surname). Each part of the name is a collection of entites in a MySQL databse (name_part (id, part, type_id), and name_part_type (id, description)). Basic stuff, I guess.

My generator picks a random entry of each "type", and assembles a comical name. Right now, I am using select * from name_part where type_id=[something] order by rand() limit 1 to select a random entry of each type (so I also have 4 queries that run per pageview, I figured this was better than one fat query returning potentially hundreds of rows; if you have a suggestion for how to pull this off in one query w/o a sproc I'll listen).

Obviously I want to make this more random. Actually, I want to give it better coverage, not necessarily randomness. I want to make sure it's using as many possibilities as possible. That's what I am asking in this question, what sorts of strategies can I use to give coverage over a large random sample?

My idea, is to implement a counter column on each name_part, and increment it each time I use it. I would need some logic to then say like: "get a name_part that is less than the highest "counter" for this "name_part_type", unless there are none then pick a random one". I am not very good at SQL, is this kind of logic even possible? The only way I can think to do this would require up to 3 or 4 queries for each part of the name (so up to 12 queries per pageview).

Can I get some input on my logic here? Am I overthinking it? This actually sounds ideal for a stored procedure... but can you guys at least help me solve how to do it without a sproc? (I don't know if I can even use a sproc with the built-in database stuff of web.py).

I hope this isn't terribly dumb but thanks ahead of time.

edit: Aside from my specific problem I am still curious if there are any alternate strategies I can use that may be better.

+4  A: 

I think what you're after is:

select * from name_part
    where type_id=[something]
    order by used_count asc, rand()
    limit 1

This will put the lesser used names at the top of the list and, if there's multiples with the same (lowest) used_count, they'll sort randomly.

paxdiablo
Oh duh. Wow man thanks superb.
goldenratio
Or if you want a little more randomness than that, ORDER BY number_of_uses+RAND()*3 (or some other biasing constant).
bobince
+1  A: 

I agree with your intuition that using a stored procedure is the right way to go, but then, I almost always try to implement database stuff in the database.

In your proc, I would introduce some kind of logic like say, there's only a 30% chance that returning the result will actually increment the counter. Just to increase the variability.

overslacked
That sounds like a good idea. My next step is to learn an ORM like SqlAlchemy, which supports sprocs (right?), and I will implement this then. For now I can still implement the randomness of increasing the counter in python. Thanks!
goldenratio