views:

252

answers:

3

I have a python program that needs to generate several guids and hand them back with some other data to a client over the network. It may be hit with a lot of requests in a short time period and I would like the latency to be as low as reasonably possible.

Ideally, rather than generating new guids on the fly as the client waits for a response, I would rather be bulk-generating a list of guids in the background that is continually replenished so that I always have pre-generated ones ready to hand out.

I am using the uuid module in python on linux. I understand that this is using the uuidd daemon to get uuids. Does uuidd already take care of pre-genreating uuids so that it always has some ready? From the documentation it appears that it does not.

Is there some setting in python or with uuidd to get it to do this automatically? Is there a more elegant approach then manually creating a background thread in my program that maintains a list of uuids?

+5  A: 

Are you certain that the uuid module would in fact be too slow to handle the requests you expect in a timely manner? I would be very surprised if UUID generation accounted for a bottleneck in your application.

I would first build the application to simply use the uuid module and then if you find that this module is in fact slowing things down you should investigate a way to keep a pre-generated list of UUIDs around.

Andrew Hare
+1! Absolutely! This whole question SCREAMS "premature optimization". Even if popping UUIDs from a set is 100x faster than generating them on the fly, there is little point in adding this kind of complexity if it only accounts for 2% of the overal program. What happens if you multithread this code? Now you have to synchronize access between the UUID-generating thread, and the multiple UUID-consuming threads. This is an interesting idea, but PLEASE hold of on pursuing it until you get the program running, and only then discover that UUID generation is a performance-limiting item.
Paul McGuire
+1. Imagine the disaster that would ensue if something went wrong, and you gave the same GUID twice.
Chris Thornton
+4  A: 

I have tested the performance of the uuid module for generating uuids:

>>> import timeit
>>> timer=timeit.Timer('uuid.uuid1()','import uuid')
>>> timer.repeat(3, 10000)
[0.84600019454956055, 0.8469998836517334, 0.84400010108947754]

How many do you need? Is 10000 per second not enough?

Mark Byers
Now try>>> timer=timeit.Timer('uuids.pop()','import uuid; uuids = [str(uuid.uuid1()) for _ in xrange(10000)]')and you will see that it is over 100 times faster.
rjuiaa1
@rjuiaa, I found popping from a `set` was faster than popping from a `list` which is why I used `set` in my answer
gnibbler
I'm surprised that sets are faster here than lists. What about on the creation side? Sets first have to guarantee uniqueness, which is sort of a waste in this case, since we can be fairly certain that all the UUIDs will be unique. Is the underlying list data storage the cause? Perhaps a deque might address some of the issues with list storage, without incurring the unnecessary uniqueness checking penalty of using a set.
Paul McGuire
A: 

Suppose you have a thread to keep topping up a pool of uuid's.

Here is a very simple version

import uuid,threading,time

class UUID_Pool(threading.Thread):
    pool_size=10000
    def __init__(self):
        super(UUID_Pool,self).__init__()
        self.daemon=True
        self.uuid_pool=set(uuid.uuid1() for x in range(self.pool_size))

    def run(self):
        while True:
            while len(self.uuid_pool) < self.pool_size:
                self.uuid_pool.add(uuid.uuid1())
            time.sleep(0.01)              # top up the pool 100 times/sec

uuid_pool = UUID_Pool()
uuid_pool.start()
get_uuid = uuid_pool.uuid_pool.pop        # make a local binding
uuid=get_uuid()                           # ~60x faster than uuid.uuid1() on my computer

You'd also need to handle the case where your burst empties the pool by using uuid's faster than the thread can generate them.

gnibbler
Is there no need to synchronize access to the pool across threads? Or are `add` and `pop` guaranteed threadsafe?
Paul McGuire
Also, your `while True` forever loop contain some invariants that get recalculated each time. Every attribute access incurs a lookup, which in this algorithm does not change from loop to loop. Before starting the loop, assign locals for `self.uuid_pool`, `self.uuid_pool.add` and `uuid.uuid1` - even assigning `len` to a local will run faster, to access a local vs a global. Is this microoptimization? Of course it is. But if you go with the idea that UUID generation is something expensive, why leave out these easy optimization steps?
Paul McGuire
@Paul McGuire, Yes `add` and `pop` are threadsafe. The part that needs to be optimised is accessing the uuids, I assume that there is ample time to fill the pool between bursts. Having a local like `get_uid = uuid_pool.uuid_pool.pop` might be a good idea though.
gnibbler