ansaurus

Question

Answer 1

+5 A:

Are you certain that the uuid module would in fact be too slow to handle the requests you expect in a timely manner? I would be very surprised if UUID generation accounted for a bottleneck in your application.

I would first build the application to simply use the uuid module and then if you find that this module is in fact slowing things down you should investigate a way to keep a pre-generated list of UUIDs around.

Andrew Hare 2010-04-08 22:47:56

+1! Absolutely! This whole question SCREAMS "premature optimization". Even if popping UUIDs from a set is 100x faster than generating them on the fly, there is little point in adding this kind of complexity if it only accounts for 2% of the overal program. What happens if you multithread this code? Now you have to synchronize access between the UUID-generating thread, and the multiple UUID-consuming threads. This is an interesting idea, but PLEASE hold of on pursuing it until you get the program running, and only then discover that UUID generation is a performance-limiting item.

Paul McGuire 2010-04-09 03:13:42

+1. Imagine the disaster that would ensue if something went wrong, and you gave the same GUID twice.

Chris Thornton 2010-04-09 20:22:13

Answer 2

+4 A:

I have tested the performance of the uuid module for generating uuids:

>>> import timeit
>>> timer=timeit.Timer('uuid.uuid1()','import uuid')
>>> timer.repeat(3, 10000)
[0.84600019454956055, 0.8469998836517334, 0.84400010108947754]

How many do you need? Is 10000 per second not enough?

Mark Byers 2010-04-08 22:51:46

Now try>>> timer=timeit.Timer('uuids.pop()','import uuid; uuids = [str(uuid.uuid1()) for _ in xrange(10000)]')and you will see that it is over 100 times faster.

rjuiaa1 2010-04-08 23:07:58

@rjuiaa, I found popping from a `set` was faster than popping from a `list` which is why I used `set` in my answer

gnibbler 2010-04-09 00:39:53

I'm surprised that sets are faster here than lists. What about on the creation side? Sets first have to guarantee uniqueness, which is sort of a waste in this case, since we can be fairly certain that all the UUIDs will be unique. Is the underlying list data storage the cause? Perhaps a deque might address some of the issues with list storage, without incurring the unnecessary uniqueness checking penalty of using a set.

Paul McGuire 2010-04-09 03:28:47

Answer 3

A:

Suppose you have a thread to keep topping up a pool of uuid's.

Here is a very simple version

import uuid,threading,time

class UUID_Pool(threading.Thread):
    pool_size=10000
    def __init__(self):
        super(UUID_Pool,self).__init__()
        self.daemon=True
        self.uuid_pool=set(uuid.uuid1() for x in range(self.pool_size))

    def run(self):
        while True:
            while len(self.uuid_pool) < self.pool_size:
                self.uuid_pool.add(uuid.uuid1())
            time.sleep(0.01)              # top up the pool 100 times/sec

uuid_pool = UUID_Pool()
uuid_pool.start()
get_uuid = uuid_pool.uuid_pool.pop        # make a local binding
uuid=get_uuid()                           # ~60x faster than uuid.uuid1() on my computer

You'd also need to handle the case where your burst empties the pool by using uuid's faster than the thread can generate them.

gnibbler 2010-04-08 23:19:14

Is there no need to synchronize access to the pool across threads? Or are `add` and `pop` guaranteed threadsafe?

Paul McGuire 2010-04-09 03:35:39

Also, your `while True` forever loop contain some invariants that get recalculated each time. Every attribute access incurs a lookup, which in this algorithm does not change from loop to loop. Before starting the loop, assign locals for `self.uuid_pool`, `self.uuid_pool.add` and `uuid.uuid1` - even assigning `len` to a local will run faster, to access a local vs a global. Is this microoptimization? Of course it is. But if you go with the idea that UUID generation is something expensive, why leave out these easy optimization steps?

Paul McGuire 2010-04-09 03:35:56

@Paul McGuire, Yes `add` and `pop` are threadsafe. The part that needs to be optimised is accessing the uuids, I assume that there is ample time to fill the pool between bursts. Having a local like `get_uid = uuid_pool.uuid_pool.pop` might be a good idea though.

gnibbler 2010-04-09 07:52:15

ansaurus

tags:

views:

answers:

Pre-generating GUIDs for use in python?

related questions