views:

399

answers:

9

Every solution I come up with is not thread save.

def uuid(cls,db):
    u = hexlify(os.urandom(8)).decode('ascii')
    db.execute('SELECT sid FROM sessions WHERE sid=?',(u,))
    if db.fetch(): u=cls.uuid(db)
    else: db.execute('INSERT INTO sessions (sid) VALUES (?)',(u,))
    return u
+1  A: 

No need to call the database I'd think:

>>> import uuid

# make a UUID based on the host ID and current time
>>> uuid.uuid1()
UUID('a8098c1a-f86e-11da-bd1a-00112444be1e')

From this page.

Andomar
Is it random enough so people can not predict uuid's
Gert Cuykens
Should be random enough for 99% of applications, but I don't think it's cryptographically secure
Andomar
UUID is designed to be globally unique (which is not required here), but not unpredictable (while this is mandatory for sessions).
Denis Otkidach
A: 

If you absolutely need to verify uid against database and avoid race conditions, use transactions:

BEGIN TRANSACTION
SELECT COUNT(*) FROM sessions WHERE sid=%s
INSERT INTO sessions (sid,...) VALUES (%s,...)
COMMIT
yk4ever
SELECT will not create locks so this will work same way that if select is before the begin trans.
j.a.estevan
Nope. Imagine what happens when both threads have simultaneously executed the SELECT for same sid and received False. You will have two inserts one after another.Wrapping single statement in transaction doesn't make any sense.
yk4ever
+1  A: 

I'd start with a thread-unique ID and (somehow) concatenate that with a thread-local counter, then feed it through a cryptographic hash algorithm.

Vatine
+1 for the hashing technic, although there are modules to generate GUID.
Matthieu M.
What about multiple processes? (WSGIDaemonProcess processes=2 threads=5) Impossible to create a unique thread id?I think thread lock is the only solution?
Gert Cuykens
You use another piece of state (unique process ID) and concatenate that with the thread-unique ID and a thread-local counter, then hash. Only works as long as you're in "not huge number of threads and processes" territory, as you probably do not want to burst 64 bits for your counter, but that'd give you, say, a thread-local counter that's 32 bits, then 16 bits for process ID and another 16 for thread ID, so well within the space of reasonable.
Vatine
+2  A: 

If you require thread safety why not put you random number generator a function that uses a shared lock:

import threading
lock = threading.Lock()
def get_random_number(lock)
    with lock:
        print "This can only be done by one thread at a time"

If all the threads calling get_random_number use the same lock instance, then only one of them at time can create a random number.

Of course you have also just created a bottle neck in your application with this solution. There are other solutions depending on your requirements such as creating blocks of unique identifiers then consuming them in parallel.

Tendayi Mawushe
Will 2 separate WSGIDaemonProcesses each having there own lock, still be able to excecute the same select ?
Gert Cuykens
+2  A: 

Your algorithm is OK (thread safe as far as your DB API module is safe) and probably is the best way to go. It will never give you duplicate (assuming you have PRIMARY or UNIQUE key on sid), but you have a neglectfully small chance to get IntegrityError exception on INSERT. But your code doesn't look good. It's better to use a loop with limited number of attempts instead of recursion (which in case of some error in the code could become infinite):

for i in range(MAX_ATTEMPTS):
    sid = os.urandom(8).decode('hex')
    db.execute('SELECT COUNT(*) FROM sessions WHERE sid=?', (sid,))
    if not db.fetchone()[0]:
        # You can catch IntegrityError here and continue, but there are reasons
        # to avoid this.
        db.execute('INSERT INTO sessions (sid) VALUES (?)', (sid,))
        break
else:
    raise RuntimeError('Failed to generate unique session ID')

You can raise the number of random characters read used to make chance to fail even smaller. base64.urlsafe_b64encode() is your friend if you'd like to make SID shorter, but then you have to insure your database uses case-sensitive comparison for this columns (MySQL's VARCHAR is not suitable unless you set binary collation for it, but VARBINARY is OK).

Denis Otkidach
+4  A: 
import os, threading, Queue

def idmaker(aqueue):
  while True:
    u = hexlify(os.urandom(8)).decode('ascii')
    aqueue.put(u)

idqueue = Queue.Queue(2)

t = threading.Thread(target=idmaker, args=(idqueue,))
t.daemon = True
t.start()

def idgetter():
  return idqueue.get()

Queue is often the best way to synchronize threads in Python -- that's frequent enough that when designing a multi-thread system your first thought should be "how could I best do this with Queues". The underlying idea is to dedicate a thread to entirely "own" a shared resource or subsystem, and have all other "worker" threads access the resource only by gets and/or puts on Queues used by that dedicated thread (Queue is intrinsically threadsafe).

Here, we make an idqueue with a length of only 2 (we don't want the id generation to go wild, making a lot of ids beforehand, which wastes memory and exhausts the entropy pool -- not sure if 2 is optimal, but the sweet spot is definitely going to be a pretty small integer;-), so the id generator thread will block when trying to add the third one, and wait until some space opens in the queue. idgetter (which could also be simply defined by a top-level assignment, idgetter = idqueue.get) will normally find an id already there and waiting (and make space for the next one!) -- if not, it intrinsically blocks and waits, waking up as soon as the id generator has placed a new id in the queue.

Alex Martelli
Is idqueue.get() accessible by all WSGIDaemonProcesses ?
Gert Cuykens
@Gert, if you have multiple processes instead of threads you can do the same thing with a `multiprocessing.Queue` instead of a `Queue.Queue` (the latter's meant for threading, not for multiprocessing).
Alex Martelli
Mod wsgi is a multiprocessing and multithreading wsgi server, how do I use both ?
Gert Cuykens
multiprocessing.Queue works with threads as well as processes (it's just overkill if all you have is threads). Not sure how wsgi daemon processes are started, i.e. what data structures they can share, or IOW how they can best arrange to talk to each other.
Alex Martelli
@Alex: What problem do you solve here? Do you really think `os.urandom()` is unsafe in threaded/multiprocess environment?
Denis Otkidach
@Gert, sorry, code in comments is free-flowed and so totally unreadable. If you want your code to be readable, you'll have to edit your question to add it there with proper formatting.
Alex Martelli
http://stackoverflow.com/questions/1698670/mod-wsgi-from-multiprocessing-import-queue
Gert Cuykens
A: 

Is there not a unique piece of data in each thread? It is difficult for me to imagine two threads with exactly the same data. Though I don't discount the possibility.

In the past when I have done things of this nature there is usually something unique about the thread. User name or client name or something of that nature. The solution for me was to concatenate the UserName, for example, and the current time in milliseconds then hash that string and get a hex digest of the hash. This gives one a nice string that is always the same length.

There is a really remote possibility that two different John Smith's (or whatever) in two threads generate the id in the same millisecond. If that possibility makes one nervous then the locking route as mentioned may be needed.

As was already mentioned there are already routines to get a GUID. I personally like fiddling with hash functions so I have rolled my own in the way mentioned on large multi threaded systems with success.

It is ultimately up to you to decide if you really have threads with duplicate data. Be sure to choose a good hashing algorithm. I have used md5 successfully but I have read that it is possible to generate a hash collision with md5 though I have never done it. Lately I have been using sha1.

tmikew
+3  A: 

I'm suggesting just a small modification to the accepted answer by Denis:

for i in range(MAX_ATTEMPTS):
    sid = os.urandom(8).decode('hex')
    try:
        db.execute('INSERT INTO sessions (sid) VALUES (?)', (sid,))
    except IntegrityError:
        continue
    break
else:
    raise RuntimeError('Failed to generate unique session ID')

We simply attempt the insert without explicitly checking for the generated ID. The insert will very rarely fail, so we most often only have to make the one database call, instead of two.

This will improve efficiency by making fewer database calls, without compromising thread-safety (as this will effectively be handled by the database engine).

edvald
This solution will cause problems with transactions in bad cases.
Denis Otkidach
Can you explain what exactly need to be happening for this to fail pleas? What need the other thread be doing while this thread is about to have a exception?
Gert Cuykens
I can't really see in which case this solution will have more problems, but I may be missing something. Can you elaborate?Actually I figure neither solution would be entirely appropriate with transactions. For transactions I would suggest having the database engine generate the UID. See for example http://dev.mysql.com/doc/refman/5.1/en/miscellaneous-functions.html#function_uuid
edvald
sorry sqlite3 not possible to generate.
Gert Cuykens
A: 

mkdtemp should be thread-safe,simple and secure :

def uuid():
    import tempfile,os
    _tdir = tempfile.mkdtemp(prefix='uuid_')
    _uuid = os.path.basename(_tdir)
    os.rmdir(_tdir)
    return _uuid
Luca
Would that not be too much disk usage for session id's?
Gert Cuykens
No, see os.rmdir before return. Someone must do the work.. it's just the kernel OS in my example.Any other implementation like DB etc will add layers on top of this.Anyway I was thinking about session management where a directory is needed.
Luca
How do you know if _uuid is unique when you remove it?
Gert Cuykens
Removing directory could cause collisions? Of course this need investigation and could be OS/glibc related. You could leave the directory and delete when the uuid is not neededanymore, for example session logout/timeout .Someone must save the data.. it's just the OS file system in my example.As you can see I'm always thinking to sessions, my bad ;-)
Luca