I'm reading in a lot of data from ldap which needs to be compared to the respective records in the database. To minimize the number of SQL queries, I want to batch multiple ldap records into a single query.
All this is pretty simple: A thread to produce ldap results, and a thread to consume those results and run the SQL query.
ldap_results = Queue.Queue(10) def producer(): for result in ldap_results(): ldap_results.put(result) def consumer(): buffer = [] buffer_size = 5 while True: record = ldap_results.get() buffer.append(record) if len(buffer) >= buffer_size: do_sql(buffer) buffer = []
The problem is: If ldap only returns, say, 3 results and buffer_size
is 5, it'll end up blocking forever. I realize I could put some special token into the buffer, like None
, or "EOF"
, but that seems like bad design: "iterate until you're done, oh, unless you see this special value, that means you're done, too".
I came up with two alternative ideas. The first is to have a shared eof
variable, but I don't know how to properly synchronize it.
def producer(): while data: buffer.put() eof = True def consumer(): while not eof: buffer.get()
The second is to have a ProduceChunks(chunk_size)
method for the producer, and it'll handle the batching up of results, but I don't like that because it assumes the producer will know how best to buffer up results, when, really, I think that is the responsibility of the consumer.
Does anyone have any guidance?