Notifying consumer when producer is done

I'm reading in a lot of data from ldap which needs to be compared to the respective records in the database. To minimize the number of SQL queries, I want to batch multiple ldap records into a single query.

All this is pretty simple: A thread to produce ldap results, and a thread to consume those results and run the SQL query.

ldap_results = Queue.Queue(10)
def producer():
  for result in ldap_results():
    ldap_results.put(result)

def consumer():
  buffer = []
  buffer_size = 5
  while True:
    record = ldap_results.get()
    buffer.append(record)
    if len(buffer) >= buffer_size:
      do_sql(buffer)
      buffer = []

The problem is: If ldap only returns, say, 3 results and buffer_size is 5, it'll end up blocking forever. I realize I could put some special token into the buffer, like None, or "EOF", but that seems like bad design: "iterate until you're done, oh, unless you see this special value, that means you're done, too".

I came up with two alternative ideas. The first is to have a shared eof variable, but I don't know how to properly synchronize it.

def producer():
  while data:
    buffer.put()
  eof = True

def consumer():
  while not eof:
    buffer.get()

The second is to have a ProduceChunks(chunk_size) method for the producer, and it'll handle the batching up of results, but I don't like that because it assumes the producer will know how best to buffer up results, when, really, I think that is the responsibility of the consumer.

Does anyone have any guidance?

ansaurus

tags:

views:

answers:

Notifying consumer when producer is done

related questions