My program is ICAPServer (similar to an HTTP server); its main job is to receive data from clients and save the data to a DB.

There are two main steps and two threads:

  1. ICAPServer receives data from clients and puts the data in a queue (50 KB, <1 ms);
  2. another thread pops data from the queue and writes it to the DB.

So, if the 2nd step is too slow, the queue will fill up memory with the pending data.

Wondering if anyone has any suggestions...

A: 

Put an upper limit on the amount of data in the queue?
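
For illustration, a minimal sketch of that idea, assuming the queue is a Queue.Queue and using a hypothetical write_to_db() helper; put() blocks once maxsize items are queued, which is the client-waiting trade-off raised in the comment below:

    import Queue  # 'queue' in Python 3

    MAX_ITEMS = 1000                         # hypothetical cap; tune from measurements
    data_queue = Queue.Queue(maxsize=MAX_ITEMS)

    def enqueue(item):
        # Blocks the ICAP handler when the queue is full; pass block=False or a
        # timeout here if rejecting/dropping is preferable to stalling clients.
        data_queue.put(item)

    def db_writer():
        while True:
            item = data_queue.get()          # blocks until data is available
            write_to_db(item)                # hypothetical DB-write helper
            data_queue.task_done()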

Dingo
But then the 1st step will wait for a free slot in the queue, and that results in clients waiting...
Andy
+2  A: 

It is hard to say for sure, but perhaps using two processes instead of threads will help in this situation. Since Python has the Global Interpreter Lock (GIL), only one thread can execute Python bytecode at any given time.

Having a system designed around processes might have the following advantages:

  • Higher concurrency, especially on multiprocessor machines
  • Greater throughput, since you can probably spawn multiple queue consumers / DB-writer processes to spread out the work (see the sketch below). That said, the impact might be minimal if the DB itself is the bottleneck rather than the process writing to it.
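
For illustration, a rough sketch of that layout, assuming the queued items are picklable (as the comments below note, objects holding open file handles are not) and a hypothetical write_to_db() helper that opens its own DB connection in each process:

    import multiprocessing

    def db_writer(work_queue):
        # Each worker is a separate interpreter, so the GIL is not shared.
        while True:
            item = work_queue.get()
            if item is None:                  # sentinel: shut this writer down
                break
            write_to_db(item)                 # hypothetical; connect per process

    if __name__ == '__main__':
        work_queue = multiprocessing.Queue()
        writers = [multiprocessing.Process(target=db_writer, args=(work_queue,))
                   for _ in range(4)]         # writer count is a guess; measure
        for w in writers:
            w.start()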
Doug
But what I put in the queue are objects that have a file-handle attribute, and a file handle cannot be shared between processes, so it might be very hard to do that...
Andy
In that case, I guess you'd have to share paths between the processes instead of open file handles. Having the DB writer processes open and process the files might improve throughput even further.
Doug
Thanks for the suggestion. But the file handles are created temporarily and dynamically; I don't know how to pass the path of a temporary file... and it might be complex...
Andy
I think I've got an idea: track the length of the queue, and if it hits the max length, start new threads in order to balance the performance of the 2nd step (threading, not processing). Will that work?
Andy
BTW, by the queue I mentioned above I mean Queue.Queue exactly.
Andy
"Tracking the length of queue, if at max length, initial new threads, in order to balance performance of 2nd step (threading not processing), will that work?"It's worth a shot and should be easier to get working than moving to processes.
Doug
But I know Oracle supports parallel operation; how can I do that with SQLAlchemy?
Andy
A: 

One note: before going for optimizations, it is very important to get some good measurements and profiling data.

That said, I would bet the slow part in the second step is the database communication; you could look at the SQL statements SQLAlchemy emits (logging them is one of its features), analyze their execution plans, and then optimize them. If it is still too slow, look into database-level optimizations.
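
For example (with a placeholder connection string), create_engine's echo flag logs every statement SQLAlchemy executes, so the slow ones can be fed to Oracle's EXPLAIN PLAN:

    from sqlalchemy import create_engine

    # echo=True prints each SQL statement and its parameters as it is executed.
    engine = create_engine('oracle://user:password@dsn', echo=True)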

Of course, it is possible the bottleneck is in a completely different place; in that case, you still have chances to optimize using C code, a dedicated network, or more threads, just to give three examples of completely different kinds of optimization.

Another point: as I/O operations usually release the GIL, you could also try to improve performance just by adding another reader thread - and I think this could be a much cheaper solution.
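
A sketch of this multi-threaded approach, using scoped_session as one possible way to give each writer thread its own thread-local Session; the engine URL and the make_record() mapping helper are placeholders, and the thread count is a guess:

    import threading
    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker, scoped_session

    engine = create_engine('oracle://user:password@dsn')   # placeholder URL
    Session = scoped_session(sessionmaker(bind=engine))

    def db_writer():
        session = Session()                   # thread-local session
        while True:
            item = data_queue.get()           # the shared Queue.Queue
            session.add(make_record(item))    # hypothetical mapping to an ORM object
            session.commit()                  # the commit that dominates the 2nd step
            data_queue.task_done()

    for _ in range(3):                        # thread count is a guess; profile it
        t = threading.Thread(target=db_writer)
        t.daemon = True
        t.start()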

Roberto Liffredo
Thanks a lot for the tips, Roberto! I don't know whether I can use a profiler on the Twisted reactor, since it is a loop. What I did is put timestamps at each point and measure the time differences between them. The time for each commit() is about 30 ms (2nd step), while the 1st step is <1 ms. I think that says something about the performance: the 2nd step is too slow.
Andy
I have updated the answer; basically, I would also suggest using more threads, because they are cheap to implement, and _if_ the bottleneck is in the network, then the GIL is released while waiting, so you could achieve a good performance improvement.
Roberto Liffredo
Thanks Roberto, I tested your suggestion of adding more threads, each using its own session to write to the DB, and performance seems to have improved. I know the DB can be operated on in parallel; does multi-threading achieve that, or should I do something in SQLAlchemy to implement parallel DB operations?
Andy