Long story short: How can I make hundreds of simultaneously running processes communicate with a database through one or a few permanent sessions?

The whole story:
I once built a number-crunching engine that handles vast amounts of large data files by forking off one child after another, giving each a small number of files to work on. File locking, progress monitoring and result propagation happen in an Oracle database, which all (sub-)processes access at various times through an application-specific module that encapsulates DBI.

This worked well at first, but now, with higher volumes of input data, the number of database sessions constantly being opened and closed (one per child, and they can be very short-lived) is becoming an issue. I now want to centralise database access so that only one or a few fixed database sessions handle all database access for all the (sub-)processes. The database abstraction module should make the change easy, because the function calls in the worker instances can stay the same. My problem is that I cannot think of a suitable way to extend that module so that all the processes can communicate with the database connector(s).

I thought of message queueing, but couldn't come up with a way of connecting a large herd of requestors to one or a few database connectors so that bidirectional communication is possible (for collecting the query result).
An asynchronous approach could help here: all requests are written to the same queue, and the database connector servicing a request "calls back" to submit the result. But I can't form a picture of this that is clear enough to turn into code.
Threading instead of forking might have given me an easier start, but it would now require massive changes to the code base that I'm not prepared to make to a live system.

The more I think about it, the more the basic idea looks to me like a pre-forked web server, except that it serves database queries instead of web pages. Any ideas on what to dig into, and where? Sample (pseudo) code to inspire me, links to possibly related articles, ready-made solutions on CPAN maybe?

A: 

Have a look at DBD::Gofer. It's designed to be a separate process to pool and manage database connections.

mpeters
Thank you for the quick answer. The first few pages are very compelling, but then it says that transactions are not supported. I'd feel pretty unsafe without transaction support.
Olfan
+3  A: 

I think you should consider adding a tier. POE can probably handle the middle tier.
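To make the extra tier a bit more concrete, here is a minimal sketch of the idea, not a POE program. It is written in Python only to keep it short, and everything in it is an assumption for illustration: `sqlite3` stands in for the single Oracle session, and the names `broker`, `db_call`, `run_demo` and the `locks` table are made up. One forked process owns the session and listens on a Unix domain socket; each worker sends one JSON request per connection and reads one JSON reply, which gives the bidirectional "call back" the question asks about.

```python
import json
import os
import socket
import sqlite3
import tempfile


def broker(sock_path, ready_fd):
    """Own the single database session and answer one request per connection."""
    db = sqlite3.connect(":memory:")  # assumption: stand-in for the Oracle session
    db.execute("CREATE TABLE locks (file TEXT PRIMARY KEY, owner INTEGER)")
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(sock_path)
    server.listen(64)
    os.write(ready_fd, b"1")  # tell the parent that requests are now accepted
    running = True
    while running:
        conn, _ = server.accept()
        with conn, conn.makefile("rw") as stream:
            line = stream.readline()
            if not line:
                continue  # client went away without sending anything
            request = json.loads(line)
            if request.get("stop"):
                running, reply = False, {"ok": True, "rows": []}
            else:
                try:
                    rows = db.execute(request["sql"], request["params"]).fetchall()
                    db.commit()  # each request is its own unit of work here
                    reply = {"ok": True, "rows": rows}
                except sqlite3.Error as exc:
                    db.rollback()
                    reply = {"ok": False, "error": str(exc)}
            stream.write(json.dumps(reply) + "\n")
    server.close()


def db_call(sock_path, sql, params=(), stop=False):
    """What the abstraction module would do instead of opening its own session."""
    request = {"sql": sql, "params": list(params), "stop": stop}
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        with sock.makefile("rw") as stream:
            stream.write(json.dumps(request) + "\n")
            stream.flush()
            reply = json.loads(stream.readline())
    if not reply["ok"]:
        raise RuntimeError(reply["error"])
    return reply["rows"]


def run_demo(workers=5):
    with tempfile.TemporaryDirectory() as tmp:
        sock_path = os.path.join(tmp, "db.sock")
        ready_r, ready_w = os.pipe()
        broker_pid = os.fork()
        if broker_pid == 0:  # child: becomes the middle tier
            try:
                broker(sock_path, ready_w)
            finally:
                os._exit(0)
        os.close(ready_w)
        os.read(ready_r, 1)  # wait until the broker is listening
        os.close(ready_r)

        pids = []
        for n in range(workers):
            pid = os.fork()
            if pid == 0:  # child: a worker that never opens a DB session itself
                try:
                    db_call(sock_path, "INSERT INTO locks VALUES (?, ?)",
                            [f"file{n}.dat", os.getpid()])
                finally:
                    os._exit(0)
            pids.append(pid)
        for pid in pids:
            os.waitpid(pid, 0)

        rows = db_call(sock_path, "SELECT file FROM locks ORDER BY file")
        db_call(sock_path, "", stop=True)  # ask the broker to shut down
        os.waitpid(broker_pid, 0)
        return rows


if __name__ == "__main__":
    print(run_demo())
```

In this sketch every request is committed on its own, so a transaction spanning several calls would need extra protocol (for example, keeping one connection open per transaction). The worker-side function keeps the shape of an ordinary query call, which is the point of hiding it behind the existing abstraction module.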

Axeman
Outsourcing the task to a standalone application hadn't occurred to me yet, thank you for this idea. I've been routed on from CPAN to a whole wiki of documentation at http://poe.perl.org/, so it'll take some time to find and digest what's helpful there.
Olfan
Oh boy, this takes me to a whole new world. I'm from a very procedural background and these multitasking concepts look pretty unfamiliar to me. ;-) It'll need some studying but I think it's actually the way I'll go. Thank you again for the answer.
Olfan
A: 

You might look at SQL Relay, which has some other advantages as well on the proxy side of things.

http://sqlrelay.sourceforge.net/sqlrelay/

This looks very interesting, too. Unfortunately, I wouldn't be allowed to use it because it's not from an "approved source" (which in my context means CPAN or self-made).
Olfan
A: 

You may want to talk to your DBA about "Shared Server", which is quite easy to implement in Oracle >= 10. You can think of this as connection pooling on the server side. So when you ask for a connection, you are not necessarily creating a new dedicated server process, and destroying it when you disconnect.

You could also do connection pooling on your side.
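As a rough illustration of what pooling on your side means, here is a minimal sketch in Python; treat all of it as assumption. `SessionPool` and `run_pool_demo` are hypothetical names, and `sqlite3` stands in for the real database driver. A fixed number of sessions is opened once, and callers borrow one, use it and hand it back, so many callers share a few long-lived sessions. Note that a pool like this lives inside one process, so with forked workers it would have to sit in some central process that the workers talk to.

```python
import queue
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager


class SessionPool:
    """Hold a fixed set of open sessions and lend them out one at a time."""

    def __init__(self, size, connect):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())  # sessions are opened once, up front

    @contextmanager
    def session(self):
        conn = self._idle.get()  # blocks while every session is in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the session back instead of closing it


def run_pool_demo(tasks=50, threads=10, sessions=2):
    # assumption: an in-memory sqlite3 database stands in for the real session
    pool = SessionPool(
        sessions, lambda: sqlite3.connect(":memory:", check_same_thread=False)
    )
    used, lock = set(), threading.Lock()

    def task(n):
        with pool.session() as conn:
            with lock:
                used.add(id(conn))  # remember which session served this task
            return conn.execute("SELECT ? * 2", (n,)).fetchone()[0]

    with ThreadPoolExecutor(max_workers=threads) as executor:
        results = list(executor.map(task, range(tasks)))
    return results, used


if __name__ == "__main__":
    results, used = run_pool_demo()
    print(len(results), "queries served by", len(used), "sessions")
```

The blocking `get()` is the design choice that caps the session count: when all sessions are busy, callers wait instead of opening another one.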

jimbob
That concept sounds interesting, but it won't work in my situation. We're talking legacy here: the Oracle in question is a 9.2.0.6. Even with a 10g, changing a server's parameters after the initial setup is incredibly hard to get approved.
Olfan