A number of stored procedures I support query remote databases over a WAN. The network occasionally goes down, but the worst that ever happened was that the procedures failed and had to be restarted.

The last couple of weeks it's taken a sinister turn. Instead of failing, the procedures hang in a weird locked state. They can't be killed from within Oracle, and as long as they exist, any attempt to run other copies of the procedure will hang too. The only solution we've found is to kill the offending processes with a "kill -9" from the OS. Some of these procedures haven't been changed for months, even years, so I suspect a root cause in the DB or the DB configuration.

Does anyone have any ideas about what we can do to fix the problem? Or does PL/SQL have a time-out mechanism I can add to the code so that it raises an exception I can handle programmatically?
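To make the time-out idea concrete, here is a minimal caller-side sketch in Python rather than PL/SQL. `call_with_timeout` and `RemoteCallTimeout` are hypothetical names, and `fn` stands in for whatever performs the remote query. The sketch turns "no reply within N seconds" into an exception the caller can handle. It does not show how to configure an equivalent limit inside Oracle.

```python
import concurrent.futures
import time


class RemoteCallTimeout(Exception):
    """Hypothetical application-level exception the caller can catch."""


def call_with_timeout(fn, timeout_s, *args):
    """Run fn(*args) in a worker thread; raise RemoteCallTimeout if it
    has not returned within timeout_s seconds."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args)
    try:
        # Blocks for at most timeout_s; exceptions raised by fn propagate.
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        raise RemoteCallTimeout(f"no reply within {timeout_s}s") from None
    finally:
        # Don't wait for a hung worker; it keeps running in the background.
        executor.shutdown(wait=False)
```

The stuck worker is abandoned, not killed. A hung call therefore still holds whatever it was holding, much like the sessions in this thread that had to be cleaned up from the OS.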

A:

What database version? Are they stuck running SQL or in PL/SQL? Has anyone added exception handling to the routines recently? I remember that in 9iR2 we were told that, instead of raising an exception to the calling routine, we should catch all exceptions and keep running (basically, try to process all the items in the job even if some fail). We inevitably had jobs get stuck in an infinite loop: SQL statements failed, the exception handler caught the error, and the job tried again. And they couldn't be killed, because the WHEN OTHERS handler also caught the 'your session has been killed' exception. I think the latter changed in 10g so that this exception is no longer caught.
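The failure mode described above is a control-flow problem, so it can be sketched outside Oracle. This is a hypothetical, language-neutral illustration in Python, not PL/SQL. `SessionKilled` stands in for the "your session has been killed" error and `handle` for the per-item work. The sketch keeps the "process every item even if some fail" goal but bounds retries per item and never swallows the kill signal.

```python
class SessionKilled(Exception):
    """Hypothetical stand-in for Oracle's 'your session has been killed' error."""


def process_all(items, handle, max_attempts=3):
    """Call handle(item) for each item, retrying failures at most
    max_attempts times. Returns a list of (item, last_exception) for
    items that never succeeded. A SessionKilled error always propagates."""
    failed = []
    for item in items:
        for attempt in range(1, max_attempts + 1):
            try:
                handle(item)
                break  # success: move on to the next item
            except SessionKilled:
                raise  # a catch-all must not swallow a kill request
            except Exception as exc:
                if attempt == max_attempts:
                    failed.append((item, exc))  # give up on this item
    return failed
```

In PL/SQL terms the analogous idea is to avoid a `WHEN OTHERS` handler that neither re-raises nor limits retries. Treat that mapping as an assumption to check against your own release.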

Gary
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit. They are PL/SQL stored procedures. I reviewed one of the procedures and I don't see anywhere it could loop infinitely. One interesting thing is that the October CPU was installed 4 days before this first happened. Might that have caused it?
Jeff
Does anyone have a link to the October CPU documentation? All I can find is the document listing all the different Oracle versions that should install the CPU.
Jeff
A: 

We were never able to determine what caused this. We believe it was a defect in the October 2008 cumulative patch. Perhaps a later patch has fixed it. It hasn't happened for a couple of months (and we've had some network outages), so hopefully the problem has gone away.

Jeff