views:

339

answers:

8

I have a 400 line sql query which is throwing exception withing 30 seconds

ORA-03113: end-of-file on communication channel

Below are things to note:

  1. I have set the timeout as 10 mins
  2. There is one last condition when removed resolves this error.
  3. This error came only recently when I analyzed indexes.

The troubling condition is like this:

AND UPPER (someMultiJoin.someColumn) LIKE UPPER ('%90936%')

So my assumption is that the query is getting terminated from the server side apparently because its identified as a resource hog.

Is my assumption appropriate ? How should I go about to fix this problem ?

EDIT: I tried to get the explain plan of faulty query but the explain plan query also gives me an ORA-03113 error. I understand that my query is not very performant but why should that be a reason for ORA-03113 error. I am trying to run the query from toad and there are no alert log or trace generated, my db version is Oracle9i Enterprise Edition Release 9.2.0.7.0 - Production

A: 

It means you have been disconnected. This not likely to be due to being a resource hog.

I have seen where the connection to the DB is running over a NAT and because there is no traffic it closes the tunnel and thus drops the connection. Generally if you use connection pooling you won't get this.

Daniel
Thats shouldn't be the case as I have tried over and over and still getting the same error.
Ravi Gupta
A: 

As @Daniel said, the network connection to the server is being broken. You might take a look at End-of-file on communication channel to see if it offers any useful suggestions.

Share and enjoy.

Bob Jarvis
+3  A: 

One possible cause of this error is a thread crash on the server side. Check whether the Oracle server has generated any trace files, or logged any errors in its alert log.

You say that removing one condition from the query causes the problem to go away. How long does the query take to run without that condition? Have you checked the execution plans for both versions of the query to see if adding that condition is causing some inefficient plan to be chosen?

Dave Costa
Query takes around 40 secs with that condition knocked out. When I try to get the explain plan (with condition) I get the same ORA-03113 error.
Ravi Gupta
+1 - First check the trace and alert logs.
REW
A: 

This is often a bug in the Cost Based Optimizer with complex queries.

What you can try to do is to change the execution plan. E.g. use WITH to pull some subquerys out. Or use the SELECT /*+ RULE */ hint to prevent Oracle from using the CBO. Also dropping the statistics helps, because Oracle then uses another execution plan.

If you can update the database, make a test installation of 9.2.0.8 and see if the error is gone there.

Sometimes it helps to make a dump of the schema, drop everything in it and import the dump again.

andrem
A: 

have you tried without UPPER ?

ZXX
+1  A: 

From the information so far it looks like an back-end crash, as Dave Costa suggested some time ago. Were you able to check the server logs?

Can you get the plan with set autotrace traceonly explain? Does it happen from SQL*Plus locally, or only with a remote connection? Certainly sounds like an ORA-600 on the back-end could be the culprit, particularly if it's at parse time. The successful run taking longer than the failing one seems to rule out a network problem. I suspect it's failing quite quickly but the client is taking up to 30 seconds to give up on the dead connection, or the server is taking that long to write trace and core files.

Which probably leaves you the option of patching (if you can find a relevant fix for the specific ORA-600 on Metalink) or upgrading the DB; or rewriting the query to avoid it. You may get some ideas for how to do that from Metalink if it's a known bug. If you're lucky it might be as simple as a hint, if the extra condition is having an unexpected impact on the plan. Is someMultiJoin.someColumn part of an index that's used in the successful version? It's possible the UPPER is confusing it and you could persuade it back on to the successful plan by hinting it to use the index anyway, but that's obviously rather speculative.

Alex Poole
+1 - There are several bugs that can cause things like this.
REW
+1  A: 

You can safely remove the "UPPER" on both parts if you are using the like with numbers (that are not case sensitive), this can reduce the query time to check the like sentence

AND UPPER (someMultiJoin.someColumn) LIKE UPPER ('%90936%')

Is equals to:

AND someMultiJoin.someColumn LIKE '%90936%'

Numbers are not affected by UPPER (and % is independent of character casing).

Dubas
A: 

I've had similar connection dropping issues with certain variations on a query. In my case connections dropped when using rownum under certain circumstances. It turned out to be a bug that had a workaround by adjusting a certain Oracle Database configuration setting. We went with a workaround until a patch could be installed. I wish I could remember more specifics or find an old email on this but I don't know that the specifics would help address your issue. I'm posting this just to say that you've probably encountered a bug and if you have access to Oracle's support site (support.oracle.com) you'll likely find that others have reported it.

Edit: I had a quick look at Oracle support. There are more than 1000 bugs related to ORA-03113 but I found one that may apply:

Bug 5015257: QUERY FAILS WITH ORA-3113 AND COREDUMP WHEN QUERY_REWRITE_ENABLED='TRUE'

To summarize:

  • Identified in 9.2.0.6.0 and fixed in 10.2.0.1
  • Running a particular query (not identified) causes ORA-03113
  • Running explain on query does the same
  • There is a core file in $ORACLE_HOME/dbs
  • Workaround is to set QUERY_REWRITE_ENABLED to false: alter system set query_rewrite_enabled = FALSE;

Another possibility:

Bug 3659827: ORA-3113 FROM LONG RUNNING QUERY

  • 9.2.0.5.0 through 10.2.0.0
  • Problem: Customer has long running query that consistently produces ORA-3113 errros.
    On customers system they receive core.log files but do not receive any errors in the alert.log. On test system I used I receivded ORA-7445 errors.
  • Workaround: set "_complex_view_merging"=false at session level or instance level.
jlpp