ansaurus

Question

Optimising a SELECT query that runs slow on Oracle which runs quickly on SQL Server

Answer 1

+3 A:

The "IN" - clause is known in Oracle to be pretty slow. In fact, the internal query optimizer in Oracle cannot handle statements with "IN" pretty good. try using "EXISTS":

SELECT orderID FROM tasks WHERE orderID NOT EXISTS 
    (SELECT DISTINCT orderID FROM tasks WHERE
         engineer1 IS NOT NULL AND engineer2 IS NOT NULL)`print("code sample");`

Caution: Please check if the query builds the same data results.

Edith says: ooops, the query is not well formed, but the general idea is correct. Oracle has to fulfill a full table scan for the second (inner) query, build the results and then compare them to the first (outer) query, that's why it's slowing down. Try

SELECT orderID AS oid FROM tasks WHERE NOT EXISTS 
    (SELECT DISTINCT orderID AS oid2 FROM tasks WHERE
         engineer1 IS NOT NULL AND engineer2 IS NOT NULL and oid=oid2)

or something similiar ;-)

Georgi 2008-09-23 11:55:15

I came to the same query (see my answer below), except for : * the subquery has no reason to select DISTINCT orderIDs. * drop the "orderID" between WHERE and NOT EXISTS (syntax error). * drop the 'print("code sample"), obviously ;-)

Mac 2008-09-23 12:08:49

When I try the second query it gets an error?ORA-00904: "OID2": invalid identifier

RickL 2008-09-23 12:16:09

Use the "AS oid2" I inserted, it was not in the query before.

Georgi 2008-09-23 12:19:51

@Mac: Oooops, 'print("code sample")', hmmm... my only excuse is that I'm working on a new laptop and didn't see it :) - thanks for your hints!

Georgi 2008-09-23 12:20:53

Hmm, no, it still gives the same error... tried in SQL Plus and Toad Freeware.

RickL 2008-09-23 12:25:58

Uh, I don't have a Oracle on my laptop at the moment, but give this a try: Completely skip the "AS oid2" and end the query with "oid=orderID)" - if you cant fix it, eliminate it ;-)

Georgi 2008-09-23 12:39:20

"The "IN" - clause is known in Oracle to be pretty slow. In fact, the internal query optimizer in Oracle cannot handle statements with "IN" pretty good. try using "EXISTS":"Nonense. Where's the justification for this?

David Aldridge 2008-09-23 13:36:29

AJ gave the correct syntax for the NOT EXISTS version of the SQL statement, but it is still too slow (over 3 minutes).

RickL 2008-09-23 13:59:05

Georgi, where is your proof that IN is "slow" and the CBO can't handle it? If you're going to answer questions here, try to do so from a position of understanding, instead of one from hearsay and supposition

2008-10-24 21:49:46

Answer 2

A:

Isn't your query the same as

SELECT orderID FROM tasks
WHERE engineer1 IS NOT NULL OR engineer2 IS NOT NULL

?

ΤΖΩΤΖΙΟΥ 2008-09-23 11:59:50

No it's not. I made the same mistake :-)Each order has mutliple tasks, and if one of those tasks has an engineer assigned, the order counts as "allocated"

RB 2008-09-23 12:01:20

Answer 3

A:

How about :

SELECT DISTINCT orderID FROM tasks t1 WHERE NOT EXISTS (SELECT * FROM tasks t2 WHERE t2.orderID=t1.orderID AND (engineer1 IS NOT NULL OR engineer2 IS NOT NULL));

I am not a guru of optimization, but maybe you also overlooked some indexes in your Oracle database.

Mac 2008-09-23 12:02:18

I tried this but it was still running over a minute later when I cancelled it.

RickL 2008-09-23 12:10:46

Answer 4

A:

Another option is to use MINUS (EXCEPT on MSSQL)

SELECT orderID FROM tasks
MINUS
SELECT DISTINCT orderID FROM tasks WHERE engineer1 IS NOT NULL 
AND engineer2 IS NOT NULL

Vinko Vrsalovic 2008-09-23 12:05:22

I thought of that, too, but I think this query is not the fastest, too.

Georgi 2008-09-23 12:10:37

Yes, be careful, the minus statement uses alot of memory

Mike McAllister 2008-09-23 12:30:25

Normally the implicit distinct would be unecessary overhead, but in this case it would probably be appropriate. Two scans of the table might be inefficient though.

David Aldridge 2008-09-23 18:07:47

Answer 5

A:

Sub-queries are "bad" with Oracle. It's generally better do use joins.

Here's an article on how to rewrite your subqueries with join : http://www.dba-oracle.com/sql/t_rewrite_subqueries_performance.htm

Michel 2008-09-23 12:13:04

Nothing is necessary "bad". The difference between the benefits of different techniques almost always comes down to the value distributions of a particular data set and the presence or absence of indexes and constraints.

David Aldridge 2008-09-23 15:09:30

Answer 6

A:

Here is an alternate approach which I think gives what you want:

SELECT orderID
 FROM tasks
 GROUP BY orderID
 HAVING COUNT(engineer1) = 0 OR COUNT(engineer2) = 0

I'm not sure if you want "AND" or "OR" in the HAVING clause. It sounds like according to business logic these two fields should either both be populated or both be NULL; if this is guaranteed then you could reduce the condition to just checking engineer1.

Your original query would, I think, give multiple rows per orderID, whereas mine will only give one. I am guessing this is OK since you are only fetching the orderID.

Dave Costa 2008-09-23 12:16:07

Answer 7

+2 A:

Some questions:

How many rows are there in tasks?
What indexes are defined on it?
Has the table been analyzed recently?

Another way to write the same query would be:

select orderid from tasks
minus
select orderid from tasks
where engineer1 IS NOT NULL AND engineer2 IS NOT NULL

However, I would rather expect the query to involve an "orders" table:

select orderid from ORDERS
minus
select orderid from tasks
where engineer1 IS NOT NULL AND engineer2 IS NOT NULL

or

select orderid from ORDERS
where orderid not in
( select orderid from tasks
  where engineer1 IS NOT NULL AND engineer2 IS NOT NULL
)

or

select orderid from ORDERS
where not exists
( select null from tasks
  where tasks.orderid = orders.orderid
  and   engineer1 IS NOT NULL OR engineer2 IS NOT NULL
)

Tony Andrews 2008-09-23 12:17:57

Your second proposal is NOT the same thing, since there can be multiple tasks per orderID, some assigned and some not. He only wants orderIDs for which NONE of the tasks are assigned.

Dave Costa 2008-09-23 12:22:59

Your last proposal would need to have a correlated subquery (e.g. add "AND tasks.orderID = orders.orderID"). You can't simply change NOT IN to NOT EXISTS without modifying the subquery.

Dave Costa 2008-09-23 12:24:20

Both points corrected now

Tony Andrews 2008-09-23 12:56:46

Answer 8

A:

Your query:

SELECT orderID
FROM tasks 
WHERE orderID NOT IN (
  SELECT DISTINCT orderID
  FROM tasks
  WHERE engineer1 IS NOT NULL AND engineer2 IS NOT NULL
)
;

Is equivalent to:

SELECT DISTINCT orderID
FROM tasks 
WHERE NOT ( engineer1 IS NOT NULL AND engineer2 IS NOT NULL )
;

Which can be further simplified to:

SELECT DISTINCT orderID
FROM tasks 
WHERE engineer1 IS NULL OR engineer2 IS NULL
;

Maybe Oracle does not realize how to optimize this but SQL Server does?

Kip 2008-09-23 12:27:43

Not when there are multiple orderid's per task and they can have different statuses of engineer1 and engineer2 being null.

David Aldridge 2008-09-23 13:38:10

Oh you're right. I'm so used to the primary key being the only field with "ID" in its name that I assumed orderID was the primary key of this table.

Kip 2008-09-23 14:48:34

Answer 9

+2 A:

I agree with TZQTZIO, I don't get your query.

If we assume the query did make sense then you might want to try using EXISTS as some suggest and avoid IN. IN is not always bad and there are likely cases which one could show it actually performs better than EXISTS.

The question title is not very helpful. I could set this query up in one Oracle database and make it run slow and make it run fast in another. There are many factors that determine how the database resolves the query, object statistics, SYS schema statistics, and parameters, as well as server performance. Sqlserver vs. Oracle isn't the problem here.

For those interested in query tuning and performance and want to learn more some of the google terms to search are "oak table oracle" and "oracle jonathan lewis".

Ethan Post 2008-09-23 12:29:21

Answer 10

+2 A:

I would try using joins instead

SELECT 
    t.orderID 
FROM 
    tasks  t
    LEFT JOIN tasks t1
     ON t.orderID = t1.orderID
     AND t1.engineer1 IS NOT NULL 
     AND t1.engineer2 IS NOT NULL
WHERE
    t1.orderID IS NULL

also your original query would probably be easier to understand if it was specified as:

SELECT orderID FROM orders WHERE orderID NOT IN 
(SELECT DISTINCT orderID FROM tasks WHERE
 engineer1 IS NOT NULL AND engineer2 IS NOT NULL)

(assuming you have orders table with all the orders listed)

which can be then rewritten using joins as:

SELECT 
    o.orderID 
FROM 
    orders o
    LEFT JOIN tasks t
     ON o.orderID = t.orderID
     AND t.engineer1 IS NOT NULL 
     AND t.engineer2 IS NOT NULL
WHERE
    t.orderID IS NULL

kristof 2008-09-23 12:49:45

This doesn't work because it should only return orders where *all* tasks engineer fields are not null, but this will return orders where some tasks are engineer noll, and some tasks are engineer not null. Did you mean for the WHERE clause to be different?

RickL 2008-09-23 13:56:54

It think it returns the same result as the query that you specified: SELECT orderID FROM tasks WHERE orderID NOT IN (SELECT DISTINCT orderID FROM tasks WHERE engineer1 IS NOT NULL AND engineer2 IS NOT NULL)which I believe returns:All orders that do not have any tasks with both engineers assigned

kristof 2008-09-23 14:22:19

But perhaps I am missing something, if one assigned engineer is enough not to list the order then change the left join to read as: LEFT JOIN tasks t1 ON t.orderID = t1.orderID AND (t1.engineer1 IS NOT NULL OR t1.engineer2 IS NOT NULL) But that would be different from your original query.

kristof 2008-09-23 14:27:34

Answer 11

A:

If you have no index over the Engineer1 and Engineer2 columns then you are always going to generate a Table Scan in SQL Server and the equivalent whatever that may be in Oracle.

If you just need the Orders that have unallocated tasks then the following should work just fine on both platforms, but you should also consider adding the indexes to the Tasks table to improve query perfomance.

SELECT DISTINCT orderID 
FROM tasks 
WHERE (engineer1 IS NULL OR engineer2 IS NULL)

Jim Birchall 2008-09-23 12:51:35

Typically, the engineer fields will be foreign keys, so the indexes should be there.

ΤΖΩΤΖΙΟΥ 2008-09-23 19:36:34

@ΤΖΩΤΖΙΟΥ Thanks for the downvote, but did you read the question? "The tasks only has an index on the orderID and taskID"

Jim Birchall 2008-09-24 09:13:11

Answer 12

A:

If you decide to create an ORDERS table, I'd add an ALLOCATED flag to it, and create a bitmap index. This approach also forces you to modify the business logic to keep the flag updated, but the queries will be lightning fast. It depends on how critical are the queries for the application.

Regarding the answers, the simpler the better in this case. Forget subqueries, joins, distinct and group bys, they are not needed at all!

2008-09-23 13:20:56

Answer 13

+1 A:

I think several people have pretty much the right SQL, but are missing a join between the inner and outer queries.
Try this:

SELECT t1.orderID 
FROM   tasks t1
WHERE  NOT EXISTS
       (SELECT 1 
        FROM   tasks t2 
        WHERE  t2.orderID   = t1.orderID
        AND    t2.engineer1 IS NOT NULL 
        AND    t2.engineer2 IS NOT NULL)

AJ 2008-09-23 13:30:31

Thanks, I tried this and it is the correct syntax, but it was still running for over 3 minutes when I cancelled it.

RickL 2008-09-23 13:49:03

Answer 14

+6 A:

Often this type of problem goes away if you analyze the tables involved (so Oracle has a better idea of the distribution of the data)

ANALYZE TABLE tasks COMPUTE STATISTICS;

hamishmcn 2008-09-23 13:57:51

Incredible, after I ran this it took just under 1 second to run my original SQL statement.

RickL 2008-09-23 14:12:38

This is an obsolete syntax for collecting statistics. DBMS_STATS is a more robust way. http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/stats.htm#PFGRF30102

David Aldridge 2008-09-23 15:13:42

I agree, this is obsolete. Please try to avoid using it. Use DBMS_STATS and make sure you also get your indexes, you can set cascade=>true when you gather stats for the table.

Ethan Post 2008-09-23 16:51:08

Answer 15

A:

What proportion of the rows in the table meet the condition "engineer1 IS NOT NULL AND engineer2 IS NOT NULL"?

This tells you (roughly) whether it might be worth trying to use an index to retrieve the associated orderid's.

Another way to write the query in Oracle that would handle unindexed cases very well would be:

select distinct orderid
from
(
select orderid,
       max(case when engineer1 is null and engineer2 is null then 0 else 1)
          over (partition by orderid)
          as max_null_finder
from   tasks
)
where max_null_finder = 0

David Aldridge 2008-09-23 14:04:03

Answer 16

A:

The Oracle optimizer does a good job of processing MINUS statements. If you re-write your query using MINUS, it is likely to run quite quickly:

SELECT orderID FROM tasks
MINUS
SELECT DISTINCT orderID FROM tasks WHERE
 engineer1 IS NOT NULL AND engineer2 IS NOT NULL

JoshL 2008-09-23 14:08:40

Answer 17

A:

New take.

Iff:

The COUNT() function does not count NULL values

and

You want the orderID of all tasks where none of the tasks have either engineer1 or engineer2 set to a value

then this should do what you want:

SELECT orderID
FROM tasks
GROUP BY orderID
HAVING COUNT(engineer1) = 0 AND COUNT(engineer2) = 0

Please test it.

ΤΖΩΤΖΙΟΥ 2008-09-23 14:54:17

Answer 18

+1 A:

"Although I'm still curious why I have to do this, and if/when I would need to run it again?"

The statistics give Oracle's cost-based optimzer information that it needs to determine the efficiency of different execution plans: for example, the number of rowsin a table, the average width of rows, highest and lowest values per column, number of distinct values per column, clustering factor of indexes etc.

In a small database you can just setup a job to gather statistics every night and leave it alone. In fact, this is the default under 10g. For larger implementations you usually have to weigh the stability of the execution plans against the way that the data changes, which is a tricky balance.

Oracle also has a feature called "dynamic sampling" that is used to sample tables to determine relevant statistics at execution time. It's much more often used with data warehouses where the overhead of the sampling it outweighed by the potential performance increase for a long-running query.

David Aldridge 2008-09-23 15:21:05

Thanks, copied your answer into the question.

RickL 2008-09-25 15:23:19

Dave's on a Dynamic Sampling Binge

2008-10-24 21:41:14

I loves the dynamic sampling. I don't know it isn't spoken of more often.

David Aldridge 2008-10-28 18:37:30

@Dave, only if your queries are big and infrequent. Otherwise you spend more time sampling that doing.

WW 2009-09-27 05:57:21

Well it all bears measuring of course. DS impact depends on a lot of issues, and you can get information from a DS query that lasts a tiny fraction of a second that can take a query from 10 seconds down to a fraction of a second. For example dynamic sampling of a part of a query in which multiple predicates run against a small dimension table (public holidays on a Monday in a particular year? Women over 5'11''?) need not take long but can be very valuable. Long story short, it's just another tool to understand and appreciate for what it is.

David Aldridge 2009-09-27 11:18:31

Answer 19

A:

For your followup question, you may be interested in this:

http://stackoverflow.com/questions/79229/how-often-should-oracle-database-statistics-be-run

Dave Costa 2008-09-23 15:35:20

Answer 20

A:

Could you provide the explain plans for both queries, so we have some idea what the optimizer thinks about them both?

EvilTeach 2008-09-26 16:19:27

this isn't an answer, it's a comment. Put these in the comments to the question section.

2008-10-24 21:40:07

Answer 21

A:

I agree with ΤΖΩΤΖΙΟΥ and wearejimbo that your query should be...

SELECT DISTINCT orderID FROM Tasks 
WHERE Engineer1 IS NULL OR Engineer2 IS NULL;

I don't know about SQL Server, but this query won't be able to take advantage of any indexes because null rows aren't in indexes. The solution to this would be to re-write the query in a way that would allow a function based index to be created that only includes the null value rows. This could be done with NVL2, but would likely not be portable to SQL Server.

I think the best answer is not one that meets your criteria and that is write a different statement for each platform that is best for that platform.

Leigh Riffel 2008-10-12 17:56:08

SQL Server allows only one NULL row in an index except you create a filtered index. In this case it does not include NULLs.

usr 2010-02-01 02:19:14

Thanks for the info.

Leigh Riffel 2010-02-03 13:46:15

ansaurus

tags:

views:

answers:

Optimising a SELECT query that runs slow on Oracle which runs quickly on SQL Server

related questions