views:

1320

answers:

6

I read some of the discussion in this question and thought to myself that in my PL/SQL code I have "exists" style queries all over the place that don't use the ROWNUM=1 optimisation.

The questions I have are:

  1. Does the introduction of ROWNUM=1 significantly increase performance?
  2. If so, under what conditions would performance be particularly improved (e.g lots of joins, constraints on unindexed columns, large tables, large result sets)

I'm trying to determine of it is worth rewriting all of my existing queries to add a ROWNUM=1 optimisation.

The queries I'm thinking of are ones that may have multiple joins and may query large tables. They have the general form of:

SELECT 1
INTO ln_count
FROM table_1, table_2...., table_n
WHERE <various joins and conditions>;

IF ln_count > 0 THEN
  <do stuff>
END IF;

I'm considering changing them to:

SELECT 1
INTO ln_count
FROM table_1, table_2...., table_n
WHERE <various joins and conditions>
AND ROWNUM = 1;

IF <local variable> > 0 THEN
  <do stuff>
END IF;
+2  A: 

I'm going to guess that this is not going to be worth your time. Modern optimizers are very good at what they do so I'd find it a bit amazing that a query that only is ALLOWED to return one row is going to see a significant performance boost from adding ROWNUM=1.

Is the performance gain supposedly from removing the need to check for this constraint?

I find when I stop trusting the optimizer I often dig myself a deeper grave ;)

Additionally: When in doubt try it out. Find a large join, run it several times without rownum=1, several times with rownum=1 and see if you're noticing a large percentage improvement. To ensure that there are no caching issues I would suggest doing this on a database that you are able to restart.

Brian
We all spend our first 2 or 3 years with SQL learning there are no clever new ideas.
le dorfier
The queries are "allowed" to return one row but anything after the first is ignored since I only care if there are any results.One practical example from my code. I have a large table containing customer orders. Each order might have a status (e.g "Pending", "Open", "Closed", "Cancelled") and a product identifier. I have a query to find out of there are any open orders for a given product identifier. I know that, in theory, there is a performance gain - what I'm interested in is whether there is a significant performance gain in practice.
darreljnz
Disagree. There is a difference between a query that is "allowed to return one row" (e.g. one that computes an aggregate) and one that only examines one row (e.g. one with ROWNUM=1). Compare: SELECT COUNT(*) FROM my_big_table; vs. SELECT COUNT(*) FROM my_big_table WHERE ROWNUM=1; - the result (in terms of performance) will be very different.
Jeffrey Kemp
+4  A: 

One rule of thumb in optimization is to not do it unless you have a hotspot you need to fix. However, if you are curious about the performance benefits, you might want to run some tests using both to see if you can measure any improved performance.

wikipedia quotes Donald Knuth as saying:

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."

akf
+1, Good advice and nice quote
DCookie
A: 

If you ask for COUNT(1) then Oracle must find all the matching rows to satisfy your exact answer.

SELECT COUNT(1) FROM ....

If you ask for 1 from the first row, Oracle can stop once it has found one matching row.

SELECT 1 FROM ... WHERE ROWNUM = 1

It is good practice to only ask for the data that you actually need. Why get Oracle to tell you there are 1,203,499 matching results when you only care about the first one? People have mentioned that the optimizer can improve things. However, it still has to answer the question you asked. If you ask an easier question it can come up with the answer quicker.

The times it is likely to have a significant impact on performance:- * The actual count of records you find is high, * Oracle switches from a HASH JOIN plan to a NESTED LOOP, and the nested loop plan is better for finding the first row

WW
COUNT(1) doesn't mean that it will only get one row. SELECT COUNT(*) FROM my_big_table; is exactly the same as SELECT COUNT(1) FROM my_big_table;.
Jeffrey Kemp
I wasn't making a point about COUNT(*) vs COUNT(1). If you ask for COUNT(1) then it needs to count them all. If you ask for 1 for the first matching row it only needs to find one.
WW
Fair enough. Personally I avoid using "COUNT(1)" to avoid confusion for newbies.
Jeffrey Kemp
A: 

Does the introduction of ROWNUM=1 significantly increase performance?

It can make a very significant difference. If you are only interested in the first row the database happens to find when it runs the query, then it is best practice to tell Oracle that, by adding "ROWNUM=1". If you don't, Oracle will assume you intend to fetch all the rows from the query eventually, and will optimise the query accordingly.

In the case of COUNT(), if you only want to know if there is at least one record, the query optimiser won't know that and will count each and every row - a waste of time. If you add ROWNUM=1, you have given the optimiser the chance to stop as soon as it finds a row.

If so, under what conditions would performance be particularly improved (e.g lots of joins, constraints on unindexed columns, large tables, large result sets)

The more data the query needs to plow through to answer your query, the more the ROWNUM=1 predicate is likely to improve performance. In a multi-table join, for example, adding ROWNUM=1 could change the plan from using a lot of expensive hash joins, to make it use nested loops instead, which are far quicker when rows exist.

The opposite can be true, however - if you expect that normally there will be zero rows, this can make things worse - because it has to query the whole table before it can confidently assert "there are no matching rows". Of course, appropriate indexes normally help this situation greatly.

Jeffrey Kemp
Bottom line: adding ROWNUM=1 is NOT just an "optimization" measure - it should be considered an important part of writing each query to do exactly what it should do, and no more.
Jeffrey Kemp
Wow, my first downvote! :)
Jeffrey Kemp
anyone care to elaborate on what's wrong here? just for my edification...
Jeffrey Kemp
Beats me Jeffrey; you were saying the same thing as my answer.
WW
+3  A: 

It does improve performance significantly (tens of percent average) on queries which cannnot be solved by simple single index lookup e.g. table joins. However it has a potential to hide data/application error.

Lets have a table:

create table t (id number(10,0), padding varchar2(1000));

--intentionally don't use PK to make the example as simple as possible. The padding is used to simulate real data load in each record

with a many records:

insert into t (id, padding)
select rownum, rpad(' ', 1000) from dual connect by level < 10000

Now if you ask something like

select 1 into ll_exists
from t where id = 5;

the DB must go through the whole table whether it found the only matching record in the first data block (which by the way we cannot know because it could be inserted by many different ways) or in the last. That's because it doesn't know that there is only one matching record. On the other hand if you use ... and rownum = 1 than it can stop traversing through data after the record is found because you told it that there is not (or not needed) another matching record.

The drawback is that with the rownum constraint you may get undeterministic results if the data contains more than one possible record. If the query was

select id into ll_id
from t where mod (id, 2) = 1
and rownum = 1;

then I may receive from the DB answer 1 as well as 3 as well as 123 ... order is not guaranteed and this is the consequence. (without the rownum clause I would get a TOO_MANY_ROWS exception. It depends on situation which one is worse)

If you really want query which tests existence then WRITE IT THAT WAY.

begin

select 'It does' 
  into ls_exists
from dual where
exists (your_original_query_without_rownum);

do_something_when_it_does_exist
exception
  when no_data_found then
    do_something_when_it_doesn't_exist
end;
Michal Pravda
+1, good detailed response + notes regarding the non-deterministic nature of rownum predicates.
Jeffrey Kemp
I like the conclusion at the end - that is a more appropriate way of writing it.
darreljnz
The outer query can be select count(*) from dual ... (it'll be wither 1 or 0), so you won't need to use exception block to do program logic (which is bad).
Gabor Kecskemeti
@Gabor: Misuse of exceptions for program logic is bad. But in this case they are used in exactly the fashion they were introduced for - handling an (un)common case of nonexisting row. Using count when you are not interested in the amount, as you suggested, is bad in my opinion.
Michal Pravda
+1  A: 

While I like the highest voted answer, in an effort to avoid exceptions, I do something like this:

begin

  select count(*)
    into ls_exists
    from dual
    where exists (select null from ... where ...);

  if ls_exists = 1 then
    do_something;
  else
    do_something_else;
  end if;

end;

This will always return a single row with a 1 or a 0. No exceptions.

Note also the use of SELECT NULL. This does a couple of things; when the optimizer thinks it's appropriate, Oracle will only look at indexes. If you put table columns in the SELECT clause, and the columns are not part of an index, Oracle will do the index lookup, then get the row from the table, which may be totally useless depending on the query. Since you're only checking the existence of a row, you probably don't need actual data.

SELECT NULL doesn't return any data (an optimization), and the EXISTS clause looks at whether a row is returned, not the data in the row.

djb