views:

4186

answers:

10

I have read a book whose title is "Oracle PL SQL Programming" (2nd ed.) by Steven Feuerstein & Bill Pribyl. On page 99, there is a point suggested that

Do not "SELECT COUNT(*)" from a table unless you really need to know the total number of "hits." If you only need to know whether there is more than one match, simply fetch twice with an explicit cursor.

Could you anyone explain this point more to me by providing example? Thank you.

+1  A: 

He means open a cursor and fetch not only the first record but the second, and then you will know there is more than one.

Since I never seem to need to know that SELECT COUNT(*) is >= 2, I have no idea why this is a useful idiom in any SQL variant. Either no records or at least one, sure, but not two or more. And anyway, there's always EXISTS.

That, and the fact that Oracle's optimizer seems to be pretty poor... - I would question the relevance of the technique.

To address TheSoftwareJedi's comments:

WITH CustomersWith2OrMoreOrders AS (
    SELECT CustomerID
    FROM Orders
    GROUP BY CustomerID
    HAVING COUNT(*) >= 2
)
SELECT Customer.*
FROM Customer
INNER JOIN CustomersWith2OrMoreOrders
    ON Customer.CustomerID = CustomersWith2OrMoreOrders.CustomerID

Appropriately indexed, I've never had performance problems even with whole universe queries like this in SQL Server. However, I have consistently run into comments about Oracle optimizer problems here and on other sites.

My own experience with Oracle has not been good.

The comment from the OP appears to be saying that full COUNT(*) from tables are not well handled by the optimizer. i.e.:

IF EXISTS (SELECT COUNT(*) FROM table_name HAVING COUNT(*) >= 2)
BEGIN
END

(which, when a primary key exists, can be reduced to a simple index scan - in a case of extreme optimization, one can simply query the index metadata in sysindexes.rowcnt - to find the number of entries - all without a cursor) is to be generally avoided in favor of:

DECLARE CURSOR c IS SELECT something FROM table_name;
BEGIN
    OPEN c
    FETCH c INTO etc. x 2 and count rows and handle exceptions
END;

IF rc >= 2 THEN BEGIN
END

That, to me would result in less readable, less portable, and less maintainable code.

Cade Roux
"...the fact that Oracle's optimizer seems to be pretty poor..." - needless claim. at least back it up with facts or explain plans!
TheSoftwareJedi
How would you write a query for "confirm this customer has 2 or more orders" in your database of choice, and how would it be optimized?
TheSoftwareJedi
"The Problem" has nothing to do with the CBO. -- Bug lists for every DB are long. Humans naturally obsess over findings that support their opinions. Science has to fight this bias regularly. The Oracle bug you found defines Oracle as bad? I believe that the bug supported your opinion not caused it.
Actually, I had a high opinion of Oracle until I used it and had to troubleshoot problems - it was the opposite with SQL Server.
Cade Roux
+3  A: 

This comes from programmers writing code similar to the following (this is psuedo code!).

You want to check to see if the customer has more than one order:

if ((select count(*) from orders where customerid = :customerid) > 1)
{
    ....
}

That is a terribly inefficient way to do things. As Mark Brady would say, if you want to know if a jar contains pennies, would you count all the pennies in the jar, or just make sure there is 1 (or 2 in your example)?

This could be better written as:

if ((select 1 from (select 1 from orders where customerid = :customerid) where rownum = 2) == 1)
{
    ....
}

This prevents the "counting all of the coins" dilemma since Oracle will fetch 2 rows, then finish. The previous example would cause oracle to scan (an index or table) for ALL rows, then finish.

TheSoftwareJedi
That's pretty smart!
DanSingerman
I think you mean "rownum <= 2", right?
Dave Costa
A: 

Depending on the DB, there may be a sys table which stores an approximate count and can be queried in constant time. Useful if you want to know whether the table has 20 rows or 20,000 or 20,000,000.

Mike Kale
VERY VERY dangerous. Issue a delete of every row then look at the table, it's still shows 20 or 20,000 until you gather stats.
+5  A: 

If two is all you are interested in, try

SELECT 'THERE ARE AT LEAST TWO ROWS IN THE TABLE'
FROM DUAL
WHERE 2 =
(
    SELECT COUNT(*)
    FROM TABLE
    WHERE ROWNUM < 3
)

It will take less code than doing the manual cursor method, and it is likely to be faster.

The rownum trick means to stop fetching rows once it has two of them.

If you don't put some sort of limit on the count(*), it could take a long while to finish, depending on the number of rows you have. In that case, using a cursor loop, to read 2 rows from the table manually, would be faster.

EvilTeach
A: 

SQL Server:

if 2 = (
 select count(*) from (
  select top 2 * from (
   select T = 1 union
   select T = 2 union
   select T = 3 ) t) t)
 print 'At least two'

Also, don't ever use cursors. If you think you really really need them, beat yourself with a shovel until you change your mind. Let relics from an ancient past remain relics from an ancient past.

Justice
Cursors have a place. Just not in this question.
TheSoftwareJedi
@Justice, Oracle PL/SQL cursors aren't slow. You can't directly translate SQL Server knowledge to Oracle knowledge. Have you ever done some benchmarking with Oracle cursors?
tuinstoel
I haven't. That's true, it was an assumption.
Justice
+13  A: 

There are a number of reasons why developers might perform select COUNT(*) from a table in a PL/SQL program:

1) They genuinely need to know how many rows there are in the table.

In this case there is no choice: select COUNT(*) and wait for the result. This will be pretty fast on many tables, but could take a while on a big table.

2) They just need to know whether a row exists or not.

This doesn't warrant counting all the rows in the table. A number of techniques are possible:

a) Explicit cursor method:

DECLARE
   CURSOR c IS SELECT '1' dummy FROM mytable WHERE ...;
   v VARCHAR2(1);
BEGIN
   OPEN c;
   FETCH c INTO v;
   IF c%FOUND THEN
      -- A row exists
      ...
   ELSE
      -- No row exists
      ...
   END IF;
END;

b) SELECT INTO method

DECLARE
   v VARCHAR2(1);
BEGIN
   SELECT '1' INTO v FROM mytable 
   WHERE ... 
   AND ROWNUM=1; -- Stop fetching if 1 found
   -- At least one row exists
EXCEPTION
   WHEN NO_DATA_FOUND THEN
      -- No row exists
END;

c) SELECT COUNT(*) with ROWNUM method

DECLARE
   cnt INTEGER;
BEGIN
   SELECT COUNT(*) INTO cnt FROM mytable 
   WHERE ... 
   AND ROWNUM=1; -- Stop counting if 1 found
   IF cnt = 0 THEN
      -- No row found
   ELSE
      -- Row found
   END IF;
END;

3) They need to know whether more than 1 row exists.

Variations on the techniques for (2) work:

a) Explicit cursor method:

DECLARE
   CURSOR c IS SELECT '1' dummy FROM mytable WHERE ...;
   v VARCHAR2(1);
BEGIN
   OPEN c;
   FETCH c INTO v;
   FETCH c INTO v;
   IF c%FOUND THEN
      -- 2 or more rows exists
      ...
   ELSE
      -- 1 or 0 rows exist
      ...
   END IF;
END;

b) SELECT INTO method

DECLARE
   v VARCHAR2(1);
BEGIN
   SELECT '1' INTO v FROM mytable 
   WHERE ... ;
   -- Exactly 1 row exists
EXCEPTION
   WHEN NO_DATA_FOUND THEN
      -- No row exists
   WHEN TOO_MANY_ROWS THEN
      -- More than 1 row exists
END;

c) SELECT COUNT(*) with ROWNUM method

DECLARE
   cnt INTEGER;
BEGIN
   SELECT COUNT(*) INTO cnt FROM mytable 
   WHERE ... 
   AND ROWNUM=2; -- Stop counting if 2 found
   IF cnt = 0 THEN
      -- No row found
   IF cnt = 1 THEN
      -- 1 row found
   ELSE
      -- More than 1 row found
   END IF;
END;

Which method you use is largely a matter of preference (and some religious zealotry!) Steven Feuerstein has always favoured explicit cursors over implicit (SELECT INTO and cursor FOR loops); Tom Kyte favours implicit cursors (and I agree with him).

The important point is that to select COUNT(*) without restricting the ROWCOUNT is expensive and should therefore only be done when a count is trully needed.

As for your supplementary question about how to re-write this with an explicit cursor:

CREATE OR REPLACE PROCEDURE do_sth ( emp_id_in IN emp.emp_id%TYPE )
IS
v_rows INTEGER;
BEGIN
    ...

    SELECT COUNT(*) INTO v_rows
    FROM emp
    WHERE emp_id = emp_id_in;

    IF v_rows > 0 THEN
        /* do sth */
    END;

    /* more statements */
    ...

END do_sth;

That would be:

CREATE OR REPLACE PROCEDURE do_sth ( emp_id_in IN emp.emp_id%TYPE )
IS
    CURSOR c IS SELECT 1
                FROM emp
                WHERE emp_id = emp_id_in;
    v_dummy INTEGER;
BEGIN
    ...

    OPEN c;    
    FETCH c INTO v_dummy;
    IF c%FOUND > 0 THEN
        /* do sth */
    END;
    CLOSE c;

    /* more statements */
    ...

END do_sth;

But really, in your example it is no better or worse, since you are selecting the primary key and Oracle is clever enough to know that it only needs to fetch once.

Tony Andrews
A: 

As Steven Feuerstein & Bill Pribyl recommends us not to use SELECT COUNT() to check whether records in a table exist or not, could anyone help me edit the code below in order to avoid using SELECT COUNT(*) by using explicit cursor instead? This code is written in the Oracle stored procedure.

I have a table emp(emp_id, emp_name, ...), so to check the provided employee ID corret or not:

CREATE OR REPLACE PROCEDURE do_sth ( emp_id_in IN emp.emp_id%TYPE )
IS
v_rows INTEGER;
BEGIN
    ...

    SELECT COUNT(*) INTO v_rows
    FROM emp
    WHERE emp_id = emp_id_in;

    IF v_rows > 0 THEN
        /* do sth */
    END;

    /* more statements */
    ...

END do_sth;
Sambath
The above code is potentially dangerous. What happens if another session deletes the row(s) that cause a non-zero count after the count but before the /* do sth */? What specific statement will run if v_rows > 0?
RussellH
A: 

If you want to get number of rows in a table, please don't used count(*), I would suggest count(0) that 0 is the column index of your primary key column.

Samnang
count(*) is fine, count(0) or count(1) isn't faster.
tuinstoel
A: 

Before you take Steven Feuerstein's suggestions too serious, just do a little benchmark. Is count(*) noticeably slower than the explicit cursor in your case? No? Then better use the construct that allows for simple, readable code. Which, in most cases, would be "select count(*) into v_cnt ... if v_cnt>0 then ..."

PL/SQL allows for very readable programs. Don't waste that just to nano-optimize.

ammoQ
A: 

"Before you take Steven Feuerstein's suggestions too serious, just do a little benchmark... PL/SQL allows for very readable programs. Don't waste that just to nano-optimize."

I would have to disagree with you. The time spent proving why I should not use fetch over count(*), could have easily been spent copy/paste from a standard model and replacing the SQL and variables as needed. DONE.

If you can answer the question with a NO, "Do I need to know the full true count?", then skip all the extra effort and go right with building a boolean function that returns TRUE or FALSE when you meet the conditions. Now your core logic is easier to read...

e.g. CASE WHEN MULTIPLE_MATCHES_FOUND_f THEN ELSE END;

Once you have your basic model built, you can easily use that same strategy whenever needed. One needs to be careful about SQL that appears to run fast. Especially when executed multiple times. it will bite you.

Barry Chase