views:

1619

answers:

3

I have a query where not all conditions are necessary. Here's an example of what it looks like when all conditions are used:

select num
from (select distinct q.num
       from cqqv q
       where q.bcode = '1234567' --this is variable
             and q.lb = 'AXCT' --this is variable
             and q.type = 'privt' --this is variable
             and q.edate > sysdate - 30 --this is variable
       order by dbms_random.value()) subq
where rownum <= 10; --this is variable

The parts marked as --this is variable are the parts that, well, vary! If a condition is NOT specified, then there is no default value. For example, if the input specifies "*" for q.type (but leaves everything else the same), then the query should match everything for type, and execute as:

select num
from (select distinct q.num
       from cqqv q
       where q.bcode = '1234567' --this is variable
             and q.lb = 'AXCT' --this is variable
             --and q.type = 'privt' --this condition ignored because of "type=*" in input
             and q.edate > sysdate - 30 --this is variable
       order by dbms_random.value()) subq
where rownum <= 10; --this is variable

I know it is possible to use dynamic sql to build this query on the fly, but I am wondering what sort of performance problems this could cause, and if there is a better way to do this.

+3  A: 

While you could do this...

select num
from (select distinct q.num
       from cqqv q
       where 1=1
             and (:bcode is null or q.bcode = :bcode)
             and (:lb is null or q.lb = :lb)
             and (:type is null or q.type = :type)
             and (:edate is null or q.edate > :edate - 30)
       order by dbms_random.value()) subq
where rownum <= :numrows

... the performance using dynamic SQL will usually be better, as it will generate a more targeted query plan. In the above query, Oracle cannot tell whether to use an index on bcode or lb or type or edate, and will probably perform a full table scan every time.

Of course, you must use bind variables in your dynamic query, not concatenate the literal values into the string, otherwise performance (and scalability, and security) will be very bad.

To be clear, the dynamic version I have in mind would work like this:

declare
    rc sys_refcursor;
    q long;
begin
    q := 'select num
    from (select distinct q.num
           from cqqv q
           where 1=1';

    if p_bcode is not null then
        q := q || 'and q.bcode = :bcode';
    else
        q := q || 'and :bcode is null';
    end if;

    if p_lb is not null then
        q := q || 'and q.lb = :lb';
    else
        q := q || 'and :lb is null';
    end if;

    if p_type is not null then
        q := q || 'and q.type = :type';
    else
        q := q || 'and :type is null';
    end if;

    if p_edate is not null then
        q := q || 'and q.edate = :edate';
    else
        q := q || 'and :edate is null';
    end if;

    q := q || ' order by dbms_random.value()) subq
    where rownum <= :numrows';

    open rc for q using p_bcode, p_lb, p_type, p_edate, p_numrows;
    return rc;
end;

This means that the result query will be "sargable" (a new word to me I must admit!) since the resulting query run will be (for example):

select num
from (select distinct q.num
       from cqqv q
       where 1=1
             and q.bcode = :bcode
             and q.lb = :lb
             and :type is null
             and :edate is null
       order by dbms_random.value()) subq
where rownum <= :numrows

However, I accept that this could require up to 16 hard parses in this example. The "and :bv is null" clauses are required when using native dynamic SQL, but could be avoided by using DBMS_SQL.

Tony Andrews
I usually do it this way, unless performance issues force me to do it in another way. In general, I think this should be the prefered way to go, because always using dynamic SQL means a lot of hard parses, cluttering the shared pool and possible other problems. And also: Makes the program longer and harder to read. Program for humans, unless told otherwise.
ammoQ
Interesting point, I looked up how to use bind variables in dynamic SQL (I didn't know about that trick!). I'll have to give that a try. Thanks!
FrustratedWithFormsDesigner
Of course wherever I said "is null" above I should have said "= '*'", but you get the idea...
Tony Andrews
@ammoQ, it's true there will be more hard parses with dynamic SQL, but only a smallish number - up to one for each variant of the SQL, i.e. 16 different queries in this example. Weighed against that is the cost of performing a full table scan every time versus being able to use indexes with the dynamic queries. If the table is small and the query is simple, the static query may be best. For larger data sets and more complex queries, I'd go for the dynamic version.
Tony Andrews
@ammoQ: Program for performance, and strive to make it readable.
OMG Ponies
@OMG Ponies: I've expanded on my answer to show why it isn't as bad as you think, i.e. the query will be "sargable".
Tony Andrews
Actually, I implemented it sort of like this, but instead of having "and :bcode is null" for the cases where the input is "*", I just omit any reference to bcode because in those cases, any value for bcode is acceptable.
FrustratedWithFormsDesigner
The `q` datatype is also incorrect - you've defined a long when you're concatenating strings...
OMG Ponies
Exactly. If I don't bother with the check for null IN the query and only include the condition if some other requirements (whatever they are) are met, does it really matter if I use bind variables or context variables?
FrustratedWithFormsDesigner
I see why you have the `ELSE`, it's because you have to have a bind variable in the query in order to use the `USING` clause or you'll get an ORA error. Exactly why context is the better approach.
OMG Ponies
@Frustrated: If you use bind variables, you **have** to always include the bind variable reference - as you see in Tony's update. That's why he has to have `AND :edate IS NULL` if the value is null - you'll get a ORA error otherwise.
OMG Ponies
Also - the number of hard parses is equal to the number of optional criteria squared.
OMG Ponies
There is a way to improve this solution: Don't use just <code>AND :edate IS NULL </code>. Let the optimizer eliminate it from the query at the beginning of parse/analyze phase. This can be done by <code>AND (1=1 or :edate IS NULL)</code>. This way you have the benefit of OMG Ponies' context variables' (having in where clause only "meaningful" conditions) without their complexity and the necessity of creating a context. In some cases, for long running queries, it would be more performant to eliminate binding at all. But it's rare and you are then vulnerable to SQL injection if u are not careful
Michal Pravda
@OMG Ponies: "The q datatype is also incorrect" - no, in PL/SQL "long" is a synonym for varchar2(32767). Confusing I know.
Tony Andrews
"Also - the number of hard parses is equal to the number of optional criteria squared" - yes, I admitted that (I said it would be up to 16 in the current example). But they will be 16 different queries, for which potentially 16 different query plans will be optimal. It is also the same for your solution with the context variables.
Tony Andrews
@Michal: that's a neat idea.
Tony Andrews
@Michal: Our DBA suggested something similar, here's the link he gave me: http://www.oracle.com/technology/oramag/oracle/09-jul/o49asktom.html
FrustratedWithFormsDesigner
+2  A: 

While I agree with Tony that performance of using dynamic SQL is better, context variables is a better approach than using bind variables.

Using IN_VARIABLE IS NULL OR table.fieldx = IN_VARIABLE is not ideal for handling optional values. Each time a query is submitted, Oracle first checks in its shared pool to see if the statement has been submitted before. If it has, the execution plan for the query is retrieved and the SQL is executed. If the statement can not be found in the shared pool, Oracle has to go through the process of parsing the statement, working out various execution paths and coming up with the optimal access plan (AKA “best path”) before it can be executed. This process is known as a “hard parse”, and can take longer than the query itself. Read more about the hard/soft parse in Oracle here, and AskTom here.

In short - this:

and (:bcode is null or q.bcode = :bcode)

...will execute the same, dynamic or otherwise. There's no benefit to using bind variables in dynamic SQL for optional parameters. The setup still destroys SARGability...

Context parameters are a feature that was introduced in Oracle 9i. They are tied to a package, and can be used to set attribute values (only for users with EXECUTE permission on the package, and you'll have to grant CREATE CONTEXT to the schema). Context variables can be used to tailor dynamic SQL so it includes only what is necessary for the query based on the filter/search criteria. In comparison, Bind variables (also supported in dynamic SQL) require that a value is specified which can result in IN_VARIABLE IS NULL OR table.fieldx = IN_VARIABLE tests in the search query. In practice, a separate context variable should be used for each procedure or function to eliminate the risk of value contamination.

Here's your query using context variables:

L_CURSOR SYS_REFCURSOR;
L_QUERY  VARCHAR2(5000) DEFAULT 'SELECT num
                                   FROM (SELECT DISTINCT q.num
                                           FROM CQQV q
                                          WHERE 1 = 1 ';
BEGIN

    IF IN_BCODE IS NOT NULL THEN
      DBMS_SESSION.SET_CONTEXT('THE_CTX',
                               'BCODE',
                               IN_BCODE);
      L_QUERY := L_QUERY || ' AND q.bcode = SYS_CONTEXT(''THE_CTX'', ''BCODE'') ';
    END IF;

    IF IN_LB IS NOT NULL THEN
      DBMS_SESSION.SET_CONTEXT('THE_CTX',
                               'LB',
                               IN_LB);
      L_QUERY := L_QUERY || ' AND q.lb = SYS_CONTEXT(''THE_CTX'', ''LB'') ';
    END IF;

    IF IN_TYPE IS NOT NULL THEN
      DBMS_SESSION.SET_CONTEXT('THE_CTX',
                               'TYPE',
                               IN_TYPE);
      L_QUERY := L_QUERY || ' AND q.type = SYS_CONTEXT(''THE_CTX'', ''TYPE'') ';
    END IF;

    IF IN_EDATE IS NOT NULL THEN
      DBMS_SESSION.SET_CONTEXT('THE_CTX',
                               'EDATE',
                               IN_EDATE);
      L_QUERY := L_QUERY || ' AND q.edate = SYS_CONTEXT(''THE_CTX'', ''EDATE'') - 30 ';
    END IF;

    L_QUERY := L_QUERY || ' ORDER BY dbms_random.value()) subq
           WHERE rownum <= :numrows ';

    FOR I IN 0 .. (TRUNC(LENGTH(L_QUERY) / 255)) LOOP
      DBMS_OUTPUT.PUT_LINE(SUBSTR(L_QUERY, I * 255 + 1, 255));
    END LOOP;

    OPEN L_CURSOR FOR L_QUERY USING IN_ROWNUM;
    RETURN L_CURSOR;

END;

The example still uses a bind variable for the rownum, because the value is not optional.

DBMS_SESSION.SET_CONTEXT('THE_CTX', 'LB', IN_LB);

The SET_CONTEXT parameters are as follows:

  1. The context variable name. There's no instance creation involved
  2. The variable within the context variable. A context variable is like a session variable, assuming familiarity with web applications & session objects.
  3. The value for the variable defined in parameter #2.

Bind vs Context

Bind variables means Oracle expects a variable reference to populate - it's an ORA error otherwise. For example:

... L_QUERY USING IN_EXAMPLE_VALUE

...expects that there is a single bind variable reference to be populated. If IN_EXAMPLE_VALUE is null, there has to be :variable in the query. IE: AND :variable IS NULL

Using a context variable means not having to include the extraneous/redundant logic, checking if a value is null.

IMPORTANT: Bind variables are processed in order of occurrence (known as ordinal), NOT by name. You'll notice there's no datatype declaration in the USING clause. Ordinals aren't ideal - if you change them in the query without updating the USING clause, it will break the query until it's fixed.

OMG Ponies
I had not yet heard about context variables, and I think they may be useful for some other stuff I'm working on (thanks for brining them to my attention!), but I'm still unclear as to why they're better than bind variables in this case. As a simple example, what would the difference be if my query were:select *from cqqv q where q.bcode = sys_context('ctx1','bcode')versusselect *from cqqv q where q.bcode = :bcodeWhat advantage does the context variable method have over the bind variable?
FrustratedWithFormsDesigner
The difference is that when you use bind variables, you **always** have to have the bind variable reference in your query. IE: if the value is null, you **have** to have `AND :bcode IS NULL`. Using a context variable means not having to include extraneous/redundant `WHERE` clauses
OMG Ponies
AHA! I see it now! The code to open the cursor is dependent on how many bind variables are in the cursor, but it won't matter if context variables are used! :)
FrustratedWithFormsDesigner
Hmm now I need to talk to the DBA to give me permission to execute dbms_session.set_context...
FrustratedWithFormsDesigner
OMG Ponies
@OMG I used to use this approach. But moved to simpler one, see my comment to Tony's answer
Michal Pravda
I spoke to our DBA yesterday evening. He didn't seem too keen on the idea of using context variables to solve this problem, and found a link to a possible solution. It looks a bit like Michal's.http://www.oracle.com/technology/oramag/oracle/09-jul/o49asktom.html
FrustratedWithFormsDesigner
A: 

The solution I've settled on is one that generates an dynamic SQL query that may look like this:

select num
from (select distinct q.NUM
       from cqqv q 
       where  (q.bcode = :bcode) 
                  and  (1=1 or :lb is null) 
                  and  (1=1 or :type is null) 
                  and  (q.edate> :edate) 
                order by dbms_random.value()) subq 
where rownum <= :numrows

(in this example, the bcode and edate conditions were NOT optional, but the lb and type were)

I think this is (or is very similar to) what Michal Pravda was suggesting, and our DBA here prefers this solution over the context variable solution. Thanks for all that helped and offered advice!

A link our DBA found which details this solution is here:

Ask Tom: On Popularity and Natural Selection

FrustratedWithFormsDesigner